Implement timer-based RCU callback batching (also known as lazy
callbacks). With this we save about 5-10% of power consumed due
to RCU requests that happen when system is lightly loaded or idle.
By default, all async callbacks (queued via call_rcu) are marked
lazy. An alternate API call_rcu_hurry() is provided for the few users,
for example synchronize_rcu(), that need the old behavior.
The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. Also memory pressure will
flush it in a future patch.
To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists which were originally introduced to
address lock contention, to handle lazy CBs as well. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.
[ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
[ paulmck: Apply Zqiang feedback. ]
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Suggested-by: Paul McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 3cb278e73b)
Bug: 258241771
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4909030
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I557d5af2a5d317bd66e9ec55ed40822bb5c54390
When the bypass cblist gets too big or its timeout has occurred, it is
flushed into the main cblist. However, the bypass timer is still running
and the behavior is that it would eventually expire and wake the GP
thread.
Since we are going to use the bypass cblist for lazy CBs, do the wakeup
soon as the flush for "too big or too long" bypass list happens.
Otherwise, long delays can happen for callbacks which get promoted from
lazy to non-lazy.
This is a good thing to do anyway (regardless of future lazy patches),
since it makes the behavior consistent with behavior of other code paths
where flushing into the ->cblist makes the GP kthread into a
non-sleeping state quickly.
[ Frederic Weisbecker: Changes to avoid unnecessary GP-thread wakeups plus
comment changes. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit b50606f35f)
Bug: 258241771
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4909028
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: If8da96d7ba6ed90a2a70f7d56f7bb03af44fd649
This patch adds GKI symbol list for Exynosauto Soc.
We need to add 1 function(flush_signals) symbol to send buffer
to other domains.
1 function symbol(s) added
'void flush_signals(struct task_struct*)'
Bug: 320368458
Signed-off-by: Daehyun Seo <dae.seo@samsung.com>
Change-Id: I66a9264b70dc24f30029b413077363996b3339cd
[ Upstream commit 3701cd390fd731ee7ae8b8006246c8db82c72bea ]
If dynset expressions provided by userspace is larger than the declared
set expressions, then bail out.
Bug: 316085841
Fixes: 48b0ae046e ("netfilter: nftables: netlink support for several set element expressions")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit cf5f113c41)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I4bd3f7e9148d4bc12bbc67ecdd605c2957eb8010
Add hooks to support oem's binder feature of improving certain
scenarios sched priority by moving these scenarios' work to
a fixed binder thread.
Add the following new vendor hooks to drivers/android/binder.c:
1 trace_android_vh_binder_spawn_new_thread
in our os, some binder_transaction will be marked as vip flag,
it can be named vip transaction, the binder_work within the vip
transaction can be named vip work.
here will force a thread (named vip thread) to be spawned and
skip the normal conditions to spawn a thread. vip thread will
just select vip transaction to process.
2 trace_android_vh_binder_ioctl_end
in our os, in binder_proc, about binder threads,special thread
(called vip thread) will work for special binder_transaction
(called vip transaction).
here will expand one ioctl cmd for binder driver, it will
set the max count about special thread (called vip thread) in
binder_proc, and if it has vip thread in the binder_proc.
3 trace_android_vh_binder_looper_exited
while BC_REGISTER_LOOPER cmd, will set special thread as vip
thread. the flag saved in binder_thread:looper
here will unset the vip flag saved in binder_thread,
while BC_EXIT_LOOPER cmd, if the thread is vip thread(reference
above about vip thread)
4 trace_android_vh_binder_has_special_work_ilocked
for special binder thread(called vip thread), it will deal with
special binder_work (called vip work) within special transaction
(call vip transaction),
here, if the thread is vip thread, it will check the vip work if
exist or not.
5 trace_android_vh_binder_select_special_worklist
for special binder thread(called vip thread), it will select the
worklist for special binder_work(called vip work) within special
binder_transaction(called vip transaction)
here, it will make sure the selected worklist, the head of it is
vip work within vip transaction
Bug: 318782978
Change-Id: I8e544d9be2644a6144a9cfbd477e087d46b0073f
Signed-off-by: songfeng <songfeng@oppo.com>
After demand paging is captured during APP launch,
we can do it in advance before next launch.
Add the symbols for it here.
INFO: 4 function symbol(s) added
'unsigned int filemap_get_folios(struct address_space*, unsigned long*, unsigned long, struct folio_batch*)'
'unsigned int find_get_pages_range_tag(struct address_space*, unsigned long*, unsigned long, xa_mark_t, unsigned int, struct page**)'
'void page_cache_async_ra(struct readahead_control*, struct folio*, unsigned long)'
'void page_cache_sync_ra(struct readahead_control*, unsigned long)'
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: I3f42c39c6432303e69f1fbae56fabf620381d8c5
Current implementation blocks the running operations when Plug-out and
Plug-In is performed continuously, process gets stuck in
dwc3_thread_interrupt().
Code Flow:
CPU1
->Gadget_start
->dwc3_interrupt
->dwc3_thread_interrupt
->dwc3_process_event_buf
->dwc3_process_event_entry
->dwc3_endpoint_interrupt
->dwc3_ep0_interrupt
->dwc3_ep0_inspect_setup
->dwc3_ep0_stall_and_restart
By this time if pending_list is not empty, it will get the next request
on the given list and calls dwc3_gadget_giveback which will unmap request
and call its complete() callback to notify upper layers that it has
completed. Currently dwc3_gadget_giveback status is set to -ECONNRESET,
whereas it should be -ESHUTDOWN based on condition if not dwc->connected
is true.
Cc: <stable@vger.kernel.org>
Fixes: d742220b35 ("usb: dwc3: ep0: giveback requests on stall_and_restart")
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20231222094704.20276-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 320413810
(cherry picked from commit e9d40b215e38480fd94c66b06d79045717a59e9c
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ usb-next)
Change-Id: I7f0afebbcfa88b6b4e622a708b9838dd461661fc
Signed-off-by: Sriram Dash <quic_sriramd@quicinc.com>
Introduce a config option for each QMP PHY driver now that the QMP PHY
mega-driver has been split up into different modules. This allows kernel
configurators to limit the binary size of the kernel by only compiling
in the QMP PHY driver that they need.
Leave the old config QCOM_QMP in place and make it into a menuconfig so
that 'make olddefconfig' continues to work. Furthermore, set the default
of the new Kconfig symbols to be QCOM_QMP so that the transition is
smooth.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/all/20230202215330.2152726-1-swboyd@chromium.org/
Bug: 319064658
Change-Id: I633e6e1bbc3e79292bfde927e46f84219f0178ae
(cherry picked from commit d1abd69534)
[quic_kuruva: Resolved minor conflict in drivers/phy/qualcomm/Kconfig ]
Signed-off-by: Rajashekar kuruva <quic_kuruva@quicinc.com>
Add symbols of vendor hooks to capture demand paging during APP launch,
so we can do it in advance in next launch.
INFO: 1 function symbol(s) added
'int __traceiter_android_vh_read_pages(void*, struct readahead_control*)'
1 variable symbol(s) added
'struct tracepoint __tracepoint_android_vh_read_pages'
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: Ibb1e31b6912f7b6b92b76727f7e5043897434def
Add vendor hooks to capture demand paging during APP launch,
so we can do it in advance in next launch.
Bug: 315913896
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.com>
Signed-off-by: Lianjun Huang <huanglianjun@xiaomi.corp-partner.google.com>
Change-Id: I2698fefd347745fb4ff84b111caedbb3bb365ce3
The current placement of trace_cma_alloc_start/finish misses the fail
cases: !cma || !cma->count || !cma->bitmap.
trace_cma_alloc_finish is also not emitted for the failure case
where bitmap_count > bitmap_maxno.
Fix these missed cases by moving the start event before the failure
checks and moving the finish event to the out label.
Link: https://lkml.kernel.org/r/20240110012234.3793639-1-kaleshsingh@google.com
Fixes: 7bc1aec5e2 ("mm: cma: add trace events for CMA alloc perf testing")
Change-Id: I61153fe078da4f9f3338147f1fbb7697a5554078
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3b08ab9a811caebe1327f25f51557f95200d94bf https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 315897033
[ Remove ret arg from trace_cma_alloc_finish - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Under certain circumstances a SoC can reach a critical temperaturelimit
and is unable to stabilize the temperature around a temperaturecontrol.
The system may ask for a specific power budget butbecause of the OPP
density, we can only choose an OPP with a powerbudget lower than the
requested one and under-utilize the CPU, thuslosing performance. In
other words, one OPP under-utilizes the CPUwith a power less than the
requested power budget and the next OPPexceeds the power budget. The
cpu idle cooling can solve this problem.
Bug: 299411923
Signed-off-by: Aran Dalton <arda@allwinnertech.com>
Change-Id: I1c17b340617e88be075097dc47f30ce94be2a4d7
Enabling CONFIG_NETFILTER_FAMILY_BRIDGE causes the new element,
hooks_bridge[] to be added to netns_nf. Since the KMI is frozen
this could not be added.
The only instantiation of struct netns_nf is as an embedded field
of struct net. So instead of adding the field to struct netns_nf,
a new "struct ext_net" is added that contains struct net and
the new hooks_bridge[] field. An accessor function,
get_nf_hooks_bridge() is added to get a pointer to the new
field.
There is a global init_net of type struct net which must be special
cased since it is not a member of a struct ext_net. All other
instances of struct net are allocated via net_alloc() which now
allocates a struct ext_net.
Since CONFIG_NETFILTER_FAMILY_BRIDGE is a hidden config that is
needed for vendor modules, it is enabled via init/Kconfig.gki.
Bug: 316040984
Fixes: 0145780bfc78 ("fix KASAN-related kernel crash by KMI W/A for NETFILTER_FAMILY_BRIDGE")
Change-Id: I2c7384e3df9b88f12464dc0138986fed12ca626a
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
The commit 5aec776ef8 ("BACKPORT: ANDROID: dma-buf: Move sysfs work
out of DMA-BUF export path) re-purposed kobject as work_struct temporarily
to create the sysfs entries asynchronously. The author knows what he is
doing and rightly added a build assert if kobject struct size is smaller than
the work_struct size. We are hitting this build assert on a non-GKI platform
where CONFIG_ANDROID_KABI_RESERVE is not set. Fix this problem by allocating
a new union with dma_buf_sysfs_entry structure and temporary structure as
members. We only end up allocating more memory (because of union) only when
kobject size is smaller than work_struct which the original patch any way
assumed would never be true.
Bug: 261818147
Bug: 262666413
Change-Id: Ifb089bf80d8a3a44ece9f05fc0b99ee76cb11645
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
(cherry picked from commit ce18af9b5d)
Signed-off-by: T.J. Mercier <tjmercier@google.com>
We have identified an animation lag issue on our Android 14-6.1 product
which seems to be caused by contention in the rwsem lock during the
dmabuf request process. It appears that other processes are holding
sysfs read locks, resulting in the blocking of dmabuf sysfs node
creation. We encountered an issue in android14-6.1 that is similar to
the problem described in [1]. So we cherry-pick this commit to
android14-6.1.
[1] https://android-review.googlesource.com/c/kernel/common/+/2111974
Bug: 311282169
Bug: 206979019
Link: https://lore.kernel.org/lkml/CABdmKX2dNYhgOYdrrJU6-jt6F=LjCidbKhR6t4F7yaa0SPr+-A@mail.gmail.com/T/
Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>
Conflicts:
include/linux/dma-buf.h
1. The android14-6.1 KMI is frozen, and the modification to struct
dma_buf_sysfs_entry in the original patch triggers ABI check
failures. Instead of an anonymous union, use the existing struct
kobject directly as a work_struct with type punning.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Change-Id: Ic0386849b6b248b0a72215633fc1a50782455bac
commit 7315dc1e122c85ffdfc8defffbb8f8b616c2eb1a upstream.
NFT_MSG_DELSET deactivates all elements in the set, skip
set->ops->commit() to avoid the unnecessary clone (for the pipapo case)
as well as the sync GC cycle, which could deactivate again expired
elements in such set.
Bug: 318548348
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0105571f80)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie733688e27d9568d797fc1bc477261883b7dc8c1
Under certain circumstances __get_fault_info() may resolve the faulting
address using the AT instruction. Given that this is being done outside
of the host lock critical section, it is racy and the resolution via AT
may fail. We currently BUG() in this situation, which is obviously less
than ideal. Moving the address resolution to the critical section may
have a performance impact, so let's keep it where it is, but bail out
and return to the host to try a second time.
Bug: 311830307
Change-Id: I26d61b04a4ccf040bd31802abb3c6b998ff4a48b
Signed-off-by: Quentin Perret <qperret@google.com>
commit d920abd1e7 upstream.
From Alon:
"Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel,
a malicious user can cause a UAF and a double free, which may lead to
RCE (may also lead to an LPE in case the attacker already has local
privileges)."
Hence, when a queue initialization fails after the ahash requests are
allocated, it is guaranteed that the queue removal async work will be
called, hence leave the deallocation to the queue removal.
Also, be extra careful not to continue processing the socket, so set
queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error.
Bug: 310114968
Cc: stable@vger.kernel.org
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Tested-by: Alon Zahavi <zahavi.alon@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e985d78bdcf37f7ef73666a43b0d2407715f00d3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifd7ec8294182a6bf6d8c261aeda5d989e909f7ff
Current EP0 dequeue path will share the same as other EPs. However, there
are some special considerations that need to be made for EP0 transfers:
- EP0 transfers never transition into the started_list
- EP0 only has one active request at a time
In case there is a vendor specific control message for a function over USB
FFS, then there is no guarantee on the timeline which the DATA/STATUS stage
is responded to. While this occurs, any attempt to end transfers on
non-control EPs will end up having the DWC3_EP_DELAY_STOP flag set, and
defer issuing of the end transfer command. If the USB FFS application
decides to timeout the control transfer, or if USB FFS AIO path exits, the
USB FFS driver will issue a call to usb_ep_dequeue() for the ep0 request.
In case of the AIO exit path, the AIO FS blocks until all pending USB
requests utilizing the AIO path is completed. However, since the dequeue
of ep0 req does not happen properly, all non-control EPs with the
DWC3_EP_DELAY_STOP flag set will not be handled, and the AIO exit path will
be stuck waiting for the USB FFS data endpoints to receive a completion
callback.
Fix is to utilize dwc3_ep0_reset_state() in the dequeue API to ensure EP0
is brought back to the SETUP state, and ensures that any deferred end
transfer commands are handled. This also will end any active transfers
on EP0, compared to the previous implementation which directly called
giveback only.
Fixes: fcd2def663 ("usb: dwc3: gadget: Refactor dwc3_gadget_ep_dequeue")
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Bug: 318577849
Change-Id: Ic00684db4b502f1aab128f7e49f22510dda24f60
(cherry picked from commit 730e12fbec53ab59dd807d981a204258a4cfb29a https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing)
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
commit 7644b1a1c9 upstream.
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference when the thread is cleared. Grab the SQPOLL data lock before
attempting to get the task cpu and pid for fdinfo, this ensures we have a
stable view of it.
Bug: 309790656
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9236d2ea64)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I044e0285d4535440606ff593230b873e3145db91
commit 4b7de801606e504e69689df71475d27e35336fb3 upstream.
Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.
There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.
The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.
I'm hitting following race during the program load:
CPU 0 CPU 1
bpf_prog_load
bpf_check
do_misc_fixups
prog_array_map_poke_track
map_update_elem
bpf_fd_array_map_update_elem
prog_array_map_poke_run
bpf_arch_text_poke returns -EINVAL
bpf_prog_kallsyms_add
After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.
Similar race exists on the program unload.
Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.
Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.
[0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810
Bug: 309551558
Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 57a6b0a464)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I251c3da579e5d48cd7de4043913fd42d0671d6b5
Export sysctl_sched_min_granularity and
sysctl_sched_idle_min_granularity. In the vendor module, it will use
several static function in GKI, while we do not want to export these
static functions, which will need to make them not static, we copied
them to the vendor module, so we need the export the symbols used in
those static functions. For example, sysctl_sched_min_granularity
and sysctl_sched_idle_min_granularity are referred in sched_slice(),
and they are only used as read-only.
Bug: 316276520
Change-Id: I976d0a1f3a70e8e60099e55fdd3cc99a90053fbb
Signed-off-by: Rick Yiu <rickyiu@google.com>