[ Upstream commit 22dd70eb2c3d754862964377a75abafd3167346b ]
Currently, we can read OOB data without MSG_OOB by using MSG_PEEK
when OOB data is sitting on the front row, which is apparently
wrong.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
If manage_oob() is called when no data has been copied, we only
check if the socket enables SO_OOBINLINE or MSG_PEEK is not used.
Otherwise, the skb is returned as is.
However, here we should return NULL if MSG_PEEK is set and no data
has been copied.
Also, in such a case, we should not jump to the redo label because
we will be caught in the loop and hog the CPU until normal data
comes in.
Then, we need to handle skb == NULL case with the if-clause below
the manage_oob() block.
With this patch:
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 11] Resource temporarily unavailable
Bug: 342490466
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240410171016.7621-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 022d81a709)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I4728977f8f908c19dfa5c861c7381a50499b7fe0
[ Upstream commit b46f4eaa4f0ec38909fb0072eea3aeddb32f954e ]
syzkaller started to report deadlock of unix_gc_lock after commit
4090fa373f0e ("af_unix: Replace garbage collection algorithm."), but
it just uncovers the bug that has been there since commit 314001f0bf
("af_unix: Add OOB support").
The repro basically does the following.
from socket import *
from array import array
c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
c2.recv(1) # blocked as no normal data in recv queue
c2.close() # done async and unblock recv()
c1.close() # done async and trigger GC
A socket sends its file descriptor to itself as OOB data and tries to
receive normal data, but finally recv() fails due to async close().
The problem here is wrong handling of OOB skb in manage_oob(). When
recvmsg() is called without MSG_OOB, manage_oob() is called to check
if the peeked skb is OOB skb. In such a case, manage_oob() pops it
out of the receive queue but does not clear unix_sock(sk)->oob_skb.
This is wrong in terms of uAPI.
Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
The 'o' is handled as OOB data. When recv() is called twice without
MSG_OOB, the OOB data should be lost.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
>>> c1.send(b'hello', MSG_OOB) # 'o' is OOB data
5
>>> c1.send(b'world')
5
>>> c2.recv(5) # OOB data is not received
b'hell'
>>> c2.recv(5) # OOB date is skipped
b'world'
>>> c2.recv(5, MSG_OOB) # This should return an error
b'o'
In the same situation, TCP actually returns -EINVAL for the last
recv().
Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always set
EPOLLPRI even though the data has passed through by previous recv().
To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
it from recv queue.
The reason why the old GC did not trigger the deadlock is because the
old GC relied on the receive queue to detect the loop.
When it is triggered, the socket with OOB data is marked as GC candidate
because file refcount == inflight count (1). However, after traversing
all inflight sockets, the socket still has a positive inflight count (1),
thus the socket is excluded from candidates. Then, the old GC lose the
chance to garbage-collect the socket.
With the old GC, the repro continues to create true garbage that will
never be freed nor detected by kmemleak as it's linked to the global
inflight list. That's why we couldn't even notice the issue.
Bug: 342490466
Fixes: 314001f0bf ("af_unix: Add OOB support")
Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 601a89ea24)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ib4a11eed6b5710d9934d4f31cd29dfd4c7b3658f
The correct fragment is the one in build/kernel,
enabled by --page_size or kernel_build.page_size.
This fragment:
- Does not correctly enable incremental FS
- Does not correctly clear LOCALVERSION that
has 4k for the 4k build.
Bug: 347036722
Bug: 340631213
Bug: 338659380
Change-Id: I31cb004ca639d8ec3dd6201112391c7214971eba
Signed-off-by: Yifan Hong <elsk@google.com>
Depending on the platform binary being executed, the linker
(interpreter) requested can be one of:
1) /system/bin/bootstrap/linker64
2) /system/bin/linker64
3) /apex/com.android.runtime/bin/linker64
Relax the check to the basename (linker64), instead of the path.
Bug: 330767927
Bug: 335584973
Change-Id: I4a1f95b7cecd126f85ad8cefd9ff10d272947f9e
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
This introduces a new environment variable, KCPPFLAGS_COMPAT.
One use-case is to ensure -ffile-prefix-map is passed to the arm32
compiler to normalise compilation directory and make the ELF build ID
reproducible.
Bug: 345452375
Change-Id: I6ae1df58172f4dadeac1dbbee2e3241b704a9256
Signed-off-by: Giuliano Procida <gprocida@google.com>
Pixel MM Metrics:
add the missing symbol 'seq_put_decimal_ll' and re-do update list
Bug: 299190787
Test: local build
Change-Id: I005ccfa15cee8252bc51242460bbab9b7d0eb2ab
Signed-off-by: Robin Hsu <robinhsu@google.com>
For GKI targets with kmi_symbol_list set,
also set rewrite_absolute_paths_in_config to True
so that CONFIG_UNUSED_KSYMS_WHITELIST is not an
absolute path any more, increasing reproducibility
on different machines.
This is made possible with d94e16579b
"Revert^2 "BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.""
so that Kbuild recognizes relative paths by searching
$(objtree) first.
This does not change x86 builds because it
doesn't have kmi_symbol_list set.
This change does not affect device builds like
db845c, rockpi and fips140. Adding the attribute
to these targets would be okay, but it is not
covered by this change.
Bug: 333769605
Bug: 342390208
Change-Id: I844c251979bae5277474837eaa055f1346b0afa2
Signed-off-by: Yifan Hong <elsk@google.com>
udc device and gadget device are tightly coupled, yet there's no good
way to corelate the two. Add a sysfs link in udc that points to the
corresponding gadget device.
An example use case: userspace configures a f_midi configfs driver and
bind the udc device, then it tries to locate the corresponding midi
device, which is a child device of the gadget device. The gadget device
that's associated to the udc device has to be identified in order to
index the midi device. Having a sysfs link would make things much
easier.
Signed-off-by: Roy Luo <royluo@google.com>
Link: https://lore.kernel.org/r/20240307030922.3573161-1-royluo@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 333778731
Change-Id: I9e3f782543eba5e026a65031aaae754daafb69ab
(cherry picked from commit 0ef40f399aa2be8c04aee9b7430705612c104ce5)
[royluo: Resolved conflict in drivers/usb/gadget/udc/core.c ]
Signed-off-by: Roy Luo <royluo@google.com>
In commit 92f1655aa2b2 ("net: fix __dst_negative_advice() race") the
struct dst_ops callback negative_advice is callback changes function
parameters. But as this pointer is part of a structure that is tracked
in the ABI checker, the tool triggers when this is changed.
However, the callback pointer is internal to the networking stack, so
changing the function type is safe, so needing to preserve this is not
required. To do so, switch the function pointer type back to the old
one so that the checking tools pass, AND then do a hard cast of the
function pointer to the new type when assigning and calling the
function.
Bug: 343727534
Fixes: 92f1655aa2b2 ("net: fix __dst_negative_advice() race")
Change-Id: I48d4ab4bbd29f8edc8fbd7923828b7f78a23e12e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.
RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).
Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.
Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.
Many thanks to Clement Lecigne for tracking this issue.
This old bug became visible after the blamed commit, using UDP sockets.
Bug: 343727534
Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e)
[Lee: Trivial/unrelated conflict - no change to the patch]
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I293734dca1b81fcb712e1de294f51e96a405f7e4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit b7d4cf60e7.
Reason for revert: relanding change that should be safe to go in on its own. Below is the original commit message.
BACKPORT: FROMGIT: module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.
If UNUSED_KSYMS_WHITELIST is a file generated
before Kbuild runs, and the source tree is in
a read-only filesystem, the developer must put
the file somewhere and specify an absolute
path to UNUSED_KSYMS_WHITELIST. This worked,
but if IKCONFIG=y, an absolute path is embedded
into .config and eventually into vmlinux, causing
the build to be less reproducible when building
on a different machine.
This patch makes the handling of
UNUSED_KSYMS_WHITELIST to be similar to
MODULE_SIG_KEY.
First, check if UNUSED_KSYMS_WHITELIST is an
absolute path, just as before this patch. If so,
use the path as is.
If it is a relative path, use wildcard to check
the existence of the file below objtree first.
If it does not exist, fall back to the original
behavior of adding $(srctree)/ before the value.
After this patch, the developer can put the generated
file in objtree, then use a relative path against
objtree in .config, eradicating any absolute paths
that may be evaluated differently on different machines.
Signed-off-by: Yifan Hong <elsk@google.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
(cherry picked from commit a2e3c811938b4902725e259c03b2d6c539613992
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next)
Bug: 333769605
Change-Id: Ibf3f09dc8559e92023811abaabb33fd6b4ed8fa5
[elsk: apply change to gen_autoksyms.sh instead because
CONFIG_UNUSED_KSYMS_WHITELIST is parsed there. Revert change
to Makefile.modpost.]
Bug: 342390208
Signed-off-by: Yifan Hong <elsk@google.com>
[ Upstream commit 076361362122a6d8a4c45f172ced5576b2d4a50d ]
The struct adjtimex freq field takes a signed value who's units are in
shifted (<<16) parts-per-million.
Unfortunately for negative adjustments, the straightforward use of:
freq = ppm << 16 trips undefined behavior warnings with clang:
valid-adjtimex.c:66:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-499<<16,
~~~~^
valid-adjtimex.c:67:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-450<<16,
~~~~^
..
Fix it by using a multiply by (1 << 16) instead of shifting negative values
in the valid-adjtimex test case. Align the values for better readability.
Bug: 339526723
Reported-by: Lee Jones <joneslee@google.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Change-Id: Ied611c13a802acf9c7a2427f0a61eb358b571a3d
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240409202222.2830476-1-jstultz@google.com
Link: https://lore.kernel.org/lkml/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1f3484dec9)
Signed-off-by: Edward Liaw <edliaw@google.com>
__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.
This seems dangerous, even if KASAN did not bother yet.
Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b5590270068c4324dac4a2b5a4a156e02e21339f)
Bug: 339546075
Change-Id: I29a711ea804794b556674011cbd23c5bf9a03ab6
Signed-off-by: yenchia.chen <yenchia.chen@mediatek.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
If UNUSED_KSYMS_WHITELIST is a file generated
before Kbuild runs, and the source tree is in
a read-only filesystem, the developer must put
the file somewhere and specify an absolute
path to UNUSED_KSYMS_WHITELIST. This worked,
but if IKCONFIG=y, an absolute path is embedded
into .config and eventually into vmlinux, causing
the build to be less reproducible when building
on a different machine.
This patch makes the handling of
UNUSED_KSYMS_WHITELIST to be similar to
MODULE_SIG_KEY.
First, check if UNUSED_KSYMS_WHITELIST is an
absolute path, just as before this patch. If so,
use the path as is.
If it is a relative path, use wildcard to check
the existence of the file below objtree first.
If it does not exist, fall back to the original
behavior of adding $(srctree)/ before the value.
After this patch, the developer can put the generated
file in objtree, then use a relative path against
objtree in .config, eradicating any absolute paths
that may be evaluated differently on different machines.
Signed-off-by: Yifan Hong <elsk@google.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
(cherry picked from commit a2e3c811938b4902725e259c03b2d6c539613992
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next)
Bug: 333769605
Change-Id: I0696ac8f686329795034ada5a4587af4ecbb774f
[elsk: apply change to gen_autoksyms.sh instead because
CONFIG_UNUSED_KSYMS_WHITELIST is parsed there. Revert change
to Makefile.modpost. Edit init/Kconfig instead of kernel/module/Kconfig
because UNUSED_KSYMS_WHITELIST is defined there.]
Bug: 342390208
Signed-off-by: Yifan Hong <elsk@google.com>
export function for sysfs node formating
Bug: 299190787
Change-Id: I71e6a0815efa8df99d036bf457b8a0081999f3de
Signed-off-by: Robin Hsu <robinhsu@google.com>
The correct printk format specifier when calculating buffer offsets
should be "%tx" as it is a pointer difference (a.k.a ptrdiff_t). This
fixes some W=1 build warnings reported by the kernel test robot.
Bug: 329799092
Fixes: 63f7ddea2e ("ANDROID: binder: fix KMI-break due to address type change")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401100511.A4BKMwoq-lkp@intel.com/
Change-Id: Iaa87433897b507c47fe8601464445cb6de4b61db
Signed-off-by: Carlos Llamas <cmllamas@google.com>
The data transfer rate using Google Restore in USB3.2 mode is slower,
only about 140MB/s at 5Gbps.
The bMaxBurst is not set, and num_fifos in
dwc3_gadget_resize_tx_fifosis 1, which results
in only 131btye of dwc3 ram space being allocated to ep.
Modify bMaxBurst to 6.
The 5Gbps rate increases from 140MB/s to 350MB/s.
The 10Gbps rate is increased from 220MB/s to 500MB/s.
Bug: 340049583
BUG: 341178033
Change-Id: I5710af32c72d0b57afaecc00c4f0909af4b9a299
Signed-off-by: Lianqin Hu <hulianqin@vivo.corp-partner.google.com>
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
(cherry picked from commit 23f2a9f5f1)
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
When a Fast Role Swap control message attempt results in a transition
to ERROR_RECOVERY, the TCPC can still queue a TCPM_SOURCING_VBUS event.
If the event is queued but processed after the tcpm_reset_port() call
in the PORT_RESET state, then the following occurs:
1. tcpm_reset_port() calls tcpm_init_vbus() to reset the vbus sourcing and
sinking state
2. tcpm_pd_event_handler() turns VBUS on before the port is in the default
state.
3. The port resolves as a sink. In the SNK_DISCOVERY state,
tcpm_set_charge() cannot set vbus to charge.
Clear pd events within PORT_RESET to get rid of non-applicable events.
Fixes: b17dd57118 ("staging: typec: tcpm: Improve role swap with non PD capable partners")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202715.3375827-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 311127232
(cherry picked from commit bf20c69cf3cf9c6445c4925dd9a8a6ca1b78bfdf)
Signed-off-by: RD Babiera <rdbabiera@google.com>
Change-Id: I9b27d040d0acdeb2af74fd3fe90d246b864b5141
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.
If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlock attempts to hold port->lock.
Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.
Fixes: 03eafcfb60 ("usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240423202356.3372314-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 333787869
(cherry picked from commit cdc9946ea6377e8e214b135ccc308c5e514ba25f)
[rd: removed SRC_VDM_IDENTITY_REQUEST check, state not defined in branch]
Signed-off-by: RD Babiera <rdbabiera@google.com>
Change-Id: I8018d1fdc294885ae609b6e45e9bf6ab190897b9
It gives a way to enable/disable IO aware feature for background
discard, so that we can tune background discard more precisely
based on undiscard condition. e.g. force to disable IO aware if
there are large number of discard extents, and discard IO may
always be interrupted by frequent common IO.
Bug:340148900
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit d346fa09abff46988de9267b67b6900d9913d5a2)
Change-Id: Icff98ecc1f9e3a294afa08cfe2e8ba4e9f7fdaf3
Of course, the initramfs needs to be ready before procfs can be
mounted... in the initramfs. While at it, only mount if a pKVM module
must be loaded and only print a warning in case of failure.
Bug: 278749606
Bug: 301483379
Bug: 331152809
Change-Id: Ie56bd26d4575f69cb1f06ba6317a098649f6da44
Reported-by: Mankyum Kim <mankyum.kim@samsung-slsi.corp-partner.google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
(cherry picked from commit 7d5843b59548672c23c977b4666c3779d31695fb)
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003c0
pc : run_state_machine+0x1bc8/0x1c08
lr : run_state_machine+0x1b90/0x1c08
..
Call trace:
run_state_machine+0x1bc8/0x1c08
tcpm_state_machine_work+0x94/0xe4
kthread_worker_fn+0x118/0x328
kthread+0x1d0/0x23c
ret_from_fork+0x10/0x20
To prevent the crash, check for port->partner validity before
derefencing it in all the call sites.
Cc: stable@vger.kernel.org
Fixes: c97cd0b4b5 ("usb: typec: tcpm: set initial svdm version based on pd revision")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 321849121
(cherry picked from commit ae11f04b452b5205536e1c02d31f8045eba249dd
https://kernel.googlesource.com/pub/scm/linux/kernel/git/gregkh/usb
usb-linus)
Change-Id: I01510c86e147b3011afc5d475fc1dc38d2636a60
Signed-off-by: Zheng Pan <zhengpan@google.com>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
There is an issue in clang's ThinLTO caching (enabled for the kernel via
'--thinlto-cache-dir') with .incbin, which the kernel occasionally uses
to include data within the kernel, such as the .config file for
/proc/config.gz. For example, when changing the .config and rebuilding
vmlinux, the copy of .config in vmlinux does not match the copy of
.config in the build folder:
$ echo 'CONFIG_LTO_NONE=n
CONFIG_LTO_CLANG_THIN=y
CONFIG_IKCONFIG=y
CONFIG_HEADERS_INSTALL=y' >kernel/configs/repro.config
$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config vmlinux
...
$ grep CONFIG_HEADERS_INSTALL .config
CONFIG_HEADERS_INSTALL=y
$ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
CONFIG_HEADERS_INSTALL=y
$ scripts/config -d HEADERS_INSTALL
$ make -kj"$(nproc)" ARCH=x86_64 LLVM=1 vmlinux
...
UPD kernel/config_data
GZIP kernel/config_data.gz
CC kernel/configs.o
...
LD vmlinux
...
$ grep CONFIG_HEADERS_INSTALL .config
# CONFIG_HEADERS_INSTALL is not set
$ scripts/extract-ikconfig vmlinux | grep CONFIG_HEADERS_INSTALL
CONFIG_HEADERS_INSTALL=y
Without '--thinlto-cache-dir' or when using full LTO, this issue does
not occur.
Benchmarking incremental builds on a few different machines with and
without the cache shows a 20% increase in incremental build time without
the cache when measured by touching init/main.c and running 'make all'.
ARCH=arm64 defconfig + CONFIG_LTO_CLANG_THIN=y on an arm64 host:
Benchmark 1: With ThinLTO cache
Time (mean ± σ): 56.347 s ± 0.163 s [User: 83.768 s, System: 24.661 s]
Range (min … max): 56.109 s … 56.594 s 10 runs
Benchmark 2: Without ThinLTO cache
Time (mean ± σ): 67.740 s ± 0.479 s [User: 718.458 s, System: 31.797 s]
Range (min … max): 67.059 s … 68.556 s 10 runs
Summary
With ThinLTO cache ran
1.20 ± 0.01 times faster than Without ThinLTO cache
ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y on an x86_64 host:
Benchmark 1: With ThinLTO cache
Time (mean ± σ): 85.772 s ± 0.252 s [User: 91.505 s, System: 8.408 s]
Range (min … max): 85.447 s … 86.244 s 10 runs
Benchmark 2: Without ThinLTO cache
Time (mean ± σ): 103.833 s ± 0.288 s [User: 232.058 s, System: 8.569 s]
Range (min … max): 103.286 s … 104.124 s 10 runs
Summary
With ThinLTO cache ran
1.21 ± 0.00 times faster than Without ThinLTO cache
While it is unfortunate to take this performance improvement off the
table, correctness is more important. If/when this is fixed in LLVM, it
can potentially be brought back in a conditional manner. Alternatively,
a developer can just disable LTO if doing incremental compiles quickly
is important, as a full compile cycle can still take over a minute even
with the cache and it is unlikely that LTO will result in functional
differences for a kernel change.
Cc: stable@vger.kernel.org
Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Reported-by: Yifan Hong <elsk@google.com>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2021
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Closes: https://lore.kernel.org/r/20220327115526.cc4b0ff55fc53c97683c3e4d@kernel.org/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Bug: 312268956
Bug: 335301039
Change-Id: Iace492db67f28e172427669b1b7eb6a8c44dd3aa
(cherry picked from commit aba091547ef6159d52471f42a3ef531b7b660ed8
https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git kbuild)
[elsk: Resolved minor conflict in Makefile]
Signed-off-by: Yifan Hong <elsk@google.com>
If the kernel is built CONFIG_CFI_CLANG=y, reading smaps
may cause a panic. This is due to a failed CFI check; which
is triggered becuase the signature of the function pointer for
printing smaps padding VMAs does not match exactly with that
for show_smap().
Fix this by casting the function pointer to the expected type
based on whether printing maps or smaps padding.
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I65564a547dacbc4131f8557344c8c96e51f90cd5
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
In some cases a VMA with padding representation may be split, and
therefore the padding flags must be updated accordingly.
There are 3 cases to handle:
Given:
| DDDDPPPP |
where:
- D represents 1 page of data;
- P represents 1 page of padding;
- | represents the boundaries (start/end) of the VMA
1) Split exactly at the padding boundary
| DDDDPPPP | --> | DDDD | PPPP |
- Remove padding flags from the first VMA.
- The second VMA is all padding
2) Split within the padding area
| DDDDPPPP | --> | DDDDPP | PP |
- Subtract the length of the second VMA from the first VMA's
padding.
- The second VMA is all padding, adjust its padding length (flags)
3) Split within the data area
| DDDDPPPP | --> | DD | DDPPPP |
- Remove padding flags from the first VMA.
- The second VMA is has the same padding as from before the split.
To simplify the semantics merging of padding VMAs is not allowed.
If a split produces a VMA that is entirely padding, show_[s]maps()
only outputs the padding VMA entry (as the data entry is of length 0).
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ie2628ced5512e2c7f8af25fabae1f38730c8bb1a
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Some file systems like F2FS use a custom filemap_fault ops. Remove this
check, as checking vm_file is sufficient.
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Id6a584d934f06650c0a95afd1823669fc77ba2c2
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Only preform padding advise from the execution context on bionic's
dynamic linker. This ensures that madvise() doesn't have unwanted
side effects.
Also rearrange the order of fail checks in madvise_vma_pad_pages()
in order of ascending cost.
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I3e05b8780c6eda78007f86b613f8c11dd18ac28f
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
It is still disabled by default. Must specify
rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter
to actually enable it.
Bug: 258241771
Change-Id: Ic9e15b846d58ffa3d5dd81842c568da79352ff2d
Signed-off-by: Qais Yousef <qyousef@google.com>
It is still disabled by default. Must specify
rcutree.android_enable_rcu_lazy and rcu_nocbs=all in boot time parameter
to actually enable it.
Bug: 258241771
Change-Id: I11c920aa5edde2fc42ab54245cd198eb8cb47616
Signed-off-by: Qais Yousef <qyousef@google.com>
To allow more flexible arrangements while still provide a single kernel
for distros, provide a boot time parameter to enable/disable lazy RCU.
Specify:
rcutree.enable_rcu_lazy=[y|1|n|0]
Which also requires
rcu_nocbs=all
at boot time to enable/disable lazy RCU.
To disable it by default at build time when CONFIG_RCU_LAZY=y, the new
CONFIG_RCU_LAZY_DEFAULT_OFF can be used.
Bug: 258241771
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/20231203011252.233748-1-qyousef@layalina.io/
[Fix trivial conflicts rejecting newer code that doesn't exist on 5.15]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ib5585ae717a2ba7749f2802101b785c4e5de8a90
On many systems, a great deal of boot (in userspace) happens after the
kernel thinks the boot has completed. It is difficult to determine if
the system has really booted from the kernel side. Some features like
lazy-RCU can risk slowing down boot time if, say, a callback has been
added that the boot synchronously depends on. Further expedited callbacks
can get unexpedited way earlier than it should be, thus slowing down
boot (as shown in the data below).
For these reasons, this commit adds a config option
'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
Userspace can also make RCU's view of the system as booted, by writing the
time in milliseconds to: /sys/module/rcupdate/parameters/android_rcu_boot_end_delay
Or even just writing a value of 0 to this sysfs node.
However, under no circumstance will the boot be allowed to end earlier
than just before init is launched.
The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
suites ChromeOS and also a PREEMPT_RT system below very well, which need
no config or parameter changes, and just a simple application of this
patch. A system designer can also choose a specific value here to keep
RCU from marking boot completion. As noted earlier, RCU's perspective
of the system as booted will not be marker until at least
android_rcu_boot_end_delay milliseconds have passed or an update is made
via writing a small value (or 0) in milliseconds to:
/sys/module/rcupdate/parameters/android_rcu_boot_end_delay.
One side-effect of this patch is, there is a risk that a real-time workload
launched just after the kernel boots will suffer interruptions due to expedited
RCU, which previous ended just before init was launched. However, to mitigate
such an issue (however unlikely), the user should either tune
CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
of 0 to /sys/module/rcupdate/parameters/android_rcu_boot_end_delay, once userspace
boots, and before launching the real-time workload.
Qiuxu also noted impressive boot-time improvements with earlier version
of patch. An excerpt from the data he shared:
1) Testing environment:
OS : CentOS Stream 8 (non-RT OS)
Kernel : v6.2
Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …
2) OS boot time definition:
The time from the start of the kernel boot to the shell command line
prompt is shown from the console. [ Different people may have
different OS boot time definitions. ]
3) Measurement method (very rough method):
A timer in the kernel periodically prints the boot time every 100ms.
As soon as the shell command line prompt is shown from the console,
we record the boot time printed by the timer, then the printed boot
time is the OS boot time.
4) Measured OS boot time (in seconds)
a) Measured 10 times w/o this patch:
8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
The average OS boot time was: ~8.7s
b) Measure 10 times w/ this patch:
8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
The average OS boot time was: ~8.3s.
(CHROMIUM tag rationale: Submitted upstream but got lots of pushback as
it may harm a PREEMPT_RT system -- the concern is VERY theoretical and
this improves things for ChromeOS. Plus we are not a PREEMPT_RT system.
So I am strongly suggesting this mostly simple change for ChromeOS.)
Bug: 258241771
Bug: 268129466
Test: boot
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228
Commit-Queue: Joel Fernandes <joelaf@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit 7968079ec77b320ee9d4115fe13048a8f7afbc02)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style. Prefix boot param with android_]
Signed-off-by: Qais Yousef <qyousef@google.com>
The need_offload_krc() function currently holds the krcp->lock in order
to safely check krcp->head. This commit removes the need for this lock
in that function by updating the krcp->head pointer using WRITE_ONCE()
macro so that readers can carry out lockless loads of that pointer.
Bug: 258241771
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 8fc5494ad5)
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Iddde5ec15e8574216abc95d8c64efa5c66868508
As per the comments in include/linux/shrinker.h, .count_objects callback
should return the number of freeable items, but if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 is returned
should be when we are unable to determine the number of objects, or the
cache should be skipped for another reason.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 3826909635)
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I5cb380fceaccc85971a47773d9058f0ea044c6dd
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332178
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit 3243f1e22bf915c9b805a96cc4a8cbc03ed5d7a8)
[Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is roughly 50 milliseconds. The drawback of this approach is
low utilization of the 512 page slots in scenarios with infrequence
kvfree_rcu() calls. For example on an Android system:
<snip>
kworker/3:3-507 [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
kworker/6:1-76 [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
kworker/6:1-76 [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
kworker/3:3-507 [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
kworker/1:1-73 [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
kworker/1:1-73 [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
kworker/0:4-1411 [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
kworker/0:4-1411 [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
kworker/6:1-76 [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
<snip>
In many cases, out of 512 slots, fewer than 10 were actually used.
In order to improve batching and make utilization more efficient this
commit sets a drain interval to a fixed 5-seconds interval. Floods are
detected when a page fills quickly, and in that case, the reclaim work
is re-scheduled for the next scheduling-clock tick (jiffy).
After this change:
<snip>
kworker/7:1-371 [007] .... 5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
kworker/7:1-371 [007] .... 5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
kworker/7:1-371 [007] .... 5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
kworker/7:1-371 [007] .... 5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
kworker/7:1-371 [007] .... 5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
kworker/5:6-9428 [005] .... 5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
kworker/4:7-9434 [004] .... 5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
kworker/4:7-9434 [004] .... 5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
kworker/7:1-371 [007] .... 5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
<snip>
Here, all but one of the cases, more than one hundreds slots were used,
representing an order-of-magnitude improvement.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 51824b780b)
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I4635ba0dbece4e029d5271ef3950b8eaa1ae5e81
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332177
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-by: Sean Paul <sean@poorly.run>
(cherry picked from commit b1bf359877e084383be107bf0008d58d0a6b15e3)
[Conflict due to 71cf9c9835 adding a new
function in the same location.
Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>
monitor_todo is not needed as the work struct already tracks
if work is pending. Just use that to know if work is pending
using schedule_delayed_work() helper.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
(cherry picked from commit 82d26c36cc)
Bug: 258241771
Bug: 222463781
Test: CQ
Change-Id: I4c13f89da735a628a5030ab55a13e338b97da4b8
Signed-off-by: Joel Fernandes <joelaf@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4332176
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
(cherry picked from commit bb867be28d6a70b36ff1d6563f794c489072ab7e)
[Minor conflict with 71cf9c9835 where it
added a new function in the same location.
Cherry picked from chromeos-5.15 tree. Minor tweaks to commit message
to match Android style]
Signed-off-by: Qais Yousef <qyousef@google.com>