[ Upstream commit 8c6790b5c25dfac11b589cc37346bcf9e23ad468 ]
The commit below introduced a warning message when the PHY state is not
one of the states PHY_HALTED, PHY_READY, or PHY_UP.
commit 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
message during resume, indicate at probe time that PHY resume/suspend is
managed by the MAC.
Fixes: 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Jian Hui Lee <jianhui.lee@canonical.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240708065210.4178980-1-jianhui.lee@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af253aef183a31ce62d2e39fc520b0ebfb562bb9 ]
The original function call passed the size of smap->bucket before the number
of buckets, which raises the error 'calloc-transposed-args' on compilation.
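For illustration, a minimal userspace sketch of the expected argument order, assuming kvcalloc() follows the same (count, element size) convention as calloc(). struct bucket and alloc_buckets() are hypothetical stand-ins, not the kernel's types:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a hash bucket; not the kernel's struct. */
struct bucket {
	void *head;
	int lock;
};

/*
 * calloc(nmemb, size): element count first, element size second.
 * Wrong order (what the original call did):
 *     calloc(sizeof(struct bucket), nbuckets);
 */
struct bucket *alloc_buckets(size_t nbuckets)
{
	return calloc(nbuckets, sizeof(struct bucket));
}
```

Swapping the arguments yields the same total size, so the mistake is harmless at runtime; gcc-14 flags it because a sizeof() expression appears as the first size argument while the second is not a sizeof().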
Vlastimil Babka added:
The order of parameters can be traced back all the way to 6ac99e8f23
("bpf: Introduce bpf sk local storage") across several refactorings,
and that's why that commit is used as the Fixes: tag.
In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined
allocation helpers to account at the call site") however exposed the
order of args in a way that gcc-14 has enough visibility to start
warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is
then a macro alias for kvcalloc instead of a static inline wrapper.
To sum up the warning happens when the following conditions are all met:
- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to error
Fixes: 6ac99e8f23 ("bpf: Introduce bpf sk local storage")
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62827d612a ]
bpf_local_storage_map_alloc() is the only caller of
__bpf_local_storage_map_alloc(). The remaining logic in
bpf_local_storage_map_alloc() is only a one-liner setting
the smap->cache_idx.
Remove __bpf_local_storage_map_alloc() to simplify code.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddef81b5fd ]
Introduce a new helper, bpf_map_kvcalloc(), for the memory allocation in
bpf_local_storage(). The allocation will then charge the memory to the
map instead of to current; currently these are the same thing, as the
helper is only used in the map creation path. Charging the map's memory
to the memcg of the map makes the accounting clearer.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 552d42a356 ]
'struct bpf_local_storage_elem' has 56 bytes of unused padding at the
end due to the struct's cache-line alignment requirement.
space is overlapped by storage value contents, so if we use sizeof()
to calculate the total size, we overinflate it by 56 bytes. Use
offsetof() instead to calculate more exact memory use.
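A minimal userspace model of the overcount, assuming a 64-byte cache line and made-up field sizes chosen so that the tail padding comes out to 56 bytes as in the text; struct elem and both helpers are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model: value bytes live in a cache-line-aligned tail. */
struct elem {
	char fields[56];                            /* list nodes, flags, ... */
	uint64_t smap __attribute__((aligned(64))); /* starts a new cache line */
	unsigned char data[];                       /* value bytes start here */
};

/* Overestimates: sizeof() includes the tail padding that data[] overlaps. */
size_t elem_size_sizeof(size_t value_size)
{
	return sizeof(struct elem) + value_size;
}

/* Exact: the value starts right where data[] begins. */
size_t elem_size_offsetof(size_t value_size)
{
	return offsetof(struct elem, data) + value_size;
}
```

Here data[] begins at offset 72, but sizeof() rounds the struct up to 128 bytes, so the sizeof()-based estimate is 56 bytes too large for every element.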
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c83597fa5d ]
Refactor code so that the inode/task/sk storage implementations
can share as much code as possible. I also added some comments
in the new function bpf_local_storage_unlink_nolock() to make
the code easier to understand. There is no functionality change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: af253aef183a ("bpf: fix order of args in call to bpf_map_kvcalloc")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f153831097b4435f963e385304cc0f1acba1c657 ]
X would not start in my old 32-bit partition (and the "n"-handling looks
just as wrong on 64-bit, but for whatever reason did not show up there):
"n" must be accumulated over all pages before it's added to "offset" and
compared with "copy", immediately after the skb_frag_foreach_page() loop.
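A simplified sketch of the accumulation rule described above, assuming a per-page copy helper that returns the number of bytes it copied; copy_fn and copy_frag() are hypothetical and do not reproduce the kernel's skb_frag_foreach_page() machinery:

```c
#include <stddef.h>

/* Per-page copy helper; returns bytes copied (may be short). Hypothetical. */
typedef size_t (*copy_fn)(char *dst, const char *src, size_t len);

/*
 * Copy "copy" bytes of one fragment page by page and return the total.
 * The caller adds the result to its offset and treats total != copy as a
 * short copy, so the total must cover every page, not just the last one.
 */
size_t copy_frag(char *dst, const char *src, size_t copy, size_t page_size,
		 copy_fn copy_chunk)
{
	size_t n = 0;		/* accumulated over all pages */
	size_t done = 0;

	while (done < copy) {
		size_t len = copy - done;
		size_t c;

		if (len > page_size)
			len = page_size;
		c = copy_chunk(dst + done, src + done, len);
		n += c;		/* the bug amounted to "n = c" */
		if (c != len)
			break;	/* short copy on this page */
		done += len;
	}
	return n;
}
```

With "n = c" instead of "n += c", a 10000-byte fragment split into 4096-byte pages would report only the final 1808 bytes, and the caller would take the short-copy path.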
Fixes: d2d30a376d9c ("net: allow skb_datagram_iter to be called from any context")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://patch.msgid.link/fef352e8-b89a-da51-f8ce-04bc39ee6481@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 442e26af9aa8115c96541026cbfeaaa76c85d178 ]
In rvu_check_rsrc_availability(), in the case of an invalid SSOW request,
incorrect data is printed to the error log: the 'req->sso' value is printed
instead of 'req->ssow'. This looks like a copy-paste mistake.
Fix this mistake by replacing 'req->sso' with 'req->ssow'.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 746ea74241 ("octeontx2-af: Add RVU block LF provisioning support")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240705095317.12640-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0c18025693707ec344a70b6887f7450bf4c826b ]
When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, the following kernel panic occurs:
[...]
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
... ...
ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 0000000c (PPLV0 +PIE +PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
BADV: 0000000000000040
PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160
Code: ...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
.text @ 0x9000000003710000
.data @ 0x9000000004d70000
.bss @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
[...]
This crash happens every time the sockmap_skb_verdict_shutdown subtest
in sockmap_basic is run.
This crash occurs because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Because page_address() is implemented differently on
different architectures, page_address(NULL) triggers a panic on the
Loongarch platform but not on the x86 platform. So this bug was hidden on
x86 for a while, but it is now exposed on Loongarch. The root cause is
that a zero-length skb (skb->len == 0) was put on the queue.
This zero length skb is a TCP FIN packet, which was sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():
shutdown(p1, SHUT_WR);
In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero and no
page is put into this sge (see sg_set_page in sg_set_page), but this empty
sge is still queued onto the ingress_msg list.
Then, in sk_msg_recvmsg(), this empty sge is used and sg_page(sge) returns
a NULL page. This NULL page is passed to copy_page_to_iter(), which passes
it to kmap_local_page() and page_address(), and the kernel panics.
To solve this, skip the zero-length skb: in sk_msg_recvmsg(), if copy is
zero, the skb has zero length, so skip invoking copy_page_to_iter().
We are using the EFAULT return triggered by copy_page_to_iter() to check
for is_fin in tcp_bpf.c.
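A userspace sketch of the skip, assuming a simplified scatter-gather entry in which a zero-length entry may carry a NULL page; struct sge and recv_from_sges() are hypothetical, and memcpy() stands in for copy_page_to_iter():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified sg entry: page may be NULL when length == 0. */
struct sge {
	const char *page;
	size_t length;
};

/* Copy up to len bytes from the queued entries into buf; returns bytes copied. */
size_t recv_from_sges(const struct sge *sg, size_t nr, char *buf, size_t len)
{
	size_t copied = 0;

	for (size_t i = 0; i < nr && copied < len; i++) {
		size_t copy = sg[i].length;

		if (copy > len - copied)
			copy = len - copied;
		/* Zero-length entry (e.g. from a FIN-only skb): never touch its page. */
		if (!copy)
			continue;
		memcpy(buf + copied, sg[i].page, copy);
		copied += copy;
	}
	return copied;
}
```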
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]
Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:
+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp->undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
it only afterward.
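The buggy scenario and the fix can be replayed with a heavily simplified counter model. It assumes that undo_retrans starts at zero on entering recovery, that each retransmit adds one and each DSACK subtracts one, and that an undo happens when the count drops back to zero; struct tp_model and its helpers are hypothetical, not the kernel's tcp_sock code:

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the undo_retrans bookkeeping. */
struct tp_model {
	int undo_retrans;
	bool tlp_retrans_in_flight;
	bool undone;
};

/* Entering recovery: with the fix, an in-flight TLP retransmit is counted too. */
void init_undo(struct tp_model *tp, bool with_fix)
{
	tp->undo_retrans = 0;
	if (with_fix && tp->tlp_retrans_in_flight)
		tp->undo_retrans++;
}

void on_retransmit(struct tp_model *tp)
{
	tp->undo_retrans++;
}

/* Each DSACK marks one retransmit as spurious; undo once all are accounted for. */
void on_dsack(struct tp_model *tp)
{
	if (tp->undo_retrans > 0 && --tp->undo_retrans == 0)
		tp->undone = true;
}

/* TLP of P4 in flight, fast retransmit of P1, then a DSACK for the TLP copy. */
bool scenario_undoes(bool with_fix)
{
	struct tp_model tp = { .tlp_retrans_in_flight = true };

	init_undo(&tp, with_fix);	/* enter fast recovery */
	on_retransmit(&tp);		/* fast-retransmit P1 */
	on_dsack(&tp);			/* DSACK for TLP P4 */
	return tp.undone;
}
```

Without the fix the single DSACK brings the count to zero and triggers the undo; with the fix the count stays at one until a DSACK for the P1 retransmit also arrives, which never happens because P1 was really lost.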
Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels
that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc07 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.
Fixes: 76be93fc07 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Kevin Yang <yyd@google.com>
Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aabfe57ebaa75841db47ea59091ec3c5a06d2f52 ]
The nr_dentry_negative counter is intended to only account negative
dentries that are present on the superblock LRU. Therefore, the LRU
add, remove and isolate helpers modify the counter based on whether
the dentry is negative, but the shrinker list related helpers do not
modify the counter, and the paths that change a dentry between
positive and negative only do so if DCACHE_LRU_LIST is set.
The problem with this is that a dentry on a shrinker list still has
DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional
DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a
shrink related list. Therefore if a relevant operation (i.e. unlink)
occurs while a dentry is present on a shrinker list, and the
associated codepath only checks for DCACHE_LRU_LIST, then it is
technically possible to modify the negative dentry count for a
dentry that is off the LRU. Since the shrinker list related helpers
do not modify the negative dentry count (because non-LRU dentries
should not be included in the count) when the dentry is ultimately
removed from the shrinker list, this can cause the negative dentry
count to become permanently inaccurate.
This problem can be reproduced via a heavy file create/unlink vs.
drop_caches workload. On an 80-CPU system, I start 80 tasks, each
running a 1k file create/delete loop, and one task spinning on
drop_caches. After 10 minutes or so of runtime, the idle/clean cache
negative dentry count increases from somewhere in the range of 5-10
entries to several hundred (and increasingly grows beyond
nr_dentry_unused).
Tweak the logic in the paths that turn a dentry negative or positive
to filter out the case where the dentry is present on a shrink
related list. This allows the above workload to maintain an accurate
negative dentry count.
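The filtering can be expressed as a single bit test. A minimal sketch, assuming two flag bits analogous to DCACHE_LRU_LIST and DCACHE_SHRINK_LIST; the names and values below are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical flag values; only the bit logic mirrors the description. */
#define LRU_LIST    0x1u	/* ->d_lru is in use (LRU or shrink list) */
#define SHRINK_LIST 0x2u	/* ...and that list is a shrink list */

/* Old test: also true for dentries sitting on a shrink list. */
bool counts_old(unsigned int flags)
{
	return flags & LRU_LIST;
}

/* Tweaked test: true only for dentries really on the superblock LRU. */
bool counts_new(unsigned int flags)
{
	return (flags & (LRU_LIST | SHRINK_LIST)) == LRU_LIST;
}
```

A dentry on a shrink list has both bits set, so only the second test keeps it out of the negative dentry accounting.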
Fixes: af0c9af1b3 ("fs/dcache: Track & report number of negative dentries")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/r/20240703121301.247680-1-bfoster@redhat.com
Acked-by: Ian Kent <ikent@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bfb40be31ddea0cb4664b352e1797cfe6c91976 ]
Currently, __d_clear_type_and_inode() writes the value of flags to
dentry->d_flags, then immediately re-reads it in order to use it in an if
statement. This re-read is useless because no other update to
dentry->d_flags can occur at this point.
This commit therefore reuses flags in the if statement instead of
re-reading dentry->d_flags.
Signed-off-by: linke li <lilinke99@qq.com>
Link: https://lore.kernel.org/r/tencent_5E187BD0A61BA28605E85405F15228254D0A@qq.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Stable-dep-of: aabfe57ebaa7 ("vfs: don't mod negative dentry count when on shrinker list")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf5bb09e742a9cf6349127e868329a8f69b7a014 ]
Add missing lock protection in the poll routine when iterating the xarray.
Without it:
Even with the RCU read lock held, only the radix tree slot is guaranteed
to be pinned, while the data structure stored in the slot (e.g. struct
cachefiles_req) has no such guarantee. The poll routine iterates the
radix tree and dereferences cachefiles_req accordingly. Thus the RCU read
lock is not adequate in this case, and a spinlock is needed here.
Fixes: b817e22b2e91 ("cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-10-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 19f4f399091478c95947f6bd7ad61622300c30d9 ]
Reusing the msg_id after a maliciously completed reopen request may cause
a read request to remain unprocessed and result in a hang, as shown below:
t1 | t2 | t3
-------------------------------------------------
cachefiles_ondemand_select_req
cachefiles_ondemand_object_is_close(A)
cachefiles_ondemand_set_object_reopening(A)
queue_work(fscache_object_wq, &info->work)
ondemand_object_worker
cachefiles_ondemand_init_object(A)
cachefiles_ondemand_send_req(OPEN)
// get msg_id 6
wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
// read msg_id 6 req_A
cachefiles_ondemand_get_fd
copy_to_user
// Malicious completion msg_id 6
copen 6,-1
cachefiles_ondemand_copen
complete(&req_A->done)
// will not set the object to close
// because ondemand_id && fd is valid.
// ondemand_object_worker() is done
// but the object is still reopening.
// new open req_B
cachefiles_ondemand_init_object(B)
cachefiles_ondemand_send_req(OPEN)
// reuse msg_id 6
process_open_req
copen 6,A.size
// The expected failed copen was executed successfully
The copen is expected to fail; when it does, it closes the fd, which sets
the object to close, and the close then triggers another reopen. However,
because msg_id reuse results in a successful copen, the anonymous fd is
not closed until the daemon exits. Therefore read requests waiting for the
reopen to complete may trigger a hung task.
To avoid this issue, allocate the msg_id cyclically so that a msg_id is
not reused within a very short period of time.
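A minimal sketch contrasting lowest-free and cyclic ID allocation over a small fixed table; NR_IDS, struct id_table and the helpers are hypothetical and stand in for the kernel's xarray-based code:

```c
#include <stdbool.h>

#define NR_IDS 8	/* hypothetical table size */

/* used[i] is true while request i is outstanding. */
struct id_table {
	bool used[NR_IDS];
	unsigned int next;	/* where the next cyclic search starts */
};

/* Lowest-free allocation: a just-freed ID is handed out again immediately. */
int alloc_lowest(struct id_table *t)
{
	for (unsigned int i = 0; i < NR_IDS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Cyclic allocation: search from the slot after the last allocated ID. */
int alloc_cyclic(struct id_table *t)
{
	for (unsigned int k = 0; k < NR_IDS; k++) {
		unsigned int i = (t->next + k) % NR_IDS;

		if (!t->used[i]) {
			t->used[i] = true;
			t->next = (i + 1) % NR_IDS;
			return (int)i;
		}
	}
	return -1;
}

void free_id(struct id_table *t, int id)
{
	t->used[id] = false;
}
```

With cyclic allocation, a stale "copen 6,..." from the daemon can only hit a new request after the allocator has wrapped around, instead of hitting the very next one.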
Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-9-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12e009d60852f7bce0afc373ca0b320f14150418 ]
When queuing ondemand_object_worker() to re-open the object,
cachefiles_object is not pinned. The cachefiles_object may be freed when
the pending read request is completed intentionally and the related
erofs is umounted. If ondemand_object_worker() runs after the object is
freed, it will cause a use-after-free, as shown below.
process A process B process C process D
cachefiles_ondemand_send_req()
// send a read req X
// wait for its completion
// close ondemand fd
cachefiles_ondemand_fd_release()
// set object as CLOSE
cachefiles_ondemand_daemon_read()
// set object as REOPENING
queue_work(fscache_wq, &info->ondemand_work)
// close /dev/cachefiles
cachefiles_daemon_release
cachefiles_flush_reqs
complete(&req->done)
// read req X is completed
// umount the erofs fs
cachefiles_put_object()
// object will be freed
cachefiles_ondemand_deinit_obj_info()
kmem_cache_free(object)
// both info and object are freed
ondemand_object_worker()
When dropping an object, it is no longer necessary to reopen the object,
so use cancel_work_sync() to cancel or wait for ondemand_object_worker()
to finish.
Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-8-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 751f524635a4f076117d714705eeddadaf6748ee ]
After an object is dropped, requests for that object are useless, so
cancel them to avoid causing other problems.
This prepares for the later addition of cancel_work_sync(). After a
reopen request is generated, cancel it so that cancel_work_sync() does
not block waiting for the daemon to complete the reopen request.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-7-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Stable-dep-of: 12e009d60852 ("cachefiles: wait for ondemand_object_worker to finish when dropping object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2415d1f4566b6939acacc69637eaa57815829c1 ]
Add CACHEFILES_ONDEMAND_OBJSTATE_DROPPING to indicate that the cachefiles
object is being dropped. It is set after the close request for the dropped
object completes, and no new requests are allowed to be sent once the
object is in this state.
This prepares for the later addition of cancel_work_sync(). It prevents
leftover reopen requests from being sent, to avoid processing unnecessary
requests and to avoid cancel_work_sync() blocking while waiting for the
daemon to complete the reopen requests.
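A minimal sketch of the gating, assuming a simple state field that is checked before each send; the enum, struct obj, send_req(), mark_dropping() and the -EIO return value are all hypothetical:

```c
#include <errno.h>

/* Hypothetical object states modelled on the description above. */
enum obj_state { OBJ_CLOSE, OBJ_OPEN, OBJ_REOPENING, OBJ_DROPPING };

struct obj {
	enum obj_state state;
	int sent;	/* number of requests queued to the daemon */
};

/* Refuse to queue any new request once the object is being dropped. */
int send_req(struct obj *o)
{
	if (o->state == OBJ_DROPPING)
		return -EIO;
	o->sent++;
	return 0;
}

/* Called after the close request for the dropped object has completed. */
void mark_dropping(struct obj *o)
{
	o->state = OBJ_DROPPING;
}
```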
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-6-libaokun@huaweicloud.com
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Stable-dep-of: 12e009d60852 ("cachefiles: wait for ondemand_object_worker to finish when dropping object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ece614a52bc9d219b839a6a29282b30d10e0c48 ]
In cachefiles_check_volume_xattr(), the error returned by vfs_getxattr()
is not passed to ret, so it ends up returning -ESTALE, which leads to an
endless loop as follows:
cachefiles_acquire_volume
retry:
ret = cachefiles_check_volume_xattr
ret = -ESTALE
xlen = vfs_getxattr // return -EIO
// The ret is not updated when xlen < 0, so -ESTALE is returned.
return ret
// Supposed to jump out of the loop at this check.
if (ret != -ESTALE)
goto error_dir;
cachefiles_bury_object
// EIO causes rename failure
goto retry;
Hence propagate the error returned by vfs_getxattr() to avoid the above
issue. Do the same in cachefiles_check_auxdata().
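A simplified userspace sketch of the two shapes, assuming the caller keeps retrying only while the result is -ESTALE; getxattr_fn, EXPECTED_LEN and both check functions are hypothetical, and the real code also compares the xattr contents:

```c
#include <errno.h>

/* Hypothetical stand-in for vfs_getxattr(): returns a length or -errno. */
typedef long (*getxattr_fn)(void);

#define EXPECTED_LEN 16L	/* hypothetical size of the coherency xattr */

/* Buggy shape: when getxattr fails, ret is left at -ESTALE. */
int check_xattr_buggy(getxattr_fn get)
{
	int ret = -ESTALE;
	long xlen = get();

	if (xlen == EXPECTED_LEN)
		ret = 0;		/* content comparison omitted */
	return ret;
}

/* Fixed shape: a negative xlen is passed back, so the caller's
 * "if (ret != -ESTALE) goto error_dir" test leaves the retry loop. */
int check_xattr_fixed(getxattr_fn get)
{
	int ret = -ESTALE;
	long xlen = get();

	if (xlen < 0)
		ret = (int)xlen;
	else if (xlen == EXPECTED_LEN)
		ret = 0;
	return ret;
}
```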
Fixes: 32e150037d ("fscache, cachefiles: Store the volume coherency data")
Fixes: 72b957856b ("cachefiles: Implement metadata/coherency data storage in xattrs")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20240628062930.2467993-5-libaokun@huaweicloud.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 82f0b6f041fad768c28b4ad05a683065412c226e ]
Commit 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing
memory_section->usage") changed pfn_section_valid() to add a READ_ONCE()
call around "ms->usage" to fix a race with section_deactivate() where
ms->usage can be cleared. The READ_ONCE() call, by itself, is not enough
to prevent NULL pointer dereference. We need to check its value before
dereferencing it.
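A userspace sketch of the pattern: load the pointer once into a local, then test that local before dereferencing it. It assumes a GCC/Clang __typeof__-based approximation of READ_ONCE(); struct section, struct usage and subsection_valid() are simplified stand-ins for the kernel's types:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximation of the kernel's READ_ONCE() (GCC/Clang __typeof__). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, simplified section types. */
struct usage {
	unsigned long subsection_map;
};

struct section {
	struct usage *usage;	/* may be cleared concurrently */
};

/*
 * Load the pointer exactly once, then test the local copy.
 * Reading ms->usage a second time after the check could observe a clear.
 */
bool subsection_valid(struct section *ms, unsigned int idx)
{
	struct usage *usage = READ_ONCE(ms->usage);

	return usage ? (usage->subsection_map >> idx) & 1UL : false;
}
```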
Link: https://lkml.kernel.org/r/20240626001639.1350646-1-longman@redhat.com
Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fd7eea27a3aed79b63b1726c00bde0d50cf207e2 upstream.
With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled the kernel will
be compiled with -ftrivial-auto-var-init=<...> which causes initialization
of stack variables at function entry time.
In order to avoid the performance impact that comes with this, users can use
the "uninitialized" attribute to prevent such initialization.
Therefore provide the __uninitialized macro which can be used for cases
where INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled, but only
selected variables should not be initialized.
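A sketch of how such a macro can be defined and used, assuming a compiler that implements __has_attribute and the uninitialized attribute (recent GCC and Clang provide it); on other compilers the macro expands to nothing. fill_scratch() is a hypothetical example:

```c
#include <string.h>

#ifndef __has_attribute
# define __has_attribute(x) 0	/* very old compilers: assume no support */
#endif

/* Opt a single variable out of -ftrivial-auto-var-init where supported. */
#if __has_attribute(__uninitialized__)
# define __uninitialized __attribute__((__uninitialized__))
#else
# define __uninitialized
#endif

/* A large scratch buffer that is always written before it is read. */
size_t fill_scratch(unsigned char *out, size_t len)
{
	unsigned char buf[256] __uninitialized;

	if (len > sizeof(buf))
		len = sizeof(buf);
	memset(buf, 0xab, len);	/* only the written prefix is read below */
	memcpy(out, buf, len);
	return len;
}
```

The attribute only skips the compiler-inserted initialization; the code itself must still guarantee that nothing is read before it is written.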
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ffd603f214 ("usb: gadget: u_serial: Add null pointer check in
gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
fully fix the potential null pointer dereference issue. While
gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
and gs_start_tx() release the lock during endpoint request submission.
This creates a window where gs_close() could set port->port_tty to NULL,
leading to a dereference when the lock is reacquired.
This patch adds a null pointer check for port->port_tty after RX/TX
submission, and removes the initial null pointer check in gs_start_io()
since the caller must hold port_lock and guarantee non-null values for
port_usb and port_tty.
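A single-threaded model of the window, assuming a submission step that drops and retakes the lock and a callback that simulates a racing gs_close(); struct port, lock()/unlock(), submit_requests() and start_io() are hypothetical and do no real locking:

```c
#include <stddef.h>

/* Hypothetical model; "locked" only records the state, nothing blocks. */
struct port {
	int locked;
	void *port_tty;
	void (*during_submit)(struct port *);	/* simulates a racing gs_close() */
};

static void lock(struct port *p) { p->locked = 1; }
static void unlock(struct port *p) { p->locked = 0; }

/* Submission drops the lock, so another path may run in the window. */
static void submit_requests(struct port *p)
{
	unlock(p);
	if (p->during_submit)
		p->during_submit(p);
	lock(p);
}

/* Caller holds the lock; returns 0 on success, -1 if the tty went away. */
int start_io(struct port *p)
{
	submit_requests(p);	/* RX */
	submit_requests(p);	/* TX */
	/* Re-check after the lock was dropped and re-acquired. */
	if (!p->port_tty)
		return -1;
	/* ... only here is it safe to use p->port_tty ... */
	return 0;
}
```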
Fixes: ffd603f214 ("usb: gadget: u_serial: Add null pointer check in gs_start_io")
Cc: stable@vger.kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Bug: 283247551
Link: https://lore.kernel.org/lkml/20240116141801.396398-1-khtsai@google.com/
Change-Id: Ib850c7d313194074941576a7fdd3a9f58486ad78
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
In commit 18685451fc4e ("inet: inet_defrag: prevent sk release while
still in use"), struct sk_buff dropped an unneeded union structure.
This did not change the actual structure size or layout at all, but the
ABI checker didn't like it. So trick it by putting some __GENKSYMS__
markers in to preserve the ABI correctly.
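A sketch of the marker technique, using simplified stand-in structs rather than the real struct sk_buff. Tools that preprocess with __GENKSYMS__ defined (as the kernel's genksyms step does) see the old union, while the compiler sees the plain pointer; the struct names here are hypothetical:

```c
#include <stddef.h>

struct sock;	/* opaque, as in the kernel */

/* Layout the ABI tooling recorded, with the old union. */
struct skb_old {
	void *next;
	union {
		struct sock *sk;
		int ip_defrag_offset;
	};
};

/* Layout the code actually uses after the upstream change. */
struct skb_new {
	void *next;
	struct sock *sk;
};

/* The trick: one declaration, two views depending on __GENKSYMS__. */
struct skb_model {
	void *next;
#ifdef __GENKSYMS__
	union {
		struct sock *sk;
		int ip_defrag_offset;
	};
#else
	struct sock *sk;
#endif
};
```

The layouts match on common ABIs because an int is no larger than a pointer; the sketch does not check every field of the real structure.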
Bug: 335584858
Fixes: 18685451fc4e ("inet: inet_defrag: prevent sk release while still in use")
Change-Id: I78ca54f9df3e03cccebc326babf1d84ccb5dc781
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric:
Calling ip_defrag() in output path is also implying skb_orphan(),
which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was :
8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ
packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used.
Eric suggested to stash sk in fragment queue and made an initial patch.
However, there is a problem with this:
If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments and set up the destructor as sock_wfree.
IOW, we have no choice but to fix up sk_wmem accounting to reflect the
fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to the last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.
In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won't continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accounting when inet_frag inflates truesize.
Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 18685451fc4e546fc0e718580d32df3c0e5c8272)
Bug: 335584858
Change-Id: I008a7b5fc4f51c9ad0ee14cf05ba21ca3ff5d6b3
Cc: Lee Jones <joneslee@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This catches the android14-6.1-lts branch up with many ABI updates and
other changes that have been merged into the android14-6.1 branch
recently. Included in here are the following commits:
* 96d66062d0 ANDROID: GKI: Add initialization for mutex oem_data.
* 9e55f41695 ANDROID: abi_gki_aarch64_qcom: whitelist some mm symbols
* 56526cf940 ANDROID: mm: swap: export and whitelist get_shadow_from_swap_cache
* 74be75dd10 ANDROID: mm: madvise: vendor hook to tune page flags
* 7fc3794962 ANDROID: GKI: Add initialization for rwsem's oem_data and vendor_data.
* f6b99539f8 UPSTREAM: usb: dwc3: core: Skip setting event buffers for host only controllers
* ce6f9cab9e ANDROID: GKI: Update symbol list for mtk
* 256660feeb ANDROID: GKI: Add symbol to symbol list for vivo.
* d256bfafa9 ANDROID: vendor_hooks: add hooks in prctl_set_vma
* 0468527935 ANDROID: GKI: Update symbols list for vivo
* 81e9f0610c UPSTREAM: arm64: mm: Make hibernation aware of KFENCE
* aa8621e002 ANDROID: Update the ABI symbol list: set_normalized_timespec64
* 87fb1f2f1e Merge tag 'android14-6.1.84_r00' into android14-6.1
* 76d91af9da ANDROID: fix kernelci build breaks due to hid/uhid cyclic dependency
* c2dad37627 ANDROID: fix kernelci GCC builds of fips140.ko
* 8bffcfee7a UPSTREAM: sched/fair: Use all little CPUs for CPU-bound workloads
* a1926c3f2b ANDROID: GKI: Extend Tuxera symbol list
* 6aaa06c15d FROMLIST: locking/rwsem: Add __always_inline annotation to __down_write_common() and inlined callers
* 7682e638eb ANDROID: fips140: remove unnecessary no_sanitize(cfi)
* 681c91500c ANDROID: GKI: Add symbol to symbol list for vivo.
* 88b8a0c173 ANDROID: vendor_hooks: add hooks to modify pageflags
* 85a0c4bef6 ANDROID: GKI: Add pageflags for OEM
* 724b50f143 ANDROID: GKI: Update symbol list for vivo
* a5329424ea ANDROID: GKI: export sys_exit tracepoint
* 616650627d ANDROID: gki_defconfig: enable CONFIG_SYN_COOKIES
* 74a3c59c80 ANDROID: GKI: Update symbol list for vivo
* 1df05952a1 ANDROID: vendor_hooks: add hooks in rwsem
* 5747d79ab0 ANDROID: Update the ABI symbol list
* a7daeb4de8 ANDROID: GKI: Update symbol list for vivo
* 2870c78530 ANDROID: GKI: add percpu_rwsem vendor hooks
* 49203a2850 FROMGIT: erofs: fix possible memory leak in z_erofs_gbuf_exit()
* 0013c55474 BACKPORT: erofs: add a reserved buffer pool for lz4 decompression
* a35a90635c BACKPORT: erofs: do not use pagepool in z_erofs_gbuf_growsize()
* 2a23d59fd9 BACKPORT: erofs: rename per-CPU buffers to global buffer pool and make it configurable
* bb687ee6b6 BACKPORT: erofs: rename utils.c to zutil.c
* bab4765e5f BACKPORT: erofs: relaxed temporary buffers allocation on readahead
* a3fb83b3f5 BACKPORT: erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off
* 91f4830fba ANDROID: GKI: Update rockchip symbols for bcmdhd sdio wifi.
* a8b3ebe7f9 ANDROID: 16K: Avoid mmap lock assertions for padding VMAs
* caa8ffe476 BACKPORT: scsi: ufs: core: Fix handling of lrbp->cmd
* d682bd3b2f ANDROID: GKI: Update symbol list for xiaomi
* e270773646 ANDROID: vendor_hooks: add hooks in rwsem read trylock
* 1a72e2f692 ANDROID: GKI: update symbol list file for xiaomi
* cd89d4fa07 ANDROID: GKI: Update symbol list for vivo
* 40f3c9d658 ANDROID: vendor_hooks: add vendor hooks for fuse request
* f9840ee562 ANDROID: Update the ABI symbol list
* 12709c5c1e ANDROID: GKI: add symbol list for meizu
* bda57805ab UPSTREAM: objtool: Fix HOSTCC flag usage
* b5164fdc98 UPSTREAM: objtool: Properly support make V=1
* fd5c2e1399 UPSTREAM: objtool: Install libsubcmd in build
* de6fb073c6 UPSTREAM: af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
* 0e9ee9221f UPSTREAM: af_unix: Don't peek OOB data without MSG_OOB.
* 30d168eb06 UPSTREAM: af_unix: Clear stale u->oob_skb.
* c0618d182a Revert "f2fs: fix to tag gcing flag on page during block migration"
* 25216be1ac ANDROID: Delete obsolete 16k_gki.fragment.
* 4c45e2f340 UPSTREAM: f2fs: clear writeback when compression failed
* 7c734edeaa ANDROID: GKI: Add symbol list for exynosauto
* b22d7c4ca0 FROMGIT: arm64: mte: Make mte_check_tfsr_*() conditional on KASAN instead of MTE
* 1331956fb5 ANDROID: gki_defconfig: Disable CONFIG_BRCMSTB_DPFE and CONFIG_BRCMSTB_MEMC
* 002be199aa FROMGIT: f2fs: fix to avoid use SSR allocate when do defragment
* dda68b1657 ANDROID: 16K: Only check basename of linker context
* 65aed0e2f7 ANDROID: 16K: Avoid and document padding madvise lock warning
* ec795e4eaa ANDROID: arm64: vdso32: support user-supplied flags
* ac9706483e ANDROID: GKI: Add initial symbol list for bcmstb
* b164ce27fa ANDROID: gki_defconfig: Enable Broadcom STB SoCs
* d385f8f23f UPSTREAM: mmc: core: Do not force a retune before RPMB switch
* 5c4b00b73e UPSTREAM: arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
* ad62d386c8 BACKPORT: net: phy: Allow drivers to always call into ->suspend()
* 7c09ddbf94 UPSTREAM: ARM: perf: Mark all accessor functions inline
* f5e0452e91 UPSTREAM: arm64: perf: Mark all accessor functions inline
* 6d7eea37f7 UPSTREAM: perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
* bf3022d3c3 UPSTREAM: ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM
* 393be9f6d0 UPSTREAM: ARM: Make CONFIG_CPU_V7 valid for 32bit ARMv8 implementations
* 9d0a91c993 UPSTREAM: perf: pmuv3: Change GENMASK to GENMASK_ULL
* ab26945ffd UPSTREAM: perf: pmuv3: Move inclusion of kvm_host.h to the arch-specific helper
* 3e7e6fa4da UPSTREAM: perf: pmuv3: Abstract PMU version checks
* 55cfecdaaa UPSTREAM: arm64: perf: Abstract system register accesses away
* 278e973f01 UPSTREAM: arm64: perf: Move PMUv3 driver to drivers/perf
* 222a79a1bb UPSTREAM: arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
* 62a4d78dda ANDROID: GKI: Update oplus symbol list
* bfacfd198e UPSTREAM: block/blk-mq: Don't complete locally if capacities are different
* cf4893eb95 BACKPORT: sched: Add a new function to compare if two cpus have the same capacity
* e4622d460e ANDROID: GKI: Update rockchip symbols for rndis_host.
* f601b06a7e ANDROID: GKI: Update rockchip symbols for snd multi dais.
* 986fffb590 UPSTREAM: usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete
* 163070bc79 UPSTREAM: usb: gadget: f_fs: use io_data->status consistently
* 3c19f7015e ANDROID: set rewrite_absolute_paths_in_config for GKI aarch64.
* 9fcc2459ef UPSTREAM: wifi: cfg80211: Clear mlo_links info when STA disconnects
* e2c5ee3d15 ANDROID: ABI: Add usb_gadget_connect & usb_gadget_disconnect symbol
* c5abb61725 ANDROID: GKI: Update symbol list for mtk
* c36abc6d42 BACKPORT: iommu: Have __iommu_probe_device() check for already probed devices
* a7462d7032 ANDROID: ABI fixup for abi break in struct dst_ops
* bd2bcb81d4 BACKPORT: net: fix __dst_negative_advice() race
* 997e6b3f6a Revert "crypto: api - Disallow identical driver names"
Change-Id: I474f6f67727992fbe8ca9e3f85d9be7b33cd284c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 93aef9eda1cea9e84ab2453fcceb8addad0e46f1 upstream.
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
The previous fix, commit d325dc6eb7 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption. However, the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an I/O error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
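A minimal C sketch of the wraparound option. It assumes a plain bool array in place of the nilfs2 bitmap blocks, and `find_free()` and `alloc_entry()` are hypothetical names, not the nilfs2 API. With wraparound disabled and the search starting at the first non-reserved inode number (the value held in "nilfs->ns_first_ino"), a corrupted reserved area can no longer be handed out:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a metadata-object bitmap: used[i] is true when
 * entry i is allocated. This is not the nilfs2 code; it only mirrors the
 * idea of an allocator with an optional wraparound search.
 */

/* Return the first free index in [from, to), or -1 if there is none. */
static long find_free(const bool *used, size_t from, size_t to)
{
	for (size_t i = from; i < to; i++)
		if (!used[i])
			return (long)i;
	return -1;
}

/*
 * Allocate an entry, searching upward from 'start'. With 'wrap' set, the
 * search continues from index 0 once the end is reached; without it,
 * nothing below 'start' is ever returned.
 */
static long alloc_entry(bool *used, size_t nr, size_t start, bool wrap)
{
	long idx = find_free(used, start, nr);

	if (idx < 0 && wrap)
		idx = find_free(used, 0, start);
	if (idx >= 0)
		used[idx] = true;
	return idx;
}
```

For example, with ns_first_ino assumed to be 11, entries 11 to 15 in use, and a corrupted bitmap that wrongly shows the reserved entries as free, alloc_entry(used, 16, 11, true) returns 0 (a reserved inode), while the same call with wrap set to false fails with -1.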
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b164316808ec5de391c3e7b0148ec937d32d280d ]
A zoned device with a smaller last zone together with a zone capacity
smaller than the zone size does not make any sense, as that does not
correspond to any possible setup for a real device:
1) For ZNS and zoned UFS devices, all zones are always the same size.
2) For SMR HDDs, all zones always have the same capacity.
In other words, if we have a smaller last runt zone, then this zone
capacity should always be equal to the zone size.
Add a check in null_init_zoned_dev() to prevent a configuration from
having both a smaller last zone and a zone capacity smaller than the zone size.
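The rule enforced by the new check can be sketched as follows. check_zone_config() is a hypothetical helper, not the null_blk function, and expressing the device size and zone sizes in the same units is an assumption of this sketch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validation helper. Sizes are in arbitrary but consistent
 * units (for example MB); zone_size is assumed to be non-zero.
 */
static int check_zone_config(uint64_t dev_size, uint64_t zone_size,
			     uint64_t zone_capacity)
{
	/* A device size that is not a multiple of the zone size leaves a
	 * smaller ("runt") last zone. */
	bool runt_last_zone = dev_size % zone_size != 0;

	/* Real devices never combine a runt last zone with a zone
	 * capacity smaller than the zone size. */
	if (runt_last_zone && zone_capacity < zone_size)
		return -EINVAL;
	return 0;
}
```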
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240530054035.491497-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a69c1264ff41bc5bf7c03101ada0454fbf08868 ]
During dummy cycles, xSPI switches the GPIOs into Hi-Z mode. In that dummy
period, the voltage on the data lines slowly drops, which can cause an
unintentional mode byte transmission. The value sent to the SPI memory
chip will depend on the last address and the clock frequency.
To prevent unforeseen consequences of that behaviour, force sending a
single mode byte (0x00).
The mode byte will be sent only if the number of dummy cycles is not
equal to 0. The code must also reduce the dummy-cycle byte count by one,
as one byte is sent as the mode byte.
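The dummy-phase bookkeeping described above, as a small sketch. struct dummy_phase and plan_dummy_phase() are hypothetical, and the dummy phase is assumed to be counted in bytes; this is not the driver's actual representation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of how the dummy phase is transmitted. */
struct dummy_phase {
	bool send_modebyte;	/* drive one mode byte instead of Hi-Z */
	uint8_t modebyte;	/* value forced onto the data lines */
	unsigned dummy_bytes;	/* dummy bytes left after the mode byte */
};

static struct dummy_phase plan_dummy_phase(unsigned dummy_bytes)
{
	struct dummy_phase p = { false, 0x00, dummy_bytes };

	/* Only emit a mode byte when there is a dummy phase at all. */
	if (dummy_bytes != 0) {
		p.send_modebyte = true;
		p.modebyte = 0x00;
		/* The mode byte takes the place of one dummy byte. */
		p.dummy_bytes = dummy_bytes - 1;
	}
	return p;
}
```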
Signed-off-by: Witold Sadowski <wsadowski@marvell.com>
Link: https://msgid.link/r/20240529074037.1345882-2-wsadowski@marvell.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 068648aab72c9ba7b0597354ef4d81ffaac7b979 ]
write$nci(r0, &(0x7f0000000740)=ANY=[@ANYBLOB="610501"], 0xf)
Syzbot constructed a write() call with a data length of 3 bytes but a count value
of 15, which passed too little data to meet the basic requirements of the function
nci_rf_intf_activated_ntf_packet().
Therefore, add a comparison between the data length and the count value to
avoid problems caused by an inconsistent data length and count.
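A minimal sketch of the consistency check, assuming the caller knows both how many bytes were actually supplied and the count it claimed. check_write_len() is a hypothetical helper, not the code added by this commit:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical check: 'data_len' is how many bytes the writer actually
 * supplied, 'count' is the length it claims. Any mismatch is rejected
 * before the packet can reach nci_rf_intf_activated_ntf_packet().
 */
static int check_write_len(size_t data_len, size_t count)
{
	if (data_len != count)
		return -EINVAL;
	return 0;
}
```

In the syzbot case, 3 bytes of data ("610501") with a count of 15, check_write_len(3, 15) rejects the write with -EINVAL.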
Reported-and-tested-by: syzbot+71bfed2b2bcea46c98f2@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c758b77d4a0a0ed3a1292b3fd7a2aeccd1a169a4 ]
In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we
know that a ctrl was allocated (in the admin connect request handler)
and we need to release pending AERs, clear ctrl->sqs and sq->ctrl
(for nvme-loop primarily), and drop the final reference on the ctrl.
However, a small window is possible where nvmet_sq_destroy starts (as
a result of the client giving up and disconnecting) concurrently with
the nvme admin connect cmd (which may be in an early stage), but *before*
kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq
live reference). In this case, sq->ctrl was allocated, but only after it
had been captured in a local variable in nvmet_sq_destroy.
This prevented the final reference drop on the ctrl.
Solve this by re-capturing sq->ctrl after all inflight requests have
completed, at which point the sq->ctrl reference is certainly final, and
move forward based on that.
This issue was observed in an environment with many hosts connecting
multiple ctrls simultaneously, creating a delay in allocating a ctrl
leading up to this race window.
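The ordering problem can be replayed deterministically in a single thread. This is a hypothetical model, not the nvmet code: struct sq, struct ctrl and finish_connect() are illustrative, and the drain callback stands in for kill_and_confirm plus waiting for in-flight requests, during which the racing admin connect completes:

```c
#include <stddef.h>

/* Hypothetical, single-threaded replay of the race; not nvmet code. */
struct ctrl { int refs; };
struct sq { struct ctrl *ctrl; };

static struct ctrl the_ctrl;

/* Models the admin connect that still holds a live sq reference: it
 * publishes sq->ctrl (holding one reference) before it completes. */
static void finish_connect(struct sq *sq)
{
	the_ctrl.refs = 1;
	sq->ctrl = &the_ctrl;
}

/* Old order: sample sq->ctrl first, then wait for in-flight requests. */
static void sq_destroy_old(struct sq *sq, void (*drain)(struct sq *))
{
	struct ctrl *ctrl = sq->ctrl;	/* still NULL in the racy case */

	drain(sq);			/* the connect completes meanwhile */
	if (ctrl)
		ctrl->refs--;		/* skipped, so the reference leaks */
}

/* Fixed order: re-read sq->ctrl once nothing is in flight. */
static void sq_destroy_new(struct sq *sq, void (*drain)(struct sq *))
{
	drain(sq);
	if (sq->ctrl) {
		sq->ctrl->refs--;	/* final reference dropped */
		sq->ctrl = NULL;
	}
}
```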
Reported-by: Alex Turin <alex@vastdata.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 611b7eb19d0a305d4de00280e4a71a1b15c507fc ]
Currently, when an adapter defines a max_write_len quirk,
the data will be chunked into data sizes equal to the
max_write_len quirk value. However, the register address is prepended to
the payload before transmission, so the resulting message always ends up
larger than the limit set by the quirk.
Avoid this error by setting regmap's max_write to the quirk's
max_write_len minus the number of bytes for the register and
padding. This allows the chunking to work correctly for this
limited case without impacting other use-cases.
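The corrected limit is simple arithmetic. In this sketch chunk_limit() is a hypothetical helper, and its reg_bits and pad_bits parameters are assumed to mirror regmap's register and padding widths in bits (multiples of 8):

```c
#include <stddef.h>

/*
 * Hypothetical helper: the largest data chunk that still fits in an
 * adapter message of 'max_write_len' bytes once the register address
 * and padding bytes are prepended.
 */
static size_t chunk_limit(size_t max_write_len, unsigned reg_bits,
			  unsigned pad_bits)
{
	size_t overhead = (reg_bits + pad_bits) / 8;

	/* Guard against a quirk smaller than the address overhead. */
	return max_write_len > overhead ? max_write_len - overhead : 0;
}
```

For example, with an illustrative 32-byte quirk and 16-bit register addresses, chunks are capped at 30 bytes, so each transmitted message (2 address bytes plus data) stays within 32 bytes.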
Signed-off-by: Jim Wylder <jwylder@google.com>
Link: https://msgid.link/r/20240523211437.2839942-1-jwylder@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bd293fcf3af84674e82ed022c049491f3768840 ]
A bio_vec's start offset may be relatively large, particularly when a
large folio gets added to the bio. A bigger offset will result in skipping
the single-segment mapping optimization and ending up using the expensive
mempool_alloc later on.
Rather than using the absolute value, strip multiples of
NVME_CTRL_PAGE_SIZE from bv_offset when checking whether the segment fits
into one or two PRP entries.
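The difference between the two checks can be sketched as below. CTRL_PAGE_SIZE stands in for NVME_CTRL_PAGE_SIZE and is assumed to be 4 KiB, and both function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed controller page size (4 KiB) for this sketch. */
#define CTRL_PAGE_SIZE 4096u

/* Old check: uses the absolute offset into the (possibly large) folio. */
static bool fits_two_prps_old(size_t bv_offset, size_t bv_len)
{
	return bv_offset + bv_len <= CTRL_PAGE_SIZE * 2;
}

/* New check: only the offset within one controller page matters. */
static bool fits_two_prps_new(size_t bv_offset, size_t bv_len)
{
	return (bv_offset & (CTRL_PAGE_SIZE - 1)) + bv_len <=
	       CTRL_PAGE_SIZE * 2;
}
```

A 4 KiB segment starting 512 bytes into the 17th page of a large folio fails the old check but passes the new one, so it can still use the single-segment path.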
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7c9ccaadffd13066353332c13d7e9bf73b8f92d ]
If do_map_benchmark() has failed, there is nothing useful to copy back
to userspace.
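A sketch of the control flow after this change. struct bench_result, bench_ok(), bench_fail() and run_benchmark() are hypothetical (the stubs return placeholder values), and memcpy() stands in for copy_to_user():

```c
#include <errno.h>
#include <string.h>

/* Hypothetical result structure and benchmark stand-ins. */
struct bench_result { unsigned long avg_map_ns; };

static int bench_ok(struct bench_result *r)   { r->avg_map_ns = 120; return 0; }
static int bench_fail(struct bench_result *r) { (void)r; return -ENOMEM; }

/* Run the benchmark and copy the result out only on success. */
static int run_benchmark(int (*bench)(struct bench_result *),
			 struct bench_result *user_out)
{
	struct bench_result res = { 0 };
	int ret = bench(&res);

	if (ret)
		return ret;	/* nothing useful to copy back */
	memcpy(user_out, &res, sizeof(res));	/* copy_to_user() stand-in */
	return 0;
}
```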
Suggested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>