[ Upstream commit b80b829e9e ]
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
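The sequence described above can be modelled in a few lines of userspace C. This is only a toy sketch, not the kernel code: toy_class, toy_filter, toy_bind(), toy_unbind() and toy_change() are hypothetical stand-ins for the class, the filter, tcf_bind_filter(), tcf_unbind_filter() and the *_change() success path.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_class  { int filter_cnt; };          /* class with a bind counter  */
struct toy_result { struct toy_class *cl; };    /* plays the role of tcf_result */
struct toy_filter { struct toy_result res; };

/* Bind a filter to a class and account for it. */
static void toy_bind(struct toy_filter *f, struct toy_class *cl)
{
    assert(f != NULL && cl != NULL);
    f->res.cl = cl;
    cl->filter_cnt++;
}

/* Drop the filter's reference on its class, if it has one. */
static void toy_unbind(struct toy_filter *f)
{
    if (f->res.cl) {
        f->res.cl->filter_cnt--;
        f->res.cl = NULL;
    }
}

/*
 * Replace 'old' with 'repl'. With copy_res != 0 the result struct is copied
 * as in the buggy code: the pointer is duplicated, the counter is not bumped.
 * Returns 1 if 'repl' ends up pointing at a class whose filter_cnt is 0,
 * i.e. a class that could be deleted while still referenced.
 */
static int toy_change(struct toy_filter *old, struct toy_filter *repl,
                      int copy_res)
{
    repl->res.cl = NULL;            /* fixed behaviour: start unbound */
    if (copy_res)
        repl->res = old->res;       /* buggy behaviour: blind copy    */
    toy_unbind(old);                /* success path always unbinds old */
    return repl->res.cl != NULL && repl->res.cl->filter_cnt == 0;
}
```

In this model the copying variant leaves the replacement filter pointing at a class with a zero counter, while the non-copying variant leaves no stale pointer behind.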
Bug: 296347075
Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d4d3b53a4c)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iefbd201b92847ec1349f92c107d7ef5aec3fb359
[ Upstream commit 76e42ae831 ]
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Bug: 296347075
Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7f691439b2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I33c91c83d1cd8e889a7261adfa3779ca6c141088
[ Upstream commit 3044b16e7c ]
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.
This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
Fix this by no longer copying the tcf_result struct from the old filter.
Bug: 296347075
Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit aab2d095ce)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I1a8381c308cc97cf61d6f95a02992d2c553455c5
[ Upstream commit 0ebc1064e4 ]
Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:
WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
Bug: 296128351
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 268cb07ef3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Icf97f57d18bb2b30ed28a3de6cdd18661d7f1c3d
Prevents mishandling USB requests that are no longer present.
Bug: 161010552
Fixes: 483cb5629e ("ANDROID: usb: gadget: f_accessory: Add Android Accessory function")
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I8ff24d6d49214c3bd10a1b5d5e72814ec2a91c61
(cherry picked from commit 5228cb193d)
Signed-off-by: Lee Jones <joneslee@google.com>
Adding the following symbols:
- drm_connector_attach_max_bpc_property
Bug: 295934766
Change-Id: I7e32466caae2ae200b9549c62a3bcb2652612181
Signed-off-by: Petri Gynther <pgynther@google.com>
This patch is based on commit 5da226dbfce3 ("mm: skip CMA pages when
they are not available"), which skips reclaim of CMA pages when they are
not eligible for the current allocation context. In MGLRU, such pages are
added to the tail of the immediate generation to maintain better LRU
order, unlike the conventional LRU, where such pages are added directly
to the head of the LRU list (akin to adding to the head of the
youngest generation in MGLRU).
There is no observable issue without this patch on MGLRU, but logically it
makes sense to skip reclaim of CMA pages when those pages cannot satisfy
the current allocation context.
Link: https://lkml.kernel.org/r/1691568344-13475-1-git-send-email-quic_charante@quicinc.com
Change-Id: I586415b3e3a92da23f3e79b9d63802a2ced03432
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 75d52d9304ef5b268eb798b0c679815290a0fc83 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 288383787
Bug: 291719697
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Bug: 195565510
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ic8134eb902aa7d929e3121b2f69b1d258f570652
(cherry picked from commit 98c15b2bad)
Signed-off-by: Lee Jones <joneslee@google.com>
[ Upstream commit 3e337087c3 ]
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
    static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                           struct sk_buff **to_free)
    {
        unsigned int len = qdisc_pkt_len(skb), gso_segs;
        // ...
        if (unlikely(cl->agg->lmax < len)) {
            pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
                     cl->agg->lmax, len, cl->common.classid);
            err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
            if (err) {
                cl->qstats.drops++;
                return qdisc_drop(skb, sch, to_free);
            }
        // ...
    }
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
    tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
        overhead 999999999 linklayer ethernet qfq
    tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
    tc filter add dev $DEV parent 1: matchall classid 1:1
    ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the range QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
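As an illustration of the missing check, the following userspace sketch refuses to grow a stored maximum beyond a fixed upper bound. It is not the upstream change: toy_grow_lmax() is a hypothetical helper and the value of TOY_MAX_LMAX is a placeholder, not the kernel's QFQ_MAX_LMAX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder upper bound; the real QFQ_MAX_LMAX lives in sch_qfq.c. */
#define TOY_MAX_LMAX (1u << 16)

/*
 * Grow *lmax to 'len' only if 'len' stays within the upper bound.
 * 'len' plays the role of qdisc_pkt_len(skb), which may include
 * size-table overhead chosen by the user.
 */
static int toy_grow_lmax(unsigned int *lmax, unsigned int len)
{
    assert(lmax != NULL);
    if (len <= *lmax)
        return 0;               /* nothing to do */
    if (len > TOY_MAX_LMAX)
        return -EINVAL;         /* reject instead of growing unbounded */
    *lmax = len;
    return 0;
}
```

With the overhead value from the reproducer above, such a check would reject the length and leave the stored maximum unchanged.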
Bug: 292249631
Fixes: 462dbc9101 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 70feebdbfa)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I69bec7b092e980fe8e0946c26ed9b5ac7c57bf3d
[ Upstream commit 25369891fc ]
Two parameters can be transformed into netlink policies and
validated while parsing the netlink message.
Bug: 292249631
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3e337087c3 ("net/sched: sch_qfq: account for stab overhead in qfq_enqueue")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8e0326cbc4)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idb17a2f9a50b060bcadca6cf7a8c6b7ecaf7d0f1
[ Upstream commit 87b5a5c209 ]
The end key should be equal to the start key unless NFT_SET_EXT_KEY_END
is present. It's possible to add elements that only have a start key
("{ 1.0.0.0 . 2.0.0.0 }") without an interval end.
Insertion treats this via:
    if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
        end = (const u8 *)nft_set_ext_key_end(ext)->data;
    else
        end = start;
but removal side always uses nft_set_ext_key_end().
This is wrong and leads to garbage remaining in the set after removal.
The next lookup/insert attempt will give:
BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90
Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399
Call Trace:
kasan_report+0x105/0x140
pipapo_get+0x8eb/0xb90
nft_pipapo_insert+0x1dc/0x1710
nf_tables_newsetelem+0x31f5/0x4e00
..
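One way to keep insertion and removal consistent is to route both through a single helper that applies the fallback. The sketch below is a userspace illustration under that assumption; toy_ext and toy_end_key() are hypothetical and do not reflect the layout of struct nft_set_ext.

```c
#include <assert.h>
#include <stddef.h>

/* Toy element: key_end is NULL when no end key was supplied. */
struct toy_ext {
    const unsigned char *key;
    const unsigned char *key_end;
};

/*
 * Return the end key for an element, falling back to the start key
 * when no explicit end key exists. Both the insert and the remove
 * path would call this, so they always agree on the range.
 */
static const unsigned char *toy_end_key(const struct toy_ext *ext)
{
    assert(ext != NULL && ext->key != NULL);
    return ext->key_end ? ext->key_end : ext->key;
}
```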
Bug: 293587745
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: lonial con <kongln9170@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 90c3955beb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I51a423aaa2c31c4df89776505b602aa2c1523b82
Running the following will run scripts/checkpatch.pl on a
patch of HEAD
tools/bazel run //common:checkpatch
or a given Git SHA1:
tools/bazel run //common:checkpatch -- --git_sha1 ...
For additional flags, see
tools/bazel run //common:checkpatch -- --help
For details, see
build/kernel/kleaf/docs/checkpatch.md
in your source tree.
Test: TH
Bug: 259995152
Change-Id: Iaad8fd69508cf9be11340166aafbb84930d4805c
Signed-off-by: Yifan Hong <elsk@google.com>
(cherry picked from commit 7dbf26568fcccde88470e7a25c07f0c7229e85f1)
When connecting to some DisplayPort partners, the initial status update
after entering DisplayPort Alt Mode notifies that the DFP_D/UFP_D is not in
the connected state. This leads to sending a configure message that keeps
the device in USB mode. The port partner then sets DFP_D/UFP_D to the
connected state and HPD to high in the same Attention message. Currently,
the HPD signal is dropped in order to handle configuration.
This patch saves changes to the HPD signal when the device chooses to
configure during dp_altmode_status_update, and invokes sysfs_notify if
necessary for HPD after configuring.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/all/20230726020903.1409072-1-rdbabiera@google.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 288137625
(cherry picked from commit 5a5ccd61cf https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus)
[rd: resolved minor conflict in displayport.c]
Change-Id: I26ef49949aa0681ea6b88d8c95f5ddd54fe84664
Signed-off-by: RD Babiera <rdbabiera@google.com>
Avichal Rakesh reported a kernel panic that occurred when the UVC
gadget driver was removed from a gadget's configuration. The panic
involves a somewhat complicated interaction between the kernel driver
and a userspace component (as described in the Link tag below), but
the analysis did make one thing clear: The Gadget core should
accommodate gadget drivers calling usb_gadget_deactivate() as part of
their unbind procedure.
Currently this doesn't work. gadget_unbind_driver() calls
driver->unbind() while holding the udc->connect_lock mutex, and
usb_gadget_deactivate() attempts to acquire that mutex, which will
result in a deadlock.
The simple fix is for gadget_unbind_driver() to release the mutex when
invoking the ->unbind() callback. There is no particular reason for
it to be holding the mutex at that time, and the mutex isn't held
while the ->bind() callback is invoked. So we'll drop the mutex
before performing the unbind callback and reacquire it afterward.
We'll also add a couple of comments to usb_gadget_activate() and
usb_gadget_deactivate(). Because they run in process context they
must not be called from a gadget driver's ->disconnect() callback,
which (according to the kerneldoc for struct usb_gadget_driver in
include/linux/usb/gadget.h) may run in interrupt context. This may
help prevent similar bugs from arising in the future.
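The locking problem and the fix can be illustrated with a small single-threaded model. This is a sketch only: toy_udc, toy_lock(), toy_deactivate() and toy_unbind_driver() are hypothetical, and the "mutex" is a flag that reports -EDEADLK where a real non-recursive mutex would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy UDC with a flag standing in for udc->connect_lock. */
struct toy_udc { int held; };

/* A second acquisition by the same context reports a deadlock. */
static int toy_lock(struct toy_udc *u)
{
    assert(u != NULL);
    if (u->held)
        return -EDEADLK;
    u->held = 1;
    return 0;
}

static void toy_unlock(struct toy_udc *u)
{
    assert(u != NULL && u->held);
    u->held = 0;
}

/* Stands in for usb_gadget_deactivate(): it needs the lock itself. */
static int toy_deactivate(struct toy_udc *u)
{
    int ret = toy_lock(u);

    if (ret)
        return ret;
    toy_unlock(u);
    return 0;
}

/*
 * Stands in for gadget_unbind_driver(). The driver's unbind callback is
 * modelled as a direct call to toy_deactivate(). With drop_lock != 0 the
 * lock is released around the callback and reacquired afterwards.
 */
static int toy_unbind_driver(struct toy_udc *u, int drop_lock)
{
    int ret = toy_lock(u);

    if (ret)
        return ret;
    if (drop_lock)
        toy_unlock(u);
    ret = toy_deactivate(u);            /* the ->unbind() callback */
    if (drop_lock) {
        int relock = toy_lock(u);

        if (relock)
            return relock;
    }
    toy_unlock(u);
    return ret;
}
```

In this model, holding the lock across the callback yields -EDEADLK, while dropping and reacquiring it lets the callback complete.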
Reported-and-tested-by: Avichal Rakesh <arakesh@google.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 286d9975a8 ("usb: gadget: udc: core: Prevent soft_connect_store() race")
Link: https://lore.kernel.org/linux-usb/4d7aa3f4-22d9-9f5a-3d70-1bd7148ff4ba@google.com/
Cc: Badhri Jagan Sridharan <badhri@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/48b2f1f1-0639-46bf-bbfc-98cb05a24914@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 291976100
Change-Id: Icff01d8e88f041af4bda8726242de9cd518a247a
(cherry picked from commit 65dadb2bee)
[arakesh: Current branch has diverged from upstream, but the
semantics of the bug remains the same. Manually cp'ed over
the diff from upstream patch]
Signed-off-by: Avichal Rakesh <arakesh@google.com>
From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."
From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer
Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
pointer
4. Host controller sends COMMAND UPIU to UFS device"
In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.
Notes:
- For legacy mode this is only correct if the host submits one
command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
auto-hibernation is enabled in the UFS controller. Hence, enable
zone write locking if auto-hibernation is enabled.
This patch improves performance as follows on my test setup:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
zone locking: 4x more IOPS for small writes.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Change-Id: Ie569b09ed78ecb91af0a13f56992f075f8d1f07a
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 234829282
Link: https://lore.kernel.org/linux-block/20230804154821.3232094-1-bvanassche@acm.org/T/#mbfe0e160cc8a47c16af598f1c0b3aeeb125d66de
Signed-off-by: Bart Van Assche <bvanassche@google.com>
If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
a starting LBA that differs from the write pointer, e.g. because zoned
writes have been reordered, then the storage device will respond with an
UNALIGNED WRITE COMMAND error. Send commands that failed with an
unaligned write error to the SCSI error handler if zone write locking is
disabled. Let the SCSI error handler sort SCSI commands per LBA before
resubmitting these.
If zone write locking is disabled, increase the number of retries for
write commands sent to a sequential zone to the maximum number of
outstanding commands because in the worst case the number of times
reordered zoned writes have to be retried is (number of outstanding
writes per sequential zone) - 1.
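The worst-case bound above can be checked with a small simulation. The sketch assumes a simplified device model that is not taken from the patch: commands are resubmitted in the same order on every pass, and a write succeeds only if its LBA equals the current write pointer. toy_worst_retries() is a hypothetical helper; 'order' must be a permutation of 0..n-1.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CMDS 16

/*
 * Return the largest number of retries any single write needs when the
 * writes for one zone are submitted in the given order.
 */
static int toy_worst_retries(const int *order, int n)
{
    int retries[TOY_MAX_CMDS] = { 0 };
    int done[TOY_MAX_CMDS] = { 0 };
    int seen[TOY_MAX_CMDS] = { 0 };
    int wp = 0, worst = 0, i;

    assert(order != NULL && n > 0 && n <= TOY_MAX_CMDS);
    for (i = 0; i < n; i++) {           /* validate the permutation */
        assert(order[i] >= 0 && order[i] < n && !seen[order[i]]);
        seen[order[i]] = 1;
    }

    while (wp < n) {                    /* one pass = one (re)submission */
        for (i = 0; i < n; i++) {
            int lba = order[i];

            if (done[lba])
                continue;
            if (lba == wp) {            /* aligned with the write pointer */
                done[lba] = 1;
                wp++;
            } else {                    /* unaligned write: must be retried */
                retries[lba]++;
            }
        }
    }

    for (i = 0; i < n; i++)
        if (retries[i] > worst)
            worst = retries[i];
    return worst;
}
```

For four writes submitted in fully reversed order the last write is retried three times, which matches the (number of outstanding writes) - 1 bound.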
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Change-Id: I519cb3aedea7e0635be8af554edee76e8d162498
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 234829282
Link: https://lore.kernel.org/linux-block/20230804154821.3232094-1-bvanassche@acm.org/T/#ma6573df4361baa921c258e3b39685a5c4c509faa
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Measurements have shown that limiting the queue depth to one per zone for
zoned writes has a significant negative performance impact on zoned UFS
devices. Hence this patch that disables zone locking by the mq-deadline
scheduler if the storage controller preserves the command order. This
patch is based on the following assumptions:
- It happens infrequently that zoned write requests are reordered by the
block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
serializes write requests per zone.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Change-Id: I2000beaa430f29ea1096149a7d3b8a5423b679ec
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 234829282
Link: https://lore.kernel.org/linux-block/20230804154821.3232094-1-bvanassche@acm.org/T/#made79fdd0e1d8d466f734250f3868b4d6f8105db
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.
The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new request queue flag to allow
block drivers to indicate that they preserve the order of write commands
and thus do not require serialization of writes per zone.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Change-Id: Ibf578f0ec92074ccc2c11d2c24c8d6309294bc8c
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 234829282
Link: https://lore.kernel.org/linux-block/20230804154821.3232094-1-bvanassche@acm.org/T/#m539c093dff4322599af97bccea975f73f3f67013
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit 54678c4c75.
Bug: 234829282
Change-Id: I37215b4527d993e56b1e9a85c390c9ce843850e7
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit f57c245b52.
Bug: 234829282
Change-Id: I34cc27653fc8d22f8e068f2776be285c5e2806f3
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit 33aea9741e.
Bug: 234829282
Change-Id: I4928e41be2dfcf25dd21ce5feb4e3dcdd8b9df35
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit d701aaaeb4.
Bug: 234829282
Change-Id: I7f267cf661807c7c8c96ea05966d77b338d62de2
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit ca743f9d51.
Bug: 234829282
Change-Id: Id95600f80be575da720b098a86aa46b84d33a733
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit 2dffc600ca.
Bug: 234829282
Change-Id: I58fad37a8586350676b6d9be93aa01b1ce170a00
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Prepare for applying the FROMLIST version of this patch.
This reverts commit 3bfdcf06e4.
Bug: 234829282
Change-Id: I5b2492d9b1475f69a0650a7621a2ea1606d7fb92
Signed-off-by: Bart Van Assche <bvanassche@google.com>
MGLRU has a LRU list for each zone for each type (anon/file) in each
generation:
long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
The min_seq (oldest generation) can progress independently for each
type but the max_seq (youngest generation) is shared for both anon and
file. This is to maintain a common frame of reference.
In order for eviction to advance the min_seq of a type, all the per-zone
lists in the oldest generation of that type must be empty.
The eviction logic only considers pages from eligible zones for
eviction or promotion.
    scan_folios() {
        ...
        for (zone = sc->reclaim_idx; zone >= 0; zone--) {
            ...
            sort_folio();       // Promote
            ...
            isolate_folio();    // Evict
        }
        ...
    }
Consider a system with the movable zone configured and the default 4
generations. The current state of the system is as shown below
(only illustrating one type for simplicity):
    Type: ANON
    Zone     DMA32    Normal    Movable    Device
    Gen 0        0         0        4GB         0
    Gen 1        0       1GB        1MB         0
    Gen 2      1MB       4GB        1MB         0
    Gen 3      1MB       1MB        1MB         0
Now consider there is a GFP_KERNEL allocation request (eligible zone
index <= Normal), evict_folios() will return without doing any work
since there are no pages to scan in the eligible zones of the oldest
generation. Reclaim won't make progress until triggered from a ZONE_MOVABLE
allocation request, which may not happen soon if there is a lot of free
memory in the movable zone. This can lead to OOM kills, although there
is 1GB of pages in the Normal zone of Gen 1 that we have not yet tried to
reclaim.
This issue is not seen in the conventional active/inactive LRU since
there are no per-zone lists.
If there are no (not enough) folios to scan in the eligible zones, move
folios from ineligible zone (zone_index > reclaim_index) to the next
generation. This allows for the progression of min_seq and reclaiming
from the next generation (Gen 1).
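The effect on the example table can be sketched in userspace C. This is a simplified model, not the kernel implementation: generations are indexed directly from 0 (oldest) instead of by sequence number, sizes are in MB, and toy_eligible() and toy_advance() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_GENS  4
#define TOY_ZONES 4     /* 0: DMA32, 1: Normal, 2: Movable, 3: Device */

/* Sum the pages (in MB) of one generation in zones 0..reclaim_idx. */
static long toy_eligible(long nr[TOY_GENS][TOY_ZONES], int gen,
                         int reclaim_idx)
{
    long sum = 0;
    int zone;

    for (zone = 0; zone <= reclaim_idx; zone++)
        sum += nr[gen][zone];
    return sum;
}

/*
 * If the oldest generation has nothing to scan in the eligible zones,
 * move its pages from the ineligible zones to the next generation so
 * that the oldest generation can advance. Returns the new oldest index.
 */
static int toy_advance(long nr[TOY_GENS][TOY_ZONES], int oldest,
                       int reclaim_idx)
{
    int zone;

    assert(oldest >= 0 && oldest < TOY_GENS - 1);
    assert(reclaim_idx >= 0 && reclaim_idx < TOY_ZONES);

    if (toy_eligible(nr, oldest, reclaim_idx) > 0)
        return oldest;                  /* there is work to do here */

    for (zone = reclaim_idx + 1; zone < TOY_ZONES; zone++) {
        nr[oldest + 1][zone] += nr[oldest][zone];
        nr[oldest][zone] = 0;
    }
    return oldest + 1;
}
```

Applied to the table above with a reclaim index of Normal, the 4GB in the Movable zone of Gen 0 moves to Gen 1, and the 1GB in the Normal zone of Gen 1 becomes reachable.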
Qualcomm, Mediatek and raspberrypi [1] discovered this issue independently.
[1] https://github.com/raspberrypi/linux/issues/5395
Link: https://lkml.kernel.org/r/20230802025606.346758-1-kaleshsingh@google.com
Fixes: ac35a49023 ("mm: multi-gen LRU: minimal implementation")
Change-Id: I5bbf44bd7ffe42f4347df4be59a75c1603c9b947
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> [mediatek]
Tested-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Steven Barrett <steven@liquorix.net>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1462260adc41c5974362cb54ff577c2a15b8c7b2 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 288383787
Bug: 291719697
[ folio -> page renames - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING
has been set. This is suboptimal since running the queue asynchronously
is slower than running the queue synchronously. This patch modifies
blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set:
- Run the queue synchronously if it is allowed to sleep.
- Run the queue asynchronously if it is not allowed to sleep.
Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into
blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller
may be invoked from atomic context.
The following caller chains have been reviewed:
blk_mq_run_hw_queue(hctx, false)
  blk_mq_get_tag()             /* may sleep, hence the functions it calls may also sleep */
  blk_execute_rq()             /* may sleep */
  blk_mq_run_hw_queues(q, async=false)
    blk_freeze_queue_start()   /* may sleep */
    blk_mq_requeue_work()      /* may sleep */
    scsi_kick_queue()
      scsi_requeue_run_queue() /* may sleep */
      scsi_run_host_queues()
        scsi_ioctl_reset()     /* may sleep */
  blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false)
    blk_mq_dispatch_plug_list(plug, from_sched=false)
      blk_mq_flush_plug_list(plug, from_schedule=false)
        __blk_flush_plug(plug, from_schedule=false)
        blk_add_rq_to_plug()
          blk_mq_submit_bio()  /* may sleep if REQ_NOWAIT has not been set */
  blk_mq_plug_issue_direct()
    blk_mq_flush_plug_list()   /* see above */
  blk_mq_dispatch_plug_list(plug, from_sched=false)
    blk_mq_flush_plug_list()   /* see above */
  blk_mq_try_issue_directly()
    blk_mq_submit_bio()        /* may sleep if REQ_NOWAIT has not been set */
  blk_mq_try_issue_list_directly(hctx, list)
    blk_mq_insert_requests()   /* see above */
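The decision rule described in this commit message can be summarised as a small pure function. This is a sketch under assumptions, not the block layer code: TOY_F_BLOCKING stands in for BLK_MQ_F_BLOCKING with an arbitrary value, the 'may_sleep' input is supplied by the caller here, and treating a non-blocking queue as synchronous when async is not requested is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_F_BLOCKING 0x1u     /* stand-in flag; the value is arbitrary */

enum toy_run_mode { TOY_RUN_SYNC, TOY_RUN_ASYNC };

/*
 * Pick how to run the queue:
 *  - an explicit async request is always honoured;
 *  - a blocking queue runs synchronously only if sleeping is allowed.
 */
static enum toy_run_mode toy_pick_mode(unsigned int flags, bool async,
                                       bool may_sleep)
{
    if (async)
        return TOY_RUN_ASYNC;
    if ((flags & TOY_F_BLOCKING) && !may_sleep)
        return TOY_RUN_ASYNC;
    return TOY_RUN_SYNC;
}

/*
 * The 'async' argument a caller that may run in atomic context passes,
 * mirroring "hctx->flags & BLK_MQ_F_BLOCKING" in the text above.
 */
static bool toy_async_arg(unsigned int flags)
{
    return (flags & TOY_F_BLOCKING) != 0;
}
```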
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Change-Id: I597e0fa587057ac22e52f73351cbfa1de2c72a6b
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 291379528
(cherry picked from commit 65a558f66c git://git.kernel.dk/linux-block for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously.
Leave out the direct blk_mq_run_hw_queues() call. This patch causes
scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead
of synchronously. Since scsi_run_queue() is not called from the hot I/O
submission path, this patch does not affect the hot path.
This patch prepares for allowing blk_mq_run_hw_queue() to sleep if
BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from
atomic context and must not sleep. Hence the removal of the
blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests().
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Change-Id: Ia7447d2cc803cb56728a902fd8562e85640d5fd0
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bug: 291379528
(cherry picked from commit d42e2e3448 git://git.kernel.dk/linux-block for-next)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
If you have trouble reading this new file format, please refresh your
prebuilt version of STG with repo sync.
Bug: 294213765
Change-Id: I4d7ee716231956c5f4da1343cc0db5170aaaa3b1
Signed-off-by: Giuliano Procida <gprocida@google.com>
In struct usb_phy, reserved slot 0 was set aside at ABI freeze for the
addition of a notify_port_status callback. This API has now been accepted
upstream. Therefore, use ANDROID_KABI_USE to add it.
Bug: 286930662
Change-Id: Iae894f9dfff77fd1f23bb48fefdb9b682c54de57
Signed-off-by: Stanley Chang <stanley_chang@realtek.com>