The UFS driver uses blk_mq_tagset_busy_iter() when identifying task
management requests to complete, however blk_mq_tagset_busy_iter()
doesn't work.
blk_mq_tagset_busy_iter() only iterates requests dispatched by the block
layer. That appears as if it might have started since commit 37f4a24c24
("blk-mq: centralise related handling into blk_mq_get_driver_tag") which
removed 'data->hctx->tags->rqs[rq->tag] = rq' from blk_mq_rq_ctx_init()
which gets called:
blk_get_request
blk_mq_alloc_request
__blk_mq_alloc_request
blk_mq_rq_ctx_init
Since UFS task management requests are not dispatched by the block
layer, hctx->tags->rqs[rq->tag] remains NULL, and since
blk_mq_tagset_busy_iter() relies on finding requests using
hctx->tags->rqs[rq->tag], UFS task management requests are never found by
blk_mq_tagset_busy_iter().
By using blk_mq_tagset_busy_iter(), the UFS driver was relying on internal
details of the block layer, which was fragile and subsequently got
broken. Fix by removing the use of blk_mq_tagset_busy_iter() and having
the driver keep track of task management requests.
Fixes: 1235fc569e ("scsi: ufs: core: Fix task management request completion timeout")
Fixes: 69a6c269c0 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Bug: 200291871
Link: https://lore.kernel.org/linux-scsi/20210922091059.4040-1-adrian.hunter@intel.com/
Change-Id: I4e11f40c7371fd66b3a6b570d06c956b113df142
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Before adding more data members in struct ufs_hba_with_hpb, rename this
data structure. This patch does not change any functionality.
Bug: 200291871
Change-Id: I6b0365ebcf8adf6cfa009218d8c4dc96fa629bde
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Currently the core UFS driver does not have a vops to notify when the
device is operational. This commit introduces a hook, which serves to
notify device completing initialization and is ready to accept I/O.
This is required by the FIPS140-2 [1] self integrity test of inline
encryption engine, which must run whenever the host controller is reset.
The code requires sleeping while waiting for I/O to complete and allocating
some memory dynamically, which requires the vendor hook to be restricted.
[1] https://csrc.nist.gov/publications/detail/fips/140/2/final
Bug: 185809932
Signed-off-by: Konstantin Vyshetsky <vkon@google.com>
Change-Id: I6f476f9c2e2b50574d2898c3f1ef6b648d92df24
Before this patch, someone who wants to use VMAP_STACK when
KASAN_GENERIC enabled must explicitly select KASAN_VMALLOC.
>From Will's suggestion [1]:
> I would _really_ like to move to VMAP stack unconditionally, and
> that would effectively force KASAN_VMALLOC to be set if KASAN is in use
Because VMAP_STACK now depends on either HW_TAGS or KASAN_VMALLOC if
KASAN enabled, in order to make VMAP_STACK selected unconditionally,
we bind KANSAN_GENERIC and KASAN_VMALLOC together.
Note that SW_TAGS supports neither VMAP_STACK nor KASAN_VMALLOC now,
so this is the first step to make VMAP_STACK selected unconditionally.
Bind KANSAN_GENERIC and KASAN_VMALLOC together is supposed to cost more
memory at runtime, thus the alternative is using SW_TAGS KASAN instead.
[1]: https://lore.kernel.org/lkml/20210204150100.GE20815@willie-the-truck/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Link: https://lore.kernel.org/r/20210324040522.15548-6-lecopzer.chen@mediatek.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 201711661
(cherry picked from commit acc3042d62)
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Change-Id: Ic7884e3acafa02361c8a250028ceebdb1780a49e
After KASAN_VMALLOC works in arm64, we can randomize module region
into vmalloc area now.
Test:
VMALLOC area ffffffc010000000 fffffffdf0000000
before the patch:
module_alloc_base/end ffffffc008b80000 ffffffc010000000
after the patch:
module_alloc_base/end ffffffdcf4bed000 ffffffc010000000
And the function that insmod some modules is fine.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Link: https://lore.kernel.org/r/20210324040522.15548-5-lecopzer.chen@mediatek.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 201711661
(cherry picked from commit 31d02e7ab0)
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Change-Id: I513f205113752b396245e9e8d64b973fcd7ffcb1
Linux support KAsan for VMALLOC since commit 3c5c3cfb9e
("kasan: support backing vmalloc space with real shadow memory")
Like how the MODULES_VADDR does now, just not to early populate
the VMALLOC_START between VMALLOC_END.
Before:
MODULE_VADDR: no mapping, no zero shadow at init
VMALLOC_VADDR: backed with zero shadow at init
After:
MODULE_VADDR: no mapping, no zero shadow at init
VMALLOC_VADDR: no mapping, no zero shadow at init
Thus the mapping will get allocated on demand by the core function
of KASAN_VMALLOC.
----------- vmalloc_shadow_start
| |
| |
| | <= non-mapping
| |
| |
|-----------|
|///////////|<- kimage shadow with page table mapping.
|-----------|
| |
| | <= non-mapping
| |
------------- vmalloc_shadow_end
|00000000000|
|00000000000| <= Zero shadow
|00000000000|
------------- KASAN_SHADOW_END
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210324040522.15548-2-lecopzer.chen@mediatek.com
[catalin.marinas@arm.com: add a build check on VMALLOC_START != MODULES_END]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bug: 201711661
(cherry picked from commit 9a0732efa7)
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Change-Id: I82c4b624685cc9e083d720904787df2326cb9cc0
This lets us avoid doing unnecessary work on hardware that does not
support MTE, and will allow us to freely use MTE instructions in the
code called by mte_thread_switch().
Since this would mean that we do a redundant check in
mte_check_tfsr_el1(), remove it and add two checks now required in its
callers. This also avoids an unnecessary DSB+ISB sequence on the syscall
exit path for hardware not supporting MTE.
Fixes: 65812c6921 ("arm64: mte: Enable async tag check fault")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I02fd000d1ef2c86c7d2952a7f099b254ec227a5d
Link: https://lore.kernel.org/r/20210915190336.398390-1-pcc@google.com
[catalin.marinas@arm.com: adjust the commit log slightly]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 8c8a3b5bd9)
Change-Id: I131933b18581a40cea9abf556625c80b12e4e974
Bug: 192536783
When KASAN_HW_TAGS is selected, KASAN is enabled at boot time, and the
hardware supports MTE, we'll initialize `kernel_gcr_excl` with a value
dependent on KASAN_TAG_MAX. While the resulting value is a constant
which depends on KASAN_TAG_MAX, we have to perform some runtime work to
generate the value, and have to read the value from memory during the
exception entry path. It would be better if we could generate this as a
constant at compile-time, and use it as such directly.
Early in boot within __cpu_setup(), we initialize GCR_EL1 to a safe
value, and later override this with the value required by KASAN. If
CONFIG_KASAN_HW_TAGS is not selected, or if KASAN is disabeld at boot
time, the kernel will not use IRG instructions, and so the initial value
of GCR_EL1 is does not matter to the kernel. Thus, we can instead have
__cpu_setup() initialize GCR_EL1 to a value consistent with
KASAN_TAG_MAX, and avoid the need to re-initialize it during hotplug and
resume form suspend.
This patch makes arem64 use a compile-time constant KERNEL_GCR_EL1
value, which is compatible with KASAN_HW_TAGS when this is selected.
This removes the need to re-initialize GCR_EL1 dynamically, and acts as
an optimization to the entry assembly, which no longer needs to load
this value from memory. The redundant initialization hooks are removed.
In order to do this, KASAN_TAG_MAX needs to be visible outside of the
core KASAN code. To do this, I've moved the KASAN_TAG_* values into
<linux/kasan-tags.h>.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20210714143843.56537-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 8286824789)
Change-Id: I397025af7051bedcdd35987af5deec46c228f8e2
Signed-off-by: Peter Collingbourne <pcc@google.com>
Bug: 192536783
Somehow the android/abi_gki_aarch64.xml file became executable in a
previous commit. Fix this up by setting the mode properly.
Bug: 149040612
Cc: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Cc: Sandeep Patil <sspatil@google.com>
Fixes: 55d7c4eca6 ("ANDROID: Update symbol list for mtk")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iecb61df3481cf2a257d2f999297a98cdbcb38295
This patch does not add any new symbol, but make the symbol list order
by the module.
Bug: 194515348
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
Change-Id: I0bec2f90a46ec92d7e391ab9e9c49180c4d87ae9
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type
rather than a HEAD or PLAIN type instead, which means its pclustersize
_must_ be 1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:
HEAD HEAD / PLAIN lcluster type
____________ ____________
|_:__________|_________:__| file data (uncompressed)
. .
.____________.
|____________| pcluster data (compressed)
Such on-disk case was explained before [1] but missed to be handled
properly in the runtime implementation.
It can be observed if manually generating 1 lcluster-sized pcluster
with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now.
[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org
Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93bea ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Bug: 201372112
Change-Id: I7e46baa993790f8908287ac36e19d32e536116db
(cherry picked from commit 0852b6ca94)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Prior to big pcluster, there was only one compressed page so it'd
easy to map this. However, when big pcluster is enabled, more work
needs to be done to handle multiple compressed pages. In detail,
- (maptype 0) if there is only one compressed page + no need
to copy inplace I/O, just map it directly what we did before;
- (maptype 1) if there are more compressed pages + no need to
copy inplace I/O, vmap such compressed pages instead;
- (maptype 2) if inplace I/O needs to be copied, use per-CPU
buffers for decompression then.
Another thing is how to detect inplace decompression is feasable or
not (it's still quite easy for non big pclusters), apart from the
inplace margin calculation, inplace I/O page reusing order is also
needed to be considered for each compressed page. Currently, if the
compressed page is the xth page, it shouldn't be reused as [0 ...
nrpages_out - nrpages_in + x], otherwise a full copy will be triggered.
Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first and obviously it can be
further optimized later since it has nothing with the on-disk format
at all.
Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I8e8b90bd0a401850ea81d895dc55e5d8a2772ed7
(cherry picked from commit 598162d050)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Different from non-compact indexes, several lclusters are packed
as the compact form at once and an unique base blkaddr is stored for
each pack, so each lcluster index would take less space on avarage
(e.g. 2 bytes for COMPACT_2B.) btw, that is also why BIG_PCLUSTER
switch should be consistent for compact head0/1.
Prior to big pcluster, the size of all pclusters was 1 lcluster.
Therefore, when a new HEAD lcluster was scanned, blkaddr would be
bumped by 1 lcluster. However, that way doesn't work anymore for
big pcluster since we actually don't know the compressed size of
pclusters in advance (before reading CBLKCNT lcluster).
So, instead, let blkaddr of each pack be the first pcluster blkaddr
with a valid CBLKCNT, in detail,
1) if CBLKCNT starts at the pack, this first valid pcluster is
itself, e.g.
_____________________________________________________________
|_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
^ = blkaddr base ^ += CBLKCNT0 ^ += CBLKCNT1
2) if CBLKCNT doesn't start at the pack, the first valid pcluster
is the next pcluster, e.g.
_________________________________________________________
| NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
^ = blkaddr base ^ += CBLKCNT0
^ += 1
When a CBLKCNT is found, blkaddr will be increased by CBLKCNT
lclusters, or a new HEAD is found immediately, bump blkaddr by 1
instead (see the picture above.)
Also noted if CBLKCNT is the end of the pack, instead of storing
delta1 (distance of the next HEAD lcluster) as normal NONHEADs,
it still uses the compressed block count (delta0) since delta1
can be calculated indirectly but the block count can't.
Adjust decoding logic to fit big pcluster compact indexes as well.
Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I09d6bb4c9d390dc6169cb9dd4efbc7fd6b3be5d0
(cherry picked from commit b86269f438)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
When INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress indexes
will also have the same on-disk header compact indexes to keep per-file
configurations instead of leaving it zeroed.
If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for each
pcluster in this file by parsing 1st non-head lcluster.
Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I97b8d1cc54e057789d22f1cdfafc21ed6a69b149
(cherry picked from commit cec6e93bea)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Big pcluster indicates the size of compressed data for each physical
pcluster is no longer fixed as block size, but could be more than 1
block (more accurately, 1 logical pcluster)
When big pcluster feature is enabled for head0/1, delta0 of the 1st
non-head lcluster index will keep block count of this pcluster in
lcluster size instead of 1. Or, the compressed size of pcluster
should be 1 lcluster if pcluster has no non-head lcluster index.
Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
it depends on COMPR_CFGS and will be released together.
Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: Ibd33f2764f4420f370f9293ed325efdba9ea70a7
(cherry picked from commit 5404c33010)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
When picking up inplace I/O pages, it should be traversed in reverse
order in aligned with the traversal order of file-backed online pages.
Also, index should be updated together when preloading compressed pages.
Previously, only page-sized pclustersize was supported so no problem
at all. Also rename `compressedpages' to `icpage_ptr' to reflect its
functionality.
Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I752769d004c18f74636bcfdea767572daa6f7072
(cherry picked from commit 81382f5f5c)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Since multiple pcluster sizes could be used at once, the number of
compressed pages will become a variable factor. It's necessary to
introduce slab pools rather than a single slab cache now.
This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE), and
get rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which has no
use now.
Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I821f3cf35820dbb320eeedc3ae934fd1d455dfd7
(cherry picked from commit 9f6cc76e6f)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
To deal the with the cases which inplace decompression is infeasible
for some inplace I/O. Per-CPU buffers was introduced to get rid of page
allocation latency and thrash for low-latency decompression algorithms
such as lz4.
For the big pcluster feature, introduce multipage per-CPU buffers to
keep such inplace I/O pclusters temporarily as well but note that
per-CPU pages are just consecutive virtually.
When a new big pcluster fs is mounted, its max pclustersize will be
read and per-CPU buffers can be growed if needed. Shrinking adjustable
per-CPU buffers is more complex (because we don't know if such size
is still be used), so currently just release them all when unloading.
Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Bug: 201372112
Change-Id: I5b3ab69ef58aaea635911b0c08cceeb4b38122da
(cherry picked from commit 524887347f)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>