Below is one path where a race between a page_ext access and the
offline of the respective memory block causes a use-after-free on the
page_ext structure.
process1                              process2
---------                             ---------
a) doing /proc/page_owner             doing memory offline
                                      through offline_pages.
b) PageBuddy check is failed
   thus proceed to get the
   page_owner information
   through page_ext access.
   page_ext = lookup_page_ext(page);
                                      migrate_pages();
                                      .................
                                      Since all pages are successfully
                                      migrated as part of the offline
                                      operation, send MEM_OFFLINE notification
                                      where for page_ext it calls:
                                      offline_page_ext()-->
                                       __free_page_ext()-->
                                        free_page_ext()-->
                                         vfree(ms->page_ext)
                                      mem_section->page_ext = NULL
c) Check for the PAGE_EXT flags
   in the page_ext->flags access
   results into the use-after-free (leading
   to the translation faults).
As mentioned above, there is really no synchronization between a page_ext
access and the freeing of page_ext during memory offline.
The memory offline steps (roughly) on a memory block are as below:
1) Isolate all the pages.
2) while (1)
     try to free the pages to buddy (->free_list[MIGRATE_ISOLATE]).
3) Delete the pages from this buddy list.
4) Then free page_ext. (Note: the struct page is still alive, as it is
   freed only during hot remove of the memory, which frees the memmap;
   the user might not perform that step.)
This design leads to a state where struct page is alive but struct
page_ext is freed, where the latter is ideally part of the former and
just represents the page_flags (check [3] for why this design was
chosen).
The above mentioned race is just one example __but the problem persists
in the other paths too involving page_ext->flags access(eg:
page_is_idle())__.
Fix all the paths where offline races with page_ext access by
synchronizing them with the RCU lock. This is achieved in 3 steps:
1) Invalidate all the page_ext's of the sections of a memory block by
storing a flag in the LSB of mem_section->page_ext.
2) Wait for all existing readers to finish working with the
->page_ext's using synchronize_rcu(). Any parallel process that starts
after this call will not get a page_ext, through lookup_page_ext(), for
the block on which the parallel offline operation is being performed.
3) Now safely free the ->page_ext's of all sections of the block on which
the offline operation is being performed.
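The invalidation in step 1 can be illustrated with a small userspace sketch. It is not the kernel implementation: the struct and function names (section_sketch, section_invalidate_ext() and so on) and the flag value are hypothetical, and the RCU grace period of step 2 is not modeled. It only shows how a flag in the low bit of an aligned pointer lets a lookup refuse to hand out page_ext while the real pointer stays recoverable for the later free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag kept in bit 0 of the section's page_ext pointer. */
#define EXT_INVALID_FLAG ((uintptr_t)0x1)

struct page_ext_sketch {
	unsigned long flags;
};

struct section_sketch {
	uintptr_t page_ext;	/* pointer value, possibly tagged in its LSB */
};

/* Install a page_ext pointer; bit 0 must be free for the flag. */
static inline void section_set_ext(struct section_sketch *ms,
				   struct page_ext_sketch *ext)
{
	assert(((uintptr_t)ext & EXT_INVALID_FLAG) == 0);
	ms->page_ext = (uintptr_t)ext;
}

/* Step 1: mark the section's page_ext invalid without freeing it. */
static inline void section_invalidate_ext(struct section_sketch *ms)
{
	ms->page_ext |= EXT_INVALID_FLAG;
}

/* Reader side: return NULL when the pointer is absent or tagged invalid. */
static inline struct page_ext_sketch *
section_lookup_ext(const struct section_sketch *ms)
{
	if (ms->page_ext == 0 || (ms->page_ext & EXT_INVALID_FLAG))
		return NULL;
	return (struct page_ext_sketch *)ms->page_ext;
}

/* Step 3: strip the flag to recover the pointer that has to be freed. */
static inline struct page_ext_sketch *
section_untag_ext(const struct section_sketch *ms)
{
	return (struct page_ext_sketch *)(ms->page_ext & ~EXT_INVALID_FLAG);
}
```

In the scheme described above, the reader would perform its lookup and its use of page_ext under the RCU read lock, and the offline path would call synchronize_rcu() between the invalidate and the free.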
Note: If synchronize_rcu() takes time then optimizations can be done in
this path through call_rcu()[2].
Thanks to David Hildenbrand for his views/suggestions on the initial
discussion[1] and Pavan Kondeti for various inputs on this patch.
[1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/
[2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com/T/#u
[3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/
Bug: 236222283
Bug: 240196534
Link: https://lore.kernel.org/all/1661496993-11473-1-git-send-email-quic_charante@quicinc.com/
Change-Id: Ib439ae19c61a557a5c70ea90e3c4b35a5583ba0d
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Signed-off-by: Minchan Kim <minchan@google.com>
(fixed merge conflicts and still exported lookup_page_ext)
(minchan: fixed page_pinner with new page_ext scheme)
For CMA allocation, it's really critical to migrate a page, but
sometimes migration fails. One of the reasons is that some driver holds
a page refcount for a long time, so the VM cannot migrate the page
at that time.
The concern here is that there is no effective way to find who holds
the refcount of the page. This patch introduces a feature
to keep track of a page's pinner. Any get_page site can
pin a page for a long time, but the cost of tracking every one would
be significant since get_page is the most frequent kernel operation.
Furthermore, the page could be not a user page but a kernel page, which
is not related to the page migration failure.
Thus, this patch keeps track of only migration-failed pages to
reduce runtime cost. Once page migration fails in the CMA allocation
path, those pages are marked as "migration failure", and for every
put_page operation against those pages, the callstack of the put
is recorded into the page_pinner buffer. Later, the admin can see
which pages failed and who released the refcount since the
failure. This helps to find the long-time refcount
holder that prevented the page migration.
note: page_pinner doesn't guarantee that attributing/unattributing is
atomic if they happen at the same time. It's just best effort, so
false positives could happen.
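The idea of recording only puts against migration-failed pages can be sketched in userspace C as follows. Everything here is hypothetical and for illustration only: the type and function names, the fixed-size circular buffer, and the use of a caller string in place of a captured callstack are assumptions, not the page_pinner code.

```c
#include <stdbool.h>
#include <stddef.h>

#define PINNER_BUF_SIZE 8	/* hypothetical capacity */

struct page_sketch {
	unsigned long pfn;
	bool migration_failed;	/* set once migration of this page failed */
};

struct pinner_record {
	unsigned long pfn;
	const char *caller;	/* stands in for a captured callstack */
};

struct pinner_buf {
	struct pinner_record rec[PINNER_BUF_SIZE];
	size_t next;		/* next slot to write */
	size_t count;		/* valid records, at most PINNER_BUF_SIZE */
};

/* Called from the (hypothetical) CMA path when migration fails. */
static inline void mark_migration_failed(struct page_sketch *page)
{
	page->migration_failed = true;
}

/* Called on every put; pages that never failed cost one flag test. */
static inline void put_page_sketch(struct pinner_buf *buf,
				   const struct page_sketch *page,
				   const char *caller)
{
	if (!page->migration_failed)
		return;

	buf->rec[buf->next].pfn = page->pfn;
	buf->rec[buf->next].caller = caller;
	buf->next = (buf->next + 1) % PINNER_BUF_SIZE;
	if (buf->count < PINNER_BUF_SIZE)
		buf->count++;
}
```

Like the feature described above, the sketch does no locking, so a mark that races with a put can be missed or attributed late.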
Bug: 183414571
Bug: 240196534
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I603d0c0122734c377db6b1eb95848a6f734173a0
(cherry picked from commit 898cfbf094a2fc13c67fab5b5d3c916f0139833a)
The hardware-wrapped key support in this branch is based on my patch
"[RFC PATCH v3 3/3] fscrypt: add support for hardware-wrapped keys"
(https://lore.kernel.org/r/20211021181608.54127-4-ebiggers@kernel.org)
I've since made several updates to that patch and it is now at v7.
This commit brings in the updates from v3 to v7, to the extent possible
while retaining compatibility with the UAPI and on-disk format used for
this feature in Android. This mainly includes some improved log
messages, and compatibility with the blk-crypto updates.
Bug: 160883801
Link: https://lore.kernel.org/all/20221216203636.81491-5-ebiggers@kernel.org
Change-Id: I1c43ca55ec7e95dd06f8f7944100ffd14771d3a7
Signed-off-by: Eric Biggers <ebiggers@google.com>
Update this code to be compatible with the updated version of
"block: add basic hardware-wrapped key support".
Bug: 160883801
Change-Id: Ic6991ad163035870ace3cd468f53b21a824c5359
Signed-off-by: Eric Biggers <ebiggers@google.com>
The hardware-wrapped key support in this branch is based on my patch
"[RFC PATCH v3 1/3] block: add basic hardware-wrapped key support"
(https://lore.kernel.org/all/20211021181608.54127-2-ebiggers@kernel.org).
I've since made several updates to that patch and it is now at v7.
This commit brings in the updates from v3 to v7. The main change is
making blk_crypto_derive_sw_secret() operate on a struct block_device,
and adding blk_crypto_hw_wrapped_keys_compatible(). This aligns with
changes upstream in v6.1 and v6.2 that removed block-layer internal
structures from the API that blk-crypto exposes to upper layers.
There's also a slight change in prototype for ->derive_sw_secret, so a
couple out-of-tree drivers will need to be updated, but people
maintaining out-of-tree drivers know what they are dealing with anyway.
Bug: 160883801
Link: https://lore.kernel.org/r/20221216203636.81491-2-ebiggers@kernel.org
Change-Id: I0f285c11c2764064cd4a9d6eac0089099a9601ed
Signed-off-by: Eric Biggers <ebiggers@google.com>
The prototypes of blk_crypto_evict_key() and
blk_crypto_start_using_key() changed, so update the callers in
dm-default-key which is not upstream.
Bug: 160885805
Change-Id: Ie39a298d8aca77c042f11bbfa25fd9bf50593c52
Signed-off-by: Eric Biggers <ebiggers@google.com>
blk_crypto_get_keyslot, blk_crypto_put_keyslot, __blk_crypto_evict_key
and __blk_crypto_cfg_supported are only used internally by the
blk-crypto code, so move them out of blk-crypto-profile.h, which is
included by drivers that supply blk-crypto functionality.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 3569788c08)
Change-Id: I80b07a1c3b6e6f41ffe48adbdb27a3ca4480ff75
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a blk_crypto_config_supported_natively helper that wraps
__blk_crypto_cfg_supported to retrieve the crypto_profile from the
request queue. With this fscrypt can stop including
blk-crypto-profile.h and rely on the public consumer interface in
blk-crypto.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6715c98b6c)
(resolved conflicts in blk_crypto_config_supported() and
__blk_crypto_bio_prep())
Change-Id: I40c4ab6bd9a108661c40c837227b6aed64685ae7
Signed-off-by: Eric Biggers <ebiggers@google.com>
Switch all public blk-crypto interfaces to use struct block_device
arguments to specify the device they operate on instead of the
request_queue, which is a block layer implementation detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042944.1009870-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit fce3caea0f)
(resolved conflict in blk_crypto_config_supported())
Change-Id: Ifde7cf1c8a2a5ddfb2fde4e5fb118269a3bfcdb0
Signed-off-by: Eric Biggers <ebiggers@google.com>
After recent fixes [1], speculative page fault walks are performed with
disabled interrupts, therefore do not depend on ALLOC_SPLIT_PTLOCKS
which would affect them if performed under RCU protection. Remove
unnecessary config dependency.
[1] 5fcb50b0559a ("ANDROID: mm: fix speculative walk which is unsafe under RCU")
Bug: 253557903
Change-Id: Ia1c835c7b08419f8fce61fa4f7e6842fbf786229
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Since Android has a pcp list for MIGRATE_CMA[1], CMA allocation
latency can increase because a MIGRATE_ISOLATE page is not freed
immediately.
Originally, a MIGRATE_ISOLATE page is supposed to go to the buddy list,
skipping the pcp list. Otherwise, the page could be reallocated
from the pcp list or stay on the pcp list until the pcp is drained,
so CMA keeps retrying since it cannot find the freed page
in the buddy list. That worked before since the CMA pfnblocks changed
only from MIGRATE_CMA to MIGRATE_ISOLATE and the free function logic
in the page allocator checked for MIGRATE_ISOLATE on every CMA
page using the code below.
free_unref_page_commit
    if (migratetype >= MIGRATE_PCPTYPES)
        if (is_migrate_isolate(migratetype))
            free_one_page(page);
It worked since enum MIGRATE_CMA was bigger than enum
MIGRATE_PCPTYPES, but since [1], enum MIGRATE_CMA is less than
MIGRATE_PCPTYPES, so the logic above doesn't work any more.
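A minimal sketch of the enum-ordering effect, assuming two illustrative layouts: the enumerator names and positions below are hypothetical stand-ins and are not copied from the kernel headers. It shows only that a migratetype of MIGRATE_CMA reaches the inner isolate check under the old ordering and skips the outer branch under the new one.

```c
#include <stdbool.h>

/* Illustrative layout where MIGRATE_CMA sorts above MIGRATE_PCPTYPES. */
enum old_migratetype {
	OLD_UNMOVABLE,
	OLD_MOVABLE,
	OLD_RECLAIMABLE,
	OLD_PCPTYPES,
	OLD_CMA,
	OLD_ISOLATE,
};

/* Illustrative layout after a CMA pcp list exists: CMA sorts below PCPTYPES. */
enum new_migratetype {
	NEW_UNMOVABLE,
	NEW_MOVABLE,
	NEW_RECLAIMABLE,
	NEW_CMA,
	NEW_PCPTYPES,
	NEW_ISOLATE,
};

/* Outer test of the free path: only these types reach the isolate check. */
static inline bool old_reaches_isolate_check(int migratetype)
{
	return migratetype >= OLD_PCPTYPES;
}

static inline bool new_reaches_isolate_check(int migratetype)
{
	return migratetype >= NEW_PCPTYPES;
}
```

With the second layout, a page whose recorded migratetype is the CMA type goes straight to the pcp list, which is the window the race below depends on.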
It could cause the following race:
CPU 0                                      CPU 1
free_unref_page
migratetype = get_pfnblock_migratetype()
set_pcppage_migratetype(MIGRATE_CMA)
                                           cma_alloc
                                           alloc_contig_range
                                           set_migrate_isolate(MIGRATE_ISOLATE)
add the page into pcp list
the page could be reallocated
This patch cannot fix the race completely because zone->lock is not
taken in the order-0 page free path (for performance reasons). However,
it's not a new problem, so we need to deal with the issue separately.
[1] ANDROID: mm: add cma pcp list
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ibea20085ce5bfb4b74b83b041f9bda9a380120f9
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit d9e4b67784)
build.config.gki sources a GKI_BUILD_CONFIG_FRAGMENT before all of
the variables that are considered as part of a GKI kernel build are
declared. This reduces the effectiveness of a
GKI_BUILD_CONFIG_FRAGMENT, as it is only able to modify a subset of
the build variables.
Thus, move the logic to source GKI_BUILD_CONFIG_FRAGMENT to the end
of the GKI build config files to provide more flexibility for a
GKI_BUILD_CONFIG_FRAGMENT.
Bug: 262930113
Change-Id: I74abb45f9043acce04cb0052f54fded4340a9366
[isaacmanjarres: Modified build.config.gki.aarch64.fips140, which
did not exist on android13-5.15.]
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
(cherry picked from commit 69fefbb3db711e543ff0676526b7d285a4d10a14)
Introduce a new default trap handler for the host that can be set
from modules.
Bug: 244543039
Bug: 245034629
Change-Id: Iaabfa44f5f2c41af51f36ed4eec8762e7c951c01
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a notifier allowing a pKVM module to be notified for major
PSCI events: {CPU,SYSTEM}_SUSPEND, as well as on the resume path.
Bug: 244543039
Bug: 245034629
Change-Id: Ia82923445214925fc77e321457c8eab31f9d42e8
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a new handler that notifies pKVM modules when pKVM
detects an illegal access from the host.
Bug: 244543039
Bug: 245034629
Change-Id: I62133a8d967d91437e5216b307e449f8c83dfab6
Signed-off-by: Quentin Perret <qperret@google.com>
Introduce a new default SMC handler for the host that can be set from
modules.
Bug: 244543039
Bug: 245034629
Change-Id: I8481bfb1926a3cb433b15de5c1a99e3550710689
Signed-off-by: Quentin Perret <qperret@google.com>
Changes in 5.15.85
udf: Discard preallocation before extending file with a hole
udf: Fix preallocation discarding at indirect extent boundary
udf: Do not bother looking for prealloc extents if i_lenExtents matches i_size
udf: Fix extending file within last block
usb: gadget: uvc: Prevent buffer overflow in setup handler
USB: serial: option: add Quectel EM05-G modem
USB: serial: cp210x: add Kamstrup RF sniffer PIDs
USB: serial: f81232: fix division by zero on line-speed change
USB: serial: f81534: fix division by zero on line-speed change
xhci: Apply XHCI_RESET_TO_DEFAULT quirk to ADL-N
igb: Initialize mailbox message for VF reset
usb: dwc3: pci: Update PCIe device ID for USB3 controller on CPU sub-system for Raptor Lake
HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk
Bluetooth: L2CAP: Fix u8 overflow
selftests: net: Use "grep -E" instead of "egrep"
net: loopback: use NET_NAME_PREDICTABLE for name_assign_type
Linux 5.15.85
Change-Id: Ia398b261925f9370124491034de3bc5e4dcc5022
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Move __pkvm_register_el2_call and __pkvm_load_el2_module out of the
MODULE ifdef so the associated EXPORT_SYMBOL are never alone.
Bug: 244543039
Bug: 244373730
Reported-by: kernel test robot <lkp@intel.com>
Change-Id: Icdac2ccd32d09388472c6500d4af951cc23439fb
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
[ Upstream commit 31d929de5a ]
When the name_assign_type attribute was introduced (commit
685343fc3b, "net: add name_assign_type netdev attribute"), the
loopback device was explicitly mentioned as one which would make use
of NET_NAME_PREDICTABLE:
The name_assign_type attribute gives hints where the interface name of a
given net-device comes from. These values are currently defined:
...
NET_NAME_PREDICTABLE:
The ifname has been assigned by the kernel in a predictable way
that is guaranteed to avoid reuse and always be the same for a
given device. Examples include statically created devices like
the loopback device [...]
Switch to that so that reading /sys/class/net/lo/name_assign_type
produces something sensible instead of returning -EINVAL.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3405a4beaa ]
Commit f7d8e387d9 ("HID: uclogic: Switch to Digitizer usage for
styluses") changed the usage used in UCLogic from "Pen" to "Digitizer".
However, the IS_INPUT_APPLICATION() macro evaluates to false for
HID_DG_DIGITIZER causing issues with the XP-Pen Star G640 tablet.
Add the HID_QUIRK_HIDINPUT_FORCE quirk to bypass the
IS_INPUT_APPLICATION() check.
Reported-by: Torge Matthies <openglfreak@googlemail.com>
Reported-by: Alexander Zhang <alex@alexyzhang.dev>
Tested-by: Alexander Zhang <alex@alexyzhang.dev>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 188c9c2e0c upstream.
The driver leaves the line speed unchanged in case a requested speed is
not supported. Make sure to handle the case where the current speed is
B0 (hangup) without dividing by zero when determining the clock source.
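The guard can be sketched as below. This is a hedged illustration rather than the driver code: find_clock_index() and the clock table values are hypothetical, chosen only to show a zero baud rate being rejected before it is used as a divisor.

```c
#include <stddef.h>

/* Hypothetical clock sources in Hz; not taken from the driver. */
static const unsigned int clock_table[] = { 1843200, 18432000, 24000000 };

/*
 * Return the index of the first clock that the baud rate divides evenly,
 * or -1 if there is none. A baud rate of 0 (B0, hangup) is rejected up
 * front so that it never reaches the modulo operation.
 */
static inline int find_clock_index(unsigned int baud)
{
	size_t i;

	if (baud == 0)
		return -1;

	for (i = 0; i < sizeof(clock_table) / sizeof(clock_table[0]); i++) {
		if (baud <= clock_table[i] && clock_table[i] % baud == 0)
			return (int)i;
	}

	return -1;
}
```

A caller that keeps the previous line speed when the requested one is unsupported would need to treat -1 for a B0 speed as "leave the clock source alone" rather than retrying with the same value.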
Fixes: 3aacac02f3 ("USB: serial: f81534: add high baud rate support")
Cc: stable@vger.kernel.org # 4.16
Cc: Ji-Ze Hong (Peter Hong) <hpeter@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a08ca6ebaf upstream.
The driver leaves the line speed unchanged in case a requested speed is
not supported. Make sure to handle the case where the current speed is
B0 (hangup) without dividing by zero when determining the clock source.
Fixes: 268ddb5e9b ("USB: serial: f81232: add high baud rate support")
Cc: stable@vger.kernel.org # 5.2
Cc: Ji-Ze Hong (Peter Hong) <hpeter@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e88906b169 upstream.
The RF sniffers are based on cp210x where the RF frontends
are based on a different USB stack.
RF sniffers can analyze packet metadata, including power level,
and perform packet injection.
They can be used to perform an RF frontend self-test when connected to
a concentrator, e.g. arch/arm/boot/dts/imx7d-flex-concentrator.dts
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f3868f068 upstream.
When extending file within last block it can happen that the extent is
already rounded to the blocksize and thus contains the offset we want to
grow up to. In such case we would mistakenly expand the last extent and
make it one block longer than it should be, exposing unallocated block
in a file and causing data corruption. Fix the problem by properly
detecting this case and bailing out.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ad53f0f71 upstream.
If block-rounded i_lenExtents matches block-rounded i_size,
there are no preallocation extents. Do not bother walking the extent
linked list.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cfe4c1b25d upstream.
When preallocation extent is the first one in the extent block, the
code would corrupt extent tree header instead. Fix the problem and use
udf_delete_aext() for deleting extent to avoid some code duplication.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 16d0556568 upstream.
When extending file with a hole, we tried to preserve existing
preallocation for the file. However that is not very useful and
complicates code because the previous extent may need to be rounded to
block boundary as well (which we forgot to do thus causing data
corruption for sequence like:
xfs_io -f -c "pwrite 0x75e63 11008" -c "truncate 0x7b24b" \
-c "truncate 0xabaa3" -c "pwrite 0xac70b 22954" \
-c "pwrite 0x93a43 11358" -c "pwrite 0xb8e65 52211" file
with 512-byte block size). Just discard preallocation before extending
file to simplify things and also fix this data corruption.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(Backport: neighboring lines changed +
move variable declaration out of the for loop initializer.)
Implement storing stack depot handles for alloc/free stack traces for slab
objects for the tag-based KASAN modes in a ring buffer.
This ring buffer is referred to as the stack ring.
On each alloc/free of a slab object, the tagged address of the object and
the current stack trace are recorded in the stack ring.
On each bug report, if the accessed address belongs to a slab object, the
stack ring is scanned for matching entries. The newest entries are used
to print the alloc/free stack traces in the report: one entry for alloc
and one for free.
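The record-and-scan behaviour described above can be sketched in userspace C. This is an assumption-laden illustration, not the KASAN code: the names, the capacity of 4, and the use of a plain integer in place of a stack depot handle (with 0 meaning "not found") are hypothetical, and the locking the real ring needs is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4	/* hypothetical fixed capacity */

struct ring_entry {
	uintptr_t addr;		/* tagged object address */
	uint32_t stack;		/* stands in for a stack depot handle */
	bool is_free;		/* true for a free, false for an alloc */
};

struct stack_ring {
	struct ring_entry entries[RING_SIZE];
	size_t pos;		/* total number of entries ever written */
};

/* Record one alloc or free event, overwriting the oldest slot when full. */
static inline void ring_record(struct stack_ring *r, uintptr_t addr,
			       uint32_t stack, bool is_free)
{
	struct ring_entry *e = &r->entries[r->pos % RING_SIZE];

	e->addr = addr;
	e->stack = stack;
	e->is_free = is_free;
	r->pos++;
}

/* Scan newest to oldest; keep the newest alloc and newest free for addr. */
static inline void ring_lookup(const struct stack_ring *r, uintptr_t addr,
			       uint32_t *alloc_stack, uint32_t *free_stack)
{
	size_t n = r->pos < RING_SIZE ? r->pos : RING_SIZE;
	size_t i;

	*alloc_stack = 0;
	*free_stack = 0;
	for (i = 0; i < n; i++) {
		const struct ring_entry *e =
			&r->entries[(r->pos - 1 - i) % RING_SIZE];

		if (e->addr != addr)
			continue;
		if (e->is_free && *free_stack == 0)
			*free_stack = e->stack;
		else if (!e->is_free && *alloc_stack == 0)
			*alloc_stack = e->stack;
		if (*alloc_stack != 0 && *free_stack != 0)
			break;
	}
}
```

Because the ring has a fixed size, older events for an object can be overwritten before a report needs them, in which case the corresponding trace is simply missing.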
The number of entries in the stack ring is fixed in this patch, but one of
the following patches adds a command-line argument to control it.
[andreyknvl@google.com: initialize read-write lock in stack ring]
Link: https://lkml.kernel.org/r/576182d194e27531e8090bad809e4136953895f4.1663700262.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/692de14b6b6a1bc817fd55e4ad92fc1f83c1ab59.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 7bc0584e5d)
Change-Id: If7c3b88fafbedf30e012c20903a878f125f11355
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Add bug_type and alloc/free_track fields to kasan_report_info and add a
kasan_complete_mode_report_info() function that fills in these fields.
This function is implemented differently for different KASAN modes.
Change the reporting code to use the filled in fields instead of invoking
kasan_get_bug_type() and kasan_get_alloc/free_track().
For the Generic mode, kasan_complete_mode_report_info() invokes these
functions instead. For the tag-based modes, only the bug_type field is
filled in; alloc/free_track are handled in the next patch.
Using a single function that fills in these fields is required for the
tag-based modes, as the values for all three fields are determined in a
single procedure implemented in the following patch.
Link: https://lkml.kernel.org/r/8432b861054fa8d0cee79a8877dedeaf3b677ca8.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 254721825
(cherry picked from commit 59e6e098d1)
Change-Id: Id341d252eda1b76b2a6e5fef42c37a745089f647
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>