If a lookup finds an existing inode, it must not change the existing bpf
program since it may be in use.
Bug: 267095363
Test: fuse_test, atest CtsScopedStorageHostTest
Change-Id: Icb00681fbcd51fdd4b0764906509093d98caeec4
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Some Android userspace is sending BINDER_TYPE_FDA objects with
num_fds=0. Like the previous patch, this is reproducible when
playing a video.
Before commit 09184ae9b5 BINDER_TYPE_FDA objects with num_fds=0
were 'correctly handled', as in no fixup was performed.
After commit 09184ae9b5 we aggregate fixup and skip regions in
binder_ptr_fixup structs and distinguish between the two by using
the skip_size field: if it's 0, then it's a fixup, otherwise skip.
When processing BINDER_TYPE_FDA objects with num_fds=0 we add a
skip region of skip_size=0, and this causes issues because now
binder_do_deferred_txn_copies will think this was a fixup region.
To address that, return early from binder_translate_fd_array to
avoid adding an empty skip region.
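The ambiguity can be illustrated with a minimal user-space sketch. The names ptr_fixup_sketch, fixup_list_sketch, is_fixup and add_fda_skip_region are hypothetical stand-ins, not the kernel's definitions; only the rule that skip_size == 0 marks a fixup is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for binder_ptr_fixup: skip_size == 0 means "fixup". */
struct ptr_fixup_sketch {
	size_t offset;
	size_t skip_size;
};

struct fixup_list_sketch {
	struct ptr_fixup_sketch items[MAX_ENTRIES];
	size_t count;
};

/* The deferred-copy loop tells the two kinds of entry apart this way. */
bool is_fixup(const struct ptr_fixup_sketch *pf)
{
	return pf->skip_size == 0;
}

/* Record the fd array as a region to skip; returns 0 on success, -1 if full. */
int add_fda_skip_region(struct fixup_list_sketch *list, size_t offset,
			size_t num_fds)
{
	assert(list != NULL);

	/* Early return: an empty skip region would be read back as a fixup. */
	if (num_fds == 0)
		return 0;
	if (list->count == MAX_ENTRIES)
		return -1;

	list->items[list->count].offset = offset;
	list->items[list->count].skip_size = num_fds * sizeof(uint32_t);
	list->count++;
	return 0;
}
```

With num_fds=0 the list stays untouched, so the copy loop never sees an entry it would misread as a fixup.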
Fixes: 09184ae9b5 ("binder: defer copies of pre-patched txn data")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Alessandro Astone <ales.astone@gmail.com>
Link: https://lore.kernel.org/r/20220415120015.52684-1-ales.astone@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 257685302
(cherry picked from commit ef38de9217)
Change-Id: I34fab41c0c1beee366a5df4724b263e4385ad13b
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Lee Jones <joneslee@google.com>
When handling BINDER_TYPE_FDA object we are pushing a parent fixup
with a certain skip_size but no scatter-gather copy object, since
the copy is handled standalone.
If the BINDER_TYPE_FDA object is the last child, the scatter-gather copy
loop never gets a chance to skip it, so an item is left in the parent
fixup list. This triggers the BUG_ON().
This is reproducible in Android when playing a video.
We receive a transaction that looks like this:
obj[0] BINDER_TYPE_PTR, parent
obj[1] BINDER_TYPE_PTR, child
obj[2] BINDER_TYPE_PTR, child
obj[3] BINDER_TYPE_FDA, child
Fixes: 09184ae9b5 ("binder: defer copies of pre-patched txn data")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Alessandro Astone <ales.astone@gmail.com>
Link: https://lore.kernel.org/r/20220415120015.52684-2-ales.astone@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 257685302
(cherry picked from commit 2d1746e3fd)
Change-Id: I3963a98dfc48b01d7bb8166aaa90341818bf6416
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Lee Jones <joneslee@google.com>
binder_uintptr_t is not the same as uintptr_t, so converting it into a
pointer requires a second cast:
drivers/android/binder.c: In function 'binder_translate_fd_array':
drivers/android/binder.c:2511:28: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
2511 | sender_ufda_base = (void __user *)sender_uparent->buffer + fda->parent_offset;
| ^
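The double cast can be sketched in plain C as follows. This is an illustration only: binder_uintptr_sketch_t is a hypothetical stand-in that assumes the binder type is a 64-bit integer on every architecture, and uptr_to_pointer is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for binder_uintptr_t, 64 bits wide on every arch. */
typedef uint64_t binder_uintptr_sketch_t;

/* Convert a user-supplied 64-bit address plus offset into a pointer. */
void *uptr_to_pointer(binder_uintptr_sketch_t buffer, uint64_t offset)
{
	/* The value must fit a native pointer, or the cast would truncate. */
	assert(buffer <= UINTPTR_MAX);

	/* First cast: to a pointer-sized integer. Second cast: to a pointer. */
	return (char *)(uintptr_t)buffer + offset;
}
```

On a 32-bit build, casting the 64-bit value straight to a pointer is what triggers -Wint-to-pointer-cast; going through uintptr_t makes the narrowing explicit.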
Fixes: 656e01f3ab ("binder: read pre-translated fds from sender buffer")
Acked-by: Todd Kjos <tkjos@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211207122448.1185769-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 257685302
(cherry picked from commit 9a0a930fe2)
Change-Id: I1c9b86a90bcf2be81012e59e0c472869f551e61a
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Lee Jones <joneslee@google.com>
BINDER_TYPE_PTR objects point to memory areas in the
source process to be copied into the target buffer
as part of a transaction. This implements a scatter-
gather model where non-contiguous memory in a source
process is "gathered" into a contiguous region in
the target buffer.
The data can include pointers that must be fixed up
to correctly point to the copied data. To avoid making
source process pointers visible to the target process,
this patch defers the copy until the fixups are known
and then copies and fixups are done together.
There is a special case of BINDER_TYPE_FDA which applies
the fixup later in the target process context. In this
case the user data is skipped (so no untranslated fds
become visible to the target).
Reviewed-by: Martijn Coenen <maco@android.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20211130185152.437403-5-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 137131904
Bug: 257685302
(cherry picked from commit 09184ae9b5)
[cmllamas: fix trivial merge conflict]
Change-Id: I6de75b192d1e3b2cc73c8d91077d97b608e8c5a9
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Lee Jones <joneslee@google.com>
This patch prepares for an upcoming patch where we read
pre-translated fds from the sender buffer and translate them before
copying them to the target. It does not change runtime behavior.
The patch adds two new parameters to binder_translate_fd_array() to
hold the sender buffer and sender buffer parent. These parameters let
us call copy_from_user() directly from the sender instead of using
binder_alloc_copy_from_buffer() to copy from the target. Also the patch
adds some new alignment checks. Previously the alignment checks would
have been done in a different place, but this lets us print more
useful error messages.
Reviewed-by: Martijn Coenen <maco@android.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20211130185152.437403-4-tkjos@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 137131904
Bug: 257685302
(cherry picked from commit 656e01f3ab)
Change-Id: Ib786020e49bd33e35aec88d43965f9d98021fa53
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Lee Jones <joneslee@google.com>
Add an argument to the finalize HVC/function that should be used from the EL1
driver.
The argument holds a standard error code. In case of any error, pKVM
will erase pvmfw.
Bug: 268607700
Change-Id: I9f6a6bfc89d3381ab88938586d3b73dd5d94102a
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Add the function pkvm_handle_system_misconfiguration, which is used to
report misconfigurations to pKVM that can undermine its security,
so pKVM can take the proper action.
This patch only adds one event, NO_DMA_ISOLATION, to indicate that DMA
is not isolated and can access the hypervisor.
The patch adds type pkvm_system_misconfiguration to identify the event
instead of having a void function with only one action as in the
future different events can have different responses.
Bug: 268607700
Change-Id: I9f0d2aeee25bd6bed622d327d6cbb36119c54c58
Signed-off-by: Mostafa Saleh <smostafa@google.com>
During page migration, the copy_highpage function is used to copy the
page data to the target page. If the source page is a userspace page
with MTE tags, the KASAN tag of the target page must have the match-all
tag in order to avoid tag check faults during subsequent accesses to the
page by the kernel. However, the target page may have been allocated in
a number of ways, some of which will use the KASAN allocator and will
therefore end up setting the KASAN tag to a non-match-all tag. Therefore,
update the target page's KASAN tag to match the source page.
We ended up unintentionally fixing this issue as a result of a bad
merge conflict resolution between commit e059853d14 ("arm64: mte:
Fix/clarify the PG_mte_tagged semantics") and commit 20794545c1 ("arm64:
kasan: Revert "arm64: mte: reset the page tag in page->flags""), which
preserved a tag reset for PG_mte_tagged pages which was considered to be
unnecessary at the time. Because SW tags KASAN uses separate tag storage,
update the code to only reset the tags when HW tags KASAN is enabled.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd85706505830f52e0c
Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>
Cc: <stable@vger.kernel.org> # 6.1
Fixes: 20794545c1 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"")
Link: https://lore.kernel.org/all/20230215050911.1433132-1-pcc@google.com/
[pcc@google.com: applied merge resolution given in link]
Bug: 265863271
Change-Id: If303d8a709438d3ff5af5fd85706505830f52e0c
Currently the PG_mte_tagged page flag mostly means the page contains
valid tags and it should be set after the tags have been cleared or
restored. However, in mte_sync_tags() it is set before setting the tags
to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for
shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on
write in another thread can cause the new page to have stale tags.
Similarly, tag reading via ptrace() can read stale tags if the
PG_mte_tagged flag is set before actually clearing/restoring the tags.
Fix the PG_mte_tagged semantics so that it is only set after the tags
have been cleared or restored. This is safe for swap restoring into a
MAP_SHARED or CoW page since the core code takes the page lock. Add two
functions to test and set the PG_mte_tagged flag with acquire and
release semantics. The downside is that concurrent mprotect(PROT_MTE) on
a MAP_SHARED page may cause tag loss. This is already the case for KVM
guests if a VMM changes the page protection while the guest triggers a
user_mem_abort().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[pcc@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
(cherry picked from commit e059853d14)
[pcc@google.com: resolved conflict in arch/arm64/include/asm/pgtable.h]
Bug: 265863271
Change-Id: Iff1bfa26982c16eac47120ee48a68b3fe60a5743
Our HID device needs the KEY_CAMERA_FOCUS event to control the camera, but this
event does not exist in the current HID driver.
So we add this event in hid-input.c.
Bug: 263846073
Link: https://lore.kernel.org/linux-input/Y+4YcnbPwWAnhrPt@kroah.com/
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: fengqi <fengqi@xiaomi.com>
Change-Id: I500881ea8b6b4e31099f2120e2c492f2793bf086
(cherry picked from commit af8dfb011fd0e434de7f0287e561a67757fb9346)
pKVM modules being rather small, it is expected for some basic sections
to be missing or empty (especially rodata and data). Make those optional
in the loader.
Bug: 269245057
Change-Id: I874050230de5cb4b3b29d316663400bb221e2021
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
In the mobile world it is important to know reserved-mem information,
since memory reserved via device tree keeps increasing on platforms
(e.g., 45% on our platform). Therefore, it's crucial to know the
breakdown of reserved memory sizes for memory accounting.
This patch prints out reserved memory details during boot to make
them visible.
Below is an example output:
[ 0.000000] OF: reserved mem: 0x00000009f9400000..0x00000009fb3fffff ( 32768 KB ) map reusable test1
[ 0.000000] OF: reserved mem: 0x00000000ffdf0000..0x00000000ffffffff ( 2112 KB ) map non-reusable test2
[ 0.000000] OF: reserved mem: 0x0000000091000000..0x00000000912fffff ( 3072 KB ) nomap non-reusable test3
Bug: 269588564
Change-Id: Idf77b3a9de70ed13c806d3b03d1886b5ae89da62
Signed-off-by: Martin Liu <liumartin@google.com>
Link: https://lore.kernel.org/r/20230209160954.1471909-1-liumartin@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit aeb9267eb6
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next)
When building gki_defconfig outside of the Android build system, copying
protected_exports fails:
$ make -skj"$(nproc)" LLVM=1 O=build gki_defconfig all
cp: cannot create regular file '/protected_exports': Permission denied
...
OUT_DIR is an Android build.sh specific variable, so it will not be
defined when using just kbuild. Use objtree instead, which is guaranteed
to be available through kbuild directly; OUT_DIR is passed to make via
O, which is used to ultimately define objtree, so there is no functional
change.
Bug: 268678245
Change-Id: I235cef7c848a7cf9df9d7d5343af33d95b501a15
Fixes: 9f3f9a2634e02 ("ANDROID: GKI: Do not modify protected exports source list")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Add android/abi_gki_aarch64.stg as initial ABI representation of the
KMI, enable trimming of symbols outside KMI and start enforcing KMI.
While this is hard enforcement in the code base, we still allow
controlled changes to the ABI until KMI freeze.
Test: TH
Bug: 269323432
Change-Id: I016fe12aff4d781640340e16a2ae278e6bf5cd84
Signed-off-by: Aleksei Vetrov <vvvvvv@google.com>
Updated with: bazel run //common:db845c_abi_update_symbol_lis
Update to add missing symbols required at runtime for unsigned
modules. This fixes failing build time check for the same
introduced with the aosp/2410926 if it re-lands.
Bug: 269240239
Test: TH
Change-Id: I43ce14dd0549fb7974e3e635e3ee1c02194c8b3a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
Remove the restriction that the zone size must be a power of two. This
patch has been tested with the following test script:
. tests/zbd/rc
. common/null_blk
. common/scsi_debug

DESCRIPTION="test npo2 zone size support"
QUICK=1

requires() {
    _have_fio
    _have_driver f2fs
    _have_module_param scsi_debug zone_size_mb
    _have_scsi_debug
}

test() {
    echo "Running ${TEST_NAME}"

    local scsi_debug_params=(
        delay=0
        dev_size_mb=1024
        sector_size=4096
        zbc=host-managed
        zone_nr_conv=0
        zone_size_mb=3
    )
    _init_scsi_debug "${scsi_debug_params[@]}" &&
    local zdev="/dev/${SCSI_DEBUG_DEVICES[0]}" fail &&
    ls -ld "${zdev}" >>"${FULL}" &&
    local fio_args=(
        --direct=1
        --file="${zdev}"
        --gtod_reduce=1
        --iodepth=64
        --iodepth_batch=16
        --ioengine=io_uring
        --ioscheduler=none
        --name=npo2zs
        --runtime=10
        --size=1M
        --time_based=1
        --zonemode=zbd
    ) &&
    _run_fio_verify_io "${fio_args[@]}" >>"${FULL}" 2>&1 ||
    fail=true

    _exit_scsi_debug

    if [ -z "$fail" ]; then
        echo "Test complete"
    else
        echo "Test failed"
        return 1
    fi
}
Bug: 197782466
Bug: 269471019
Change-Id: I70b498ab8920b4e1a13e04b753fe176a632552b2
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Remove the restriction that the zone size must be a power of two.
Bug: 197782466
Bug: 269471019
Change-Id: I7bd9c8f19ec601b82e0e1c271c3e362ddaf9a0ed
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Add support for zone sizes that are not a power of two in nvmet.
Bug: 197782466
Bug: 269471019
Change-Id: I7ec207985799d3c7a82d26033b87e37b51935640
Signed-off-by: Bart Van Assche <bvanassche@google.com>
dm_zone_endio() updates the bi_sector of the orig bio for zoned devices that
use either native append or append emulation, and it is called before the
endio of the target. But the target endio can still update the clone bio
after dm_zone_endio is called, so the orig bio no longer contains
the updated information.
Currently, this is not a problem as the targets that support zoned devices
such as dm-zoned, dm-linear, and dm-crypt do not have an endio function,
and even if they do (such as dm-flakey), they don't modify the
bio->bi_iter.bi_sector of the cloned bio that is used to update the
orig_bio's bi_sector in dm_zone_endio function.
This is a prep patch for the new dm-po2zoned target as it modifies
bi_sector in the endio callback.
Call dm_zone_endio for zoned devices after calling the target's endio
function.
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 197782466
Bug: 269471019
Link: https://lore.kernel.org/linux-block/20220923173618.6899-12-p.raghav@samsung.com/
Change-Id: Ia7a96aac805a040f8ab109e6cfdf50ad9895e2ee
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Convert the power-of-2(po2) based calculation with zone size to be generic
in null_zone_no with optimization for po2 zone sizes.
The nr_zones calculation in null_init_zoned_dev has been replaced with a
division without special handling for po2 zone sizes as this function is
called only during the initialization and will not be invoked in the hot
path.
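A generic zone-number calculation with a fast path for po2 zone sizes might look like the sketch below. The helper names are hypothetical and this is user-space C, not the null_blk code; it only illustrates the shift-versus-division choice described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when n has exactly one bit set. */
bool is_power_of_2_u64(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Base-2 logarithm of a power-of-two value. */
unsigned int ilog2_po2(uint64_t n)
{
	unsigned int shift = 0;

	while (n >>= 1)
		shift++;
	return shift;
}

/* Index of the zone containing @sector, for any non-zero zone size. */
uint64_t zone_no_sketch(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);

	/* Fast path: a po2 zone size turns the division into a shift. */
	if (is_power_of_2_u64(zone_sectors))
		return sector >> ilog2_po2(zone_sectors);

	/* Generic path: works for any zone size. */
	return sector / zone_sectors;
}
```

For example, with a 384-sector zone, sector 1000 lands in zone 2, while a 1024-sector zone takes the shift path. A real implementation would likely cache the shift rather than recompute it per call.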
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 197782466
Bug: 269471019
Link: https://lore.kernel.org/linux-block/20220923173618.6899-7-p.raghav@samsung.com/
Change-Id: I8d1a915e6e09b04095acdf964d31837c4206bc49
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Remove the condition that prevents a ZNS drive with a non-power_of_2 zone
size from being updated, and use a generic method to calculate the number of
zones instead of relying on log and shift based calculation on zone size.
The power_of_2 calculation has been replaced directly with generic
calculation without special handling. Both modified functions are not
used in hot paths, they are only used during initialization &
revalidation of the ZNS device.
As the rounddown macro from math.h does not work on 32-bit architectures,
the round down operation is open coded.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 197782466
Bug: 269471019
Link: https://lore.kernel.org/linux-block/20220923173618.6899-6-p.raghav@samsung.com/
Change-Id: Id15b9b6f68498477f3d1c6159c5a459749f856a9
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Checking if a given sector is aligned to a zone is a common
operation that is performed for zoned devices. Add
bdev_is_zone_start helper to check for this instead of opencoding it
everywhere.
Convert the calculations on zone size to be generic instead of relying on
power-of-2(po2) based arithmetic in the block layer using the helpers
wherever possible.
The only hot path affected by this change for zoned devices with po2
zone size is in blk_check_zone_append() but bdev_is_zone_start() helper is
used to optimize the calculation for po2 zone sizes.
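The alignment check with a po2 fast path can be sketched as follows. is_zone_start_sketch is a hypothetical user-space stand-in, not the bdev_is_zone_start() helper itself, and the zone size is passed in directly rather than read from a block device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when @sector is the first sector of a zone of @zone_sectors sectors. */
bool is_zone_start_sketch(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);

	/* po2 zone size: alignment is a cheap mask test. */
	if ((zone_sectors & (zone_sectors - 1)) == 0)
		return (sector & (zone_sectors - 1)) == 0;

	/* Any other zone size needs a real remainder. */
	return sector % zone_sectors == 0;
}
```

The generic branch matters for correctness, not just generality: with a 384-sector zone, a plain mask test would misreport sector 512 as a zone start because 512 & 383 is 0.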
Finally, allow zoned devices with non po2 zone sizes provided that their
zone capacity and zone size are equal. The main motivation to allow zoned
devices with non po2 zone size is to remove the unmapped LBA between
zone capacity and zone size for devices that cannot have a po2 zone
capacity.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 197782466
Bug: 269471019
Link: https://lore.kernel.org/linux-block/20220923173618.6899-4-p.raghav@samsung.com/
Change-Id: I2ecc186d7b14f5508b6abfe9821526d39a21d7e4
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Adapt bdev_nr_zones and disk_zone_no functions so that they can
also work for non-power-of-2 zone sizes.
As the existing deployments assume that a device zone size is a power of
2 number of sectors, power-of-2 optimized calculation is used for those
devices.
There are no direct hot paths modified and the changes just
introduce one new branch per call.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Bug: 197782466
Bug: 269471019
Link: https://lore.kernel.org/linux-block/20220923173618.6899-2-p.raghav@samsung.com/
Change-Id: I1695f25f55579a342c44c6994fd43055d7356c81
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The header generation script uses the protected exports list
as a source to generate the header file during the kernel build.
The script preprocesses the symbols in place before using them to generate
an array of symbols for protected exports, which causes the source file
to change. This may force Kleaf to build the kernel again even
though there are no real changes in terms of symbols.
Use a copy as a temp file for processing, leaving the source file
unaffected.
The unprotected symbol list is already a temp file, so this change does not
affect that target.
Bug: 268678245
Test: TH
Change-Id: Ifb551639451d1c7bd935ff732bd1959647c014d7
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
Changes in 5.15.94
mm/migration: return errno when isolate_huge_page failed
migrate: hugetlb: check for hugetlb shared PMD in node migration
btrfs: limit device extents to the device size
btrfs: zlib: zero-initialize zlib workspace
ALSA: hda/realtek: Add Positivo N14KP6-TG
ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Elitebook, 645 G9
tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
of/address: Return an error when no valid dma-ranges are found
can: j1939: do not wait 250 ms if the same addr was already claimed
xfrm: compat: change expression for switch in xfrm_xlate64
IB/hfi1: Restore allocated resources on failed copyout
xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
RDMA/irdma: Fix potential NULL-ptr-dereference
RDMA/usnic: use iommu_map_atomic() under spin_lock()
xfrm: fix bug with DSCP copy to v6 from v4 tunnel
net: phylink: move phy_device_free() to correctly release phy device
bonding: fix error checking in bond_debug_reregister()
net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY
ionic: clean interrupt before enabling queue to avoid credit race
uapi: add missing ip/ipv6 header dependencies for linux/stddef.h
ice: Do not use WQ_MEM_RECLAIM flag for workqueue
net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware
net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q"
net/mlx5e: Move repeating clear_bit in mlx5e_rx_reporter_err_rq_cqe_recover
net/mlx5e: Introduce the mlx5e_flush_rq function
net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
net/mlx5: Bridge, fix ageing of peer FDB entries
net/mlx5e: IPoIB, Show unknown speed instead of error
net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
net/mlx5: Serialize module cleanup with reload and remove
igc: Add ndo_tx_timeout support
rds: rds_rm_zerocopy_callback() use list_first_entry()
selftests: forwarding: lib: quote the sysctl values
ALSA: pci: lx6464es: fix a debug loop
riscv: stacktrace: Fix missing the first frame
ASoC: topology: Return -ENOMEM on memory allocation failure
pinctrl: mediatek: Fix the drive register definition of some Pins
pinctrl: aspeed: Fix confusing types in return value
pinctrl: single: fix potential NULL dereference
spi: dw: Fix wrong FIFO level setting for long xfers
pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
cifs: Fix use-after-free in rdata->read_into_pages()
net: USB: Fix wrong-direction WARNING in plusb.c
mptcp: be careful on subflow status propagation on errors
btrfs: free device in btrfs_close_devices for a single device filesystem
usb: core: add quirk for Alcor Link AK9563 smartcard reader
usb: typec: altmodes/displayport: Fix probe pin assign check
clk: ingenic: jz4760: Update M/N/OD calculation algorithm
ceph: flush cap releases when the session is flushed
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
rtmutex: Ensure that the top waiter is always woken up
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
Fix page corruption caused by racy check in __free_pages
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
drm/i915: Initialize the obj flags for shmem objects
drm/i915: Fix VBT DSI DVO port handling
x86/speculation: Identify processors vulnerable to SMT RSB predictions
KVM: x86: Mitigate the cross-thread return address predictions bug
Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
Linux 5.15.94
Change-Id: I46aca6bfb09ef8e68122a41734968906982b2a5f
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
... so that they can be loaded by Kleaf extensions
and read during the loading phase.
Moving forward, we should remove build configs
and express constants in .bzl files. However,
for now, until kernel_build has been migrated to
use the defined cc_toolchain, we must keep this file.
Test: Treehugger
Bug: 228238975
Change-Id: Id9628663785970c460470382e1ae162e1112203d
Signed-off-by: Yifan Hong <elsk@google.com>
Remove the symbol list entries that are not actually symbols (any more)
to allow strict mode to be enabled.
Bug: 269346251
Change-Id: I32d93e0a3f46c01ccabd4251805066d627518ea0
Signed-off-by: Matthias Maennich <maennich@google.com>
The purpose of this vendor hook is to calculate
the total resume latency for device, CPU and
console, etc. The current vendor hook only supports
individual resume latency for device, each individual
CPU, etc., but lacks tracing of the total resume latency.
Bug: 232541623
Signed-off-by: Sophia Wang <yodagump@google.com>
Change-Id: Idd7c999dcd822cc0f7747baa11ec200eed5f5172
commit 6f0f2d5ef8 upstream.
By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.
Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be8de49bea upstream.
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ad7bbf3db upstream.
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
It so happens that we recently hit a driver probe failure on the Steam Deck,
and drm_sched_fini() was called even though its counterpart had not been
called previously, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
<TASK>
amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
devm_drm_dev_init_release+0x49/0x70
[...]
To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.
Note that ideally we'd use sched.ready for that; this field is set as the last
thing in drm_sched_init(). But amdgpu seems to "override" the meaning of this
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Fixes: 067f44c8b4 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 462a8e08e0 upstream.
When we upgraded our kernel, we started seeing some page corruption like
the following consistently:
BUG: Bad page state in process ganesha.nfsd pfn:1304ca
page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
flags: 0x17ffffc0000000()
raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Call Trace:
dump_stack+0x74/0x96
bad_page.cold+0x63/0x94
check_new_page_bad+0x6d/0x80
rmqueue+0x46e/0x970
get_page_from_freelist+0xcb/0x3f0
? _cond_resched+0x19/0x40
__alloc_pages_nodemask+0x164/0x300
alloc_pages_current+0x87/0xf0
skb_page_frag_refill+0x84/0x110
...
Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.
After bisecting the issue, we found the issue started from commit
e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages"):
    if (put_page_testzero(page))
        free_the_page(page, order);
    else if (!PageHead(page))
        while (order-- > 0)
            free_the_page(page + (1 << order), order);
The problem is that the PageHead check is racy, because at this point we
have already dropped our reference to the page. So even if we came in with
a compound page, the page can already be freed, PageHead can return
false, and we end up freeing all the tail pages, causing a double free.
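One way to close such a race is to sample the flag while the caller's reference still pins the page, and only then drop the reference. The sketch below is a single-threaded user-space model with hypothetical names (page_sketch and the *_sketch helpers); it cannot reproduce the race itself and only shows the ordering, which is assumed rather than quoted from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of struct page: just a refcount and a head flag. */
struct page_sketch {
	int refcount;
	bool head;
};

int pages_freed; /* counts free_the_page_sketch() calls, for illustration */

void free_the_page_sketch(struct page_sketch *page, unsigned int order)
{
	(void)page;
	(void)order;
	pages_freed++;
}

/* Drop one reference; true when it was the last one. */
bool put_page_testzero_sketch(struct page_sketch *page)
{
	return --page->refcount == 0;
}

void free_pages_sketch(struct page_sketch *page, unsigned int order)
{
	assert(page != NULL);

	/* Sample the flag while our reference still pins the page. */
	bool head = page->head;

	if (put_page_testzero_sketch(page))
		free_the_page_sketch(page, order);
	else if (!head)
		while (order-- > 0)
			free_the_page_sketch(page + (1u << order), order);
}
```

Because head is read before the reference is dropped, a concurrent free of a compound page can no longer flip the result of the check and send the caller down the tail-freeing branch.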
Fixes: e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db370a8b9f upstream.
Let L1 and L2 be two spinlocks.
Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.
Let T2 be the task holding L2.
Let T3 be a task trying to acquire L1.
The following events will lead to a state in which the wait queue of L2
isn't empty, but no task actually holds the lock.
T1                    T2                                    T3
==                    ==                                    ==

                                                            spin_lock(L1)
                                                            | raw_spin_lock(L1->wait_lock)
                                                            | rtlock_slowlock_locked(L1)
                                                            | | task_blocks_on_rt_mutex(L1, T3)
                                                            | | | orig_waiter->lock = L1
                                                            | | | orig_waiter->task = T3
                                                            | | | raw_spin_unlock(L1->wait_lock)
                                                            | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
                      spin_unlock(L2)                       | | | |
                      | rt_mutex_slowunlock(L2)             | | | |
                      | | raw_spin_lock(L2->wait_lock)      | | | |
                      | | wakeup(T1)                        | | | |
                      | | raw_spin_unlock(L2->wait_lock)    | | | |
                                                            | | | | waiter = T1->pi_blocked_on
                                                            | | | | waiter == rt_mutex_top_waiter(L2)
                                                            | | | | waiter->task == T1
                                                            | | | | raw_spin_lock(L2->wait_lock)
                                                            | | | | dequeue(L2, waiter)
                                                            | | | | update_prio(waiter, T1)
                                                            | | | | enqueue(L2, waiter)
                                                            | | | | waiter != rt_mutex_top_waiter(L2)
                                                            | | | | L2->owner == NULL
                                                            | | | | wakeup(T1)
                                                            | | | | raw_spin_unlock(L2->wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()
If the deadline of T1 is updated before the call to update_prio(), and the
new deadline is greater than the deadline of the second top waiter, then
after the requeue, T1 is no longer the top waiter, and the wrong task is
woken up which will then go back to sleep because it is not the top waiter.
This can be reproduced in PREEMPT_RT with stress-ng:
while true; do
    stress-ng --sched deadline --sched-period 1000000000 \
        --sched-runtime 800000000 --sched-deadline \
        1000000000 --mmapfork 23 -t 20
done
A similar issue was pointed out by Thomas versus the cases where the top
waiter drops out early due to a signal or timeout, which is a general issue
for all regular rtmutex use cases, e.g. futex.
The problematic code is in rt_mutex_adjust_prio_chain():
        // Save the top waiter before dequeue/enqueue
        prerequeue_top_waiter = rt_mutex_top_waiter(lock);

        rt_mutex_dequeue(lock, waiter);
        waiter_update_prio(waiter, task);
        rt_mutex_enqueue(lock, waiter);

        // Lock has no owner?
        if (!rt_mutex_owner(lock)) {
                // Top waiter changed
  ---->         if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
  ---->                 wake_up_state(waiter->task, waiter->wake_state);
This only takes the case into account where @waiter is the new top waiter
due to the requeue operation.
But it fails to handle the case where @waiter is no longer the top
waiter due to the requeue operation.
Ensure that the new top waiter is woken up in all cases so it can take
over the ownerless lock.
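The intended behaviour can be modelled in user space as below. This is a sketch under stated assumptions: waiter_sketch, top_waiter_sketch and requeue_and_pick_wakeup are hypothetical names, the wait queue is a flat array ordered by deadline only, and the function returns the task to wake instead of calling wake_up_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter: a smaller deadline means a higher priority. */
struct waiter_sketch {
	int task_id;
	unsigned long long deadline;
};

/* Return the waiter with the earliest deadline (n must be > 0). */
struct waiter_sketch *top_waiter_sketch(struct waiter_sketch *w, size_t n)
{
	struct waiter_sketch *top = &w[0];

	for (size_t i = 1; i < n; i++)
		if (w[i].deadline < top->deadline)
			top = &w[i];
	return top;
}

/* Requeue w[idx] with a new deadline; return the task id to wake, or -1. */
int requeue_and_pick_wakeup(struct waiter_sketch *w, size_t n, size_t idx,
			    unsigned long long new_deadline, bool lock_has_owner)
{
	assert(n > 0 && idx < n);

	struct waiter_sketch *before = top_waiter_sketch(w, n);
	w[idx].deadline = new_deadline; /* models dequeue, update_prio, enqueue */
	struct waiter_sketch *after = top_waiter_sketch(w, n);

	/* Ownerless lock and the top changed: wake the new top, whoever it is. */
	if (!lock_has_owner && before != after)
		return after->task_id;
	return -1;
}
```

In this model, if task 1 was the top waiter and its deadline moves behind task 2's, the function returns 2. Waking the requeued waiter unconditionally would instead wake task 1, which would go straight back to sleep.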
[ tglx: Amend changelog, add Fixes tag ]
Fixes: c014ef69b3 ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>