[ Upstream commit ec768766608092087dfb5c1fc45a16a6f524dee2 ]
Cortex-X4 erratum 3194386 and Neoverse-V3 erratum 3312417 are identical,
with duplicate Kconfig text and some unsightly ifdeffery. While we try
to share code behind CONFIG_ARM64_WORKAROUND_SPECULATIVE_SSBS, having
separate options results in a fair amount of boilerplate code, and this
will only get worse as we expand the set of affected CPUs.
To reduce this boilerplate, unify the two behind a common Kconfig
option. This removes the duplicate text and Kconfig logic, and removes
the need for the intermediate ARM64_WORKAROUND_SPECULATIVE_SSBS option.
The set of affected CPUs is described as a list so that this can easily
be extended.
I've used ARM64_ERRATUM_3194386 (matching the Neoverse-V3 erratum ID) as
the common option, matching the way we use ARM64_ERRATUM_1319367 to
cover Cortex-A57 erratum 1319537 and Cortex-A72 erratum 1319367.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240603111812.1514101-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ Mark: fix conflicts, drop unneeded cpucaps.h ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7187bb7d0b5c7dfa18ca82e9e5c75e13861b1d88 ]
Cortex-X4 and Neoverse-V3 suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time. This is described in their Software Developer Errata Notice (SDEN)
documents:
* Cortex-X4 SDEN v8.0, erratum 3194386:
https://developer.arm.com/documentation/SDEN-2432808/0800/
* Neoverse-V3 SDEN v6.0, erratum 3312417:
https://developer.arm.com/documentation/SDEN-2891958/0600/
To workaround these errata, it is necessary to place a speculation
barrier (SB) after MSR to the SSBS special-purpose register. This patch
adds the requisite SB after writes to SSBS within the kernel, and hides
the presence of SSBS from EL0 such that userspace software which cares
about SSBS will manipulate this via prctl(PR_GET_SPECULATION_CTRL, ...).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240508081400.235362-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
[ Mark: fix conflicts, drop unneeded cpucaps.h, fold in user_feature_fixup() ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 89add40066f9ed9abe5f7f886fe5789ff7e0c50e upstream.
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable@vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77baa5bafcbe1b2a15ef9c37232c21279c95481c upstream.
In extreme test scenarios:
the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
utime = 18446744073709518790 ns, rtime = 135989749728000 ns
In cputime_adjust() process, stime is greater than rtime due to
mul_u64_u64_div_u64() precision problem.
before call mul_u64_u64_div_u64(),
stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
after call mul_u64_u64_div_u64(),
stime = 135989949653530
unsigned reversion occurs because rtime is less than stime.
utime = rtime - stime = 135989749728000 - 135989949653530
= -199925530
= (u64)18446744073709518790
Trigger condition:
1). User task run in kernel mode most of time
2). ARM64 architecture
3). TICK_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime
Fixes: 3dc167ba57 ("sched/cputime: Improve cputime_adjust()")
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20240726023235.217771-1-zhengzucheng@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 upstream.
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for DMA_FROM_DEVICE DMA driection (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid such issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using this helper
in _base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
directly scsi_dma_map().
Fixes: 471ef9d4e4 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b88f55389ad27f05ed84af9e1026aa64dbfabc9a upstream.
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc ("sched: Add wrapper for get_wchan()
to keep task blocked")
Booting with the 'profile=sleep' kernel command line option added or
executing
# echo -n sleep > /sys/kernel/profiling
after boot causes the system to lock up.
Lockdep reports
kthreadd/3 is trying to acquire lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
but task is already holding lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
with the call trace being
lock_acquire+0xc8/0x2f0
get_wchan+0x32/0x70
__update_stats_enqueue_sleeper+0x151/0x430
enqueue_entity+0x4b0/0x520
enqueue_task_fair+0x92/0x6b0
ttwu_do_activate+0x73/0x140
try_to_wake_up+0x213/0x370
swake_up_locked+0x20/0x50
complete+0x2f/0x40
kthread+0xfb/0x180
However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.
Fixes: 42a20f86dc ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed0172af5d6fc07d1b40ca82f5ca3979300369f7 ]
We've observed NFS clients with sync tasks sleeping in __rpc_execute
waiting on RPC_TASK_QUEUED that have not responded to a wake-up from
rpc_make_runnable(). I suspect this problem usually goes unnoticed,
because on a busy client the task will eventually be re-awoken by another
task completion or xprt event. However, if the state manager is draining
the slot table, a sync task missing a wake-up can result in a hung client.
We've been able to prove that the waker in rpc_make_runnable() successfully
calls wake_up_bit() (ie- there's no race to tk_runstate), but the
wake_up_bit() call fails to wake the waiter. I suspect the waker is
missing the load of the bit's wait_queue_head, so waitqueue_active() is
false. There are some very helpful comments about this problem above
wake_up_bit(), prepare_to_wait(), and waitqueue_active().
Fix this by inserting smp_mb__after_atomic() before the wake_up_bit(),
which pairs with prepare_to_wait() calling set_current_state().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf365071ea92b9579d5a272679b74052a5643e35 ]
When a task waiting for completion of a Store Data operation is
interrupted, an attempt is made to halt this operation. If this attempt
fails due to a hardware or firmware problem, there is a chance that the
SCLP facility might store data into buffers referenced by the original
operation at a later time.
Handle this situation by not releasing the referenced data buffers if
the halt attempt fails. For current use cases, this might result in a
leak of few pages of memory in case of a rare hardware/firmware
malfunction.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68594cec291ff9523b9feb3f43fd853dcddd1f60 ]
syzkaller reported use-after-free in load_firmware_cb() [1].
The reason is because the module allocated a struct tuner in tuner_probe(),
and then the module initialization failed, the struct tuner was released.
A worker which created during module initialization accesses this struct
tuner later, it caused use-after-free.
The process is as follows:
task-6504 worker_thread
tuner_probe <= alloc dvb_frontend [2]
...
request_firmware_nowait <= create a worker
...
tuner_remove <= free dvb_frontend
...
request_firmware_work_func <= the firmware is ready
load_firmware_cb <= but now the dvb_frontend has been freed
To fix the issue, check the dvd_frontend in load_firmware_cb(), if it is
null, report a warning and just return.
[1]:
==================================================================
BUG: KASAN: use-after-free in load_firmware_cb+0x1310/0x17a0
Read of size 8 at addr ffff8000d7ca2308 by task kworker/2:3/6504
Call trace:
load_firmware_cb+0x1310/0x17a0
request_firmware_work_func+0x128/0x220
process_one_work+0x770/0x1824
worker_thread+0x488/0xea0
kthread+0x300/0x430
ret_from_fork+0x10/0x20
Allocated by task 6504:
kzalloc
tuner_probe+0xb0/0x1430
i2c_device_probe+0x92c/0xaf0
really_probe+0x678/0xcd0
driver_probe_device+0x280/0x370
__device_attach_driver+0x220/0x330
bus_for_each_drv+0x134/0x1c0
__device_attach+0x1f4/0x410
device_initial_probe+0x20/0x30
bus_probe_device+0x184/0x200
device_add+0x924/0x12c0
device_register+0x24/0x30
i2c_new_device+0x4e0/0xc44
v4l2_i2c_new_subdev_board+0xbc/0x290
v4l2_i2c_new_subdev+0xc8/0x104
em28xx_v4l2_init+0x1dd0/0x3770
Freed by task 6504:
kfree+0x238/0x4e4
tuner_remove+0x144/0x1c0
i2c_device_remove+0xc8/0x290
__device_release_driver+0x314/0x5fc
device_release_driver+0x30/0x44
bus_remove_device+0x244/0x490
device_del+0x350/0x900
device_unregister+0x28/0xd0
i2c_unregister_device+0x174/0x1d0
v4l2_device_unregister+0x224/0x380
em28xx_v4l2_init+0x1d90/0x3770
The buggy address belongs to the object at ffff8000d7ca2000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 776 bytes inside of
2048-byte region [ffff8000d7ca2000, ffff8000d7ca2800)
The buggy address belongs to the page:
page:ffff7fe00035f280 count:1 mapcount:0 mapping:ffff8000c001f000 index:0x0
flags: 0x7ff800000000100(slab)
raw: 07ff800000000100 ffff7fe00049d880 0000000300000003 ffff8000c001f000
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8000d7ca2200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8000d7ca2280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8000d7ca2300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8000d7ca2380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8000d7ca2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[2]
Actually, it is allocated for struct tuner, and dvb_frontend is inside.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e3d55fbd160b3ca376599a68b4cddfdc67d4153 ]
The bandwidth fixup quirk doesn't know that SuperSpeed exists and has
the same 8 service intervals per millisecond as High Speed, hence its
calculations are wrong.
Assume that all speeds from HS up use 8 intervals per millisecond.
No further changes are needed, updated code has been confirmed to work
with all speeds from FS to SS.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://lore.kernel.org/r/20240414190040.2255a0bc@foxbook
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cd7c25f6f0576073b3d03bc4cfb1e8ca63a1195 ]
Some SunplusIT cameras took a borderline interpretation of the UVC 1.5
standard, and fill the PTS and SCR fields with invalid data if the
package does not contain data.
"STC must be captured when the first video data of a video frame is put
on the USB bus."
Some SunplusIT devices send, e.g.,
buffer: 0xa7755c00 len 000012 header:0x8c stc 00000000 sof 0000 pts 00000000
buffer: 0xa7755c00 len 000012 header:0x8c stc 00000000 sof 0000 pts 00000000
buffer: 0xa7755c00 len 000668 header:0x8c stc 73779dba sof 070c pts 7376d37a
While the UVC specification meant that the first two packets shouldn't
have had the SCR bit set in the header.
This borderline/buggy interpretation has been implemented in a variety
of devices, from directly SunplusIT and from other OEMs that rebrand
SunplusIT products. So quirking based on VID:PID will be problematic.
All the affected modules have the following extension unit:
VideoControl Interface Descriptor:
guidExtensionCode {82066163-7050-ab49-b8cc-b3855e8d221d}
But the vendor plans to use that GUID in the future and fix the bug,
this means that we should use heuristic to figure out the broken
packets.
This patch takes care of this.
lsusb of one of the affected cameras:
Bus 001 Device 003: ID 1bcf:2a01 Sunplus Innovation Technology Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.01
bDeviceClass 239 Miscellaneous Device
bDeviceSubClass 2 ?
bDeviceProtocol 1 Interface Association
bMaxPacketSize0 64
idVendor 0x1bcf Sunplus Innovation Technology Inc.
idProduct 0x2a01
bcdDevice 0.02
iManufacturer 1 SunplusIT Inc
iProduct 2 HanChen Wise Camera
iSerial 3 01.00.00
bNumConfigurations 1
Tested-by: HungNien Chen <hn.chen@sunplusit.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Link: https://lore.kernel.org/r/20240323-resend-hwtimestamp-v10-2-b08e590d97c7@chromium.org
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38e6f715b02b572f74677eb2f29d3b4bc6f1ddff ]
This commit adds a null check for the 'afb' variable in the
amdgpu_dm_plane_handle_cursor_update function. Previously, 'afb' was
assumed to be null, but was used later in the code without a null check.
This could potentially lead to a null pointer dereference.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_plane.c:1298 amdgpu_dm_plane_handle_cursor_update() error: we previously assumed 'afb' could be null (see line 1252)
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 065927b51eb1f042c3e026cebfd55e72ccc26093 ]
There is no need to add a lock in s_ctrl callback, it has been
synchronized by the ctrl_handler's lock, otherwise it may led to
a deadlock if the driver calls v4l2_ctrl_s_ctrl().
Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e864180ee49b4d30e640fd1e1d852b86411420c9 ]
flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.
Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 030ffd4d43b433bc6671d9ec34fc12c59220b95d ]
When user space sets an invalid ta type, the pointer context will be empty.
So it need to check the pointer context before using it
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c02c1960c93eede587576625a1221205a68a904f ]
optimize the code to avoid pass a null pointer (hwmgr->backend)
to function smu7_update_edc_leakage_table.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f683f24093dd94a831085fe0ea8e9dc4c6c1a2d1 ]
Function .set_power_profile_mode need an array as input
parameter. So define variable workload as an array to fix
the below coverity warning.
"Passing &workload to function hwmgr->hwmgr_func->set_power_profile_mode
which uses it as an array. This might corrupt or misinterpret adjacent
memory locations"
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22a5daaec0660dd19740c4c6608b78f38760d1e6 ]
Clear resource leak warning that when the prepare fails,
the allocated amdgpu job object will never be released.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 10839ee6a977ed1f7d0f4deb29f2d7e5d1f2a9dd ]
[WHY]
Avoid race condition which puts LTTPR into bad state during UHBR LT.
[HOW]
Delay 30ms between starting UHBR TPS1 PHY output and sending TPS1 via DPCD.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 320d8dc612660da84c3b70a28658bb38069e5a9a ]
If we failed to link a free space entry because there's already a
conflicting entry for the same offset, we free the free space entry but
we don't free the associated bitmap that we had just allocated before.
Fix that by freeing the bitmap before freeing the entry.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97713b1a2ced1e4a2a6c40045903797ebd44d7e0 ]
[BUG]
For subpage + zoned case, the following workload can lead to rsv data
leak at unmount time:
# mkfs.btrfs -f -s 4k $dev
# mount $dev $mnt
# fsstress -w -n 8 -d $mnt -s 1709539240
0/0: fiemap - no filename
0/1: copyrange read - no filename
0/2: write - no filename
0/3: rename - no source filename
0/4: creat f0 x:0 0 0
0/4: creat add id=0,parent=-1
0/5: writev f0[259 1 0 0 0 0] [778052,113,965] 0
0/6: ioctl(FIEMAP) f0[259 1 0 0 224 887097] [1294220,2291618343991484791,0x10000] -1
0/7: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 224 887097] return 25, fallback to stat()
0/7: dwrite f0[259 1 0 0 224 887097] [696320,102400] 0
# umount $mnt
The dmesg includes the following rsv leak detection warning (all call
trace skipped):
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8653 btrfs_destroy_inode+0x1e0/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8654 btrfs_destroy_inode+0x1a8/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8660 btrfs_destroy_inode+0x1a0/0x200 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): last unmount of filesystem 1b4abba9-de34-4f07-9e7f-157cf12a18d6
------------[ cut here ]------------
WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): space_info DATA has 268218368 free, is not full
BTRFS info (device sda): space_info total=268435456, used=204800, pinned=0, reserved=0, may_use=12288, readonly=0 zone_unusable=0
BTRFS info (device sda): global_block_rsv: size 0 reserved 0
BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
------------[ cut here ]------------
WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs]
---[ end trace 0000000000000000 ]---
BTRFS info (device sda): space_info METADATA has 267796480 free, is not full
BTRFS info (device sda): space_info total=268435456, used=131072, pinned=0, reserved=0, may_use=262144, readonly=0 zone_unusable=245760
BTRFS info (device sda): global_block_rsv: size 0 reserved 0
BTRFS info (device sda): trans_block_rsv: size 0 reserved 0
BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0
BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone
append size of 64K, and the system has 64K page size.
[CAUSE]
I have added several trace_printk() to show the events (header skipped):
> btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
> btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
> btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
> btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The above lines show our buffered write has dirtied 3 pages of inode
259 of root 5:
704K 768K 832K 896K
I |////I/////////////////I///////////| I
756K 868K
|///| is the dirtied range using subpage bitmaps. and 'I' is the page
boundary.
Meanwhile all three pages (704K, 768K, 832K) have their PageDirty
flag set.
> btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
Then direct IO write starts, since the range [680K, 780K) covers the
beginning part of the above dirty range, we need to writeback the
two pages at 704K and 768K.
> cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536
> extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
Now the above 2 lines show that we're writing back for dirty range
[756K, 756K + 64K).
We only writeback 64K because the zoned device has max zone append size
as 64K.
> extent_write_locked_range: r/i=5/259 clear dirty for page=786432
!!! The above line shows the root cause. !!!
We're calling clear_page_dirty_for_io() inside extent_write_locked_range(),
for the page 768K.
This is because extent_write_locked_range() can go beyond the current
locked page, here we hit the page at 768K and clear its page dirt.
In fact this would lead to the desync between subpage dirty and page
dirty flags. We have the page dirty flag cleared, but the subpage range
[820K, 832K) is still dirty.
After the writeback of range [756K, 820K), the dirty flags look like
this, as page 768K no longer has dirty flag set.
704K 768K 832K 896K
I I | I/////////////| I
820K 868K
This means we will no longer writeback range [820K, 832K), thus the
reserved data/metadata space would never be properly released.
> extent_write_cache_pages: r/i=5/259 skip non-dirty folio=786432
Now even though we try to start writeback for page 768K, since the
page is not dirty, we completely skip it at extent_write_cache_pages()
time.
> btrfs_direct_write: r/i=5/259 dio done filepos=696320 len=0
Now the direct IO finished.
> cow_file_range: r/i=5/259 add ordered extent filepos=851968 len=36864
> extent_write_locked_range: r/i=5/259 locked page=851968 start=851968 len=36864
Now we writeback the remaining dirty range, which is [832K, 868K).
Causing the range [820K, 832K) never to be submitted, thus leaking the
reserved space.
This bug only affects subpage and zoned case. For non-subpage and zoned
case, we have exactly one sector for each page, thus no such partial dirty
cases.
For subpage and non-zoned case, we never go into run_delalloc_cow(), and
normally all the dirty subpage ranges would be properly submitted inside
__extent_writepage_io().
[FIX]
Just do not clear the page dirty at all inside extent_write_locked_range().
As __extent_writepage_io() would do a more accurate, subpage compatible
clear for page and subpage dirty flags anyway.
Now the correct trace would look like this:
> btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688
> btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288
> btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536
> btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The page dirty part is still the same 3 pages.
> btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
> cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536
> extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
And the writeback for the first 64K is still correct.
> cow_file_range: r/i=5/259 add ordered extent filepos=839680 len=49152
> extent_write_locked_range: r/i=5/259 locked page=786432 start=839680 len=49152
Now with the fix, we can properly writeback the range [820K, 832K), and
properly release the reserved data/metadata space.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c466d6537b99f801b3f68af3d8124d4312437a0 ]
On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at
the time of the DMA reset. We can however extract the RX clock from the
internal SERDES block. Once the link is up, we can revert to the
previous state.
The AQR115C PHY doesn't support in-band signalling so we can count on
getting the link up notification and safely reuse existing callbacks
which are already used by another HW quirk workaround which enables the
functional clock to avoid a DMA reset due to timeout.
Only enable loopback on revision 3 of the board - check the phy_mode to
make sure.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240703181500.28491-3-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a0a88fcbaf9e027ecca3fe8775be9700b4d6460 ]
This patch updates the workaround for a problem similar to erratum
DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.
In the bad case, the driver reads a too large head index. As the FIFO
is implemented as a ring buffer, this results in re-handling old CAN
transmit complete events.
Every transmit complete event contains with a sequence number that
equals to the sequence number of the corresponding TX request. This
way old TX complete events can be detected.
If the original driver detects a non matching sequence number, it
prints an info message and tries again later. As wrong sequence
numbers can be explained by the erratum DS80000789E 6, demote the info
message to debug level, streamline the code and update the comments.
Keep the behavior: If an old CAN TX complete event is detected, abort
the iteration and mark the number of valid CAN TX complete events as
processed in the chip by incrementing the FIFO's tail index.
Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8e0ddd36ce9536ad7478dd27df06c9ae92370ba ]
This is a preparatory patch to work around a problem similar to
erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.
When handling the TEF interrupt, the driver reads the FIFO header
index from the TEF FIFO STA register of the chip.
In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN transmit complete events that were already processed to be
re-processed.
Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().
The mcp251xfd_handle_tefif() function reads the CAN transmit complete
events from the chip, iterates over them and pushes them into the
network stack. The original driver already contains code to detect old
CAN transmit complete events, that will be updated in the next patch.
Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56e69e59751d20993f243fb7dd6991c4e522424c ]
An overflow may occur if the function is called with the last
block and an offset greater than zero. It is necessary to add
a check to avoid this.
Found by Linux Verification Center (linuxtesting.org) with Svace.
[JK: Make test cover also unalloc table freeing]
Link: https://patch.msgid.link/20240620072413.7448-1-r.smirnov@omp.ru
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ca27e0c8c13ac50a4acf9cdf77069e2d94a547d ]
When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
to lock the two sockets to check their states in unix_stream_connect().
We use unix_state_lock() for the server and unix_state_lock_nested() for
client with tricky sk->sk_state check to avoid deadlock.
The possible deadlock scenario are the following:
1) Self connect()
2) Simultaneous connect()
The former is simple, attempt to grab the same lock, and the latter is
AB-BA deadlock.
After the server's unix_state_lock(), we check the server socket's state,
and if it's not TCP_LISTEN, connect() fails with -EINVAL.
Then, we avoid the former deadlock by checking the client's state before
unix_state_lock_nested(). If its state is not TCP_LISTEN, we can make
sure that the client and the server are not identical based on the state.
Also, the latter deadlock can be avoided in the same way. Due to the
server sk->sk_state requirement, AB-BA deadlock could happen only with
TCP_LISTEN sockets. So, if the client's state is TCP_LISTEN, we can
give up the second lock to avoid the deadlock.
CPU 1 CPU 2 CPU 3
connect(A -> B) connect(B -> A) listen(A)
--- --- ---
unix_state_lock(B)
B->sk_state == TCP_LISTEN
READ_ONCE(A->sk_state) == TCP_CLOSE
^^^^^^^^^
ok, will lock A unix_state_lock(A)
.--------------' WRITE_ONCE(A->sk_state, TCP_LISTEN)
| unix_state_unlock(A)
|
| unix_state_lock(A)
| A->sk_sk_state == TCP_LISTEN
| READ_ONCE(B->sk_state) == TCP_LISTEN
v ^^^^^^^^^^
unix_state_lock_nested(A) Don't lock B !!
Currently, while checking the client's state, we also check if it's
TCP_ESTABLISHED, but this is unlikely and can be checked after we know
the state is not TCP_CLOSE.
Moreover, if it happens after the second lock, we now jump to the restart
label, but it's unlikely that the server is not found during the retry,
so the jump is mostly to revist the client state check.
Let's remove the retry logic and check the state against TCP_CLOSE first.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fba8334721e266f92079632598e46e5f89082f30 ]
When all the strides in a WQE have been consumed, the WQE is unlinked
from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible
to receive CQEs with 0 consumed strides for the same WQE even after the
WQE is fully consumed and unlinked. This triggers an additional unlink
for the same wqe which corrupts the linked list.
Fix this scenario by accepting 0 sized consumed strides without
unlinking the WQE again.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>