[ Upstream commit 21225873be ]
For ENETC hardware, the TCs are numbered from 0 to N-1, where N
is the number of TCs. Numerically higher TC has higher priority.
It's obvious that the highest priority TC index should be N-1 and
the 2nd highest priority TC index should be N-2.
However, the previous logic uses netdev_get_prio_tc_map() to get
the indexes of highest priority and 2nd highest priority TCs, it
does not make sense and is incorrect to give a "tc" argument to
netdev_get_prio_tc_map(). So the driver may get the wrong indexes
of the two highest priotiry TCs which would lead to failed to set
the CBS for the two highest priotiry TCs.
e.g.
$ tc qdisc add dev eno0 parent root handle 100: mqprio num_tc 6 \
map 0 0 1 1 2 3 4 5 queues 1@0 1@1 1@2 1@3 2@4 2@6 hw 1
$ tc qdisc replace dev eno0 parent 100:6 cbs idleslope 100000 \
sendslope -900000 hicredit 12 locredit -113 offload 1
$ Error: Specified device failed to setup cbs hardware offload.
^^^^^
In this example, the previous logic deems the indexes of the two
highest priotiry TCs should be 3 and 2. Actually, the indexes are
5 and 4, because the number of TCs is 6. So it would be failed to
configure the CBS for the two highest priority TCs.
Fixes: c431047c4e ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1240eb93f0 ]
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a8994 ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.
This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.
Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.
[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132] <TASK>
[1009.379135] dump_stack_lvl+0x33/0x50
[1009.379146] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191] print_address_description.constprop.0+0x27/0x300
[1009.379201] kasan_report+0x107/0x120
[1009.379210] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255] nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302] nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350] nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397] ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441] ? kasan_unpoison+0x23/0x50
[1009.379450] nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485] ? __alloc_skb+0xb8/0x1e0
[1009.379493] ? __alloc_skb+0xb8/0x1e0
[1009.379502] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509] ? unwind_get_return_address+0x2a/0x40
[1009.379517] ? write_profile+0xc0/0xc0
[1009.379524] ? avc_lookup+0x8f/0xc0
[1009.379532] ? __rcu_read_unlock+0x43/0x60
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1a64a151d ]
If caller reports ENOMEM, then stop iterating over the batch and send a
single netlink message to userspace to report OOM.
Fixes: cbb8125eb4 ("netfilter: nfnetlink: deliver netlink errors on batch completion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 212ed75dc5 ]
The pipapo set backend follows copy-on-update approach, maintaining one
clone of the existing datastructure that is being updated. The clone
and current datastructures are swapped via rcu from the commit step.
The existing integration with the commit protocol is flawed because
there is no operation to clean up the clone if the transaction is
aborted. Moreover, the datastructure swap happens on set element
activation.
This patch adds two new operations for sets: commit and abort, these new
operations are invoked from the commit and abort steps, after the
transactions have been digested, and it updates the pipapo set backend
to use it.
This patch adds a new ->pending_update field to sets to maintain a list
of sets that require this new commit and abort operations.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5c31fb71f ]
The DSPI controller has configurable timing for
(a) tCSC: the interval between the assertion of the chip select and the
first clock edge
(b) tASC: the interval between the last clock edge and the deassertion
of the chip select
What is a bit surprising, but is documented in the figure "Example of
continuous transfer (CPHA=1, CONT=1)" in the datasheet, is that when the
chip select stays asserted between multiple TX FIFO writes, the tCSC and
tASC times still apply. With CONT=1, chip select remains asserted, but
SCK takes a break and goes to the idle state for tASC + tCSC ns.
In other words, the default values (of 0 and 0 ns) result in SCK
glitches where the SCK transition to the idle state, as well as the SCK
transition from the idle state, will have no delay in between, and it
may appear that a SCK cycle has simply gone missing. The resulting
timing violation might cause data corruption in many peripherals, as
their chip select is asserted.
The driver has device tree bindings for tCSC ("fsl,spi-cs-sck-delay")
and tASC ("fsl,spi-sck-cs-delay"), but these are only specified to apply
when the chip select toggles in the first place, and this timing
characteristic depends on each peripheral. Many peripherals do not have
explicit timing requirements, so many device trees do not have these
properties present at all.
Nonetheless, the lack of SCK glitches is a common sense requirement, and
since the SCK stays in the idle state during transfers for tCSC+tASC ns,
and that in itself should look like half a cycle, then let's ensure that
tCSC and tASC are at least a quarter of a SCK period, such that their
sum is at least half of one.
Fixes: 95bf15f386 ("spi: fsl-dspi: Add ~50ns delay between cs and sck")
Reported-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Debugged-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Tested-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230529223402.1199503-1-vladimir.oltean@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b00683422f ]
There is a reference count error in error path code and a potential race
in check_rkey() in rxe_resp.c. When looking up the rkey for a memory
window the reference to the mw from rxe_lookup_mw() is dropped before a
reference is taken on the mr referenced by the mw. If the mr is destroyed
immediately after the call to rxe_put(mw) the mr pointer is unprotected
and may end up pointing at freed memory. The rxe_get(mr) call should take
place before the rxe_put(mw) call.
All errors in check_rkey() call rxe_put(mw) if mw is not NULL but it was
already called after the above. The mw pointer should be set to NULL after
the rxe_put(mw) call to prevent this from happening.
Fixes: cdd0b85675 ("RDMA/rxe: Implement memory access through MWs")
Link: https://lore.kernel.org/r/20230517211509.1819998-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a3763e873 ]
In rxe_net.c a received packet, from udp or loopback, is passed to
rxe_rcv() in rxe_recv.c as a udp packet. I.e. skb->data is pointing at the
udp header. But rxe_rcv() makes length checks to verify the packet is long
enough to hold the roce headers as if it were a roce
packet. I.e. skb->data pointing at the bth header. A runt packet would
appear to have 8 more bytes than it actually does which may lead to
incorrect behavior.
This patch calls skb_pull() to adjust the skb to point at the bth header
before calling rxe_rcv() which fixes this error.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230517172242.1806340-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c29c8c7df ]
In current design:
1. PD and clt_path->s.dev are shared among connections.
2. every con[n]'s cleanup phase will call destroy_con_cq_qp()
3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and
when clt_path->s.dev become zero, it will destroy PD.
4. when con[1] failed to create, con[1] will not take clt_path->s.dev,
but it try to decreased clt_path->s.dev
So, in case create_cm(con[0]) succeeds but create_cm(con[1]) fails,
destroy_con_cq_qp(con[1]) will be called first which will destroy the PD
while this PD is still taken by con[0].
Here, we refactor the error path of create_cm() and init_conns(), so that
we do the cleanup in the order they are created.
The warning occurs when destroying RXE PD whose reference count is not
zero.
rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe]
Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe]
Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff
RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff
RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001
R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00
FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
rxe_dealloc_pd+0x16/0x20 [rdma_rxe]
ib_dealloc_pd_user+0x4b/0x80 [ib_core]
rtrs_ib_dev_put+0x79/0xd0 [rtrs_core]
destroy_con_cq_qp+0x8a/0xa0 [rtrs_client]
init_path+0x1e7/0x9a0 [rtrs_client]
? __pfx_autoremove_wake_function+0x10/0x10
? lock_is_held_type+0xd7/0x130
? rcu_read_lock_sched_held+0x43/0x80
? pcpu_alloc+0x3dd/0x7d0
? rtrs_clt_init_stats+0x18/0x40 [rtrs_client]
rtrs_clt_open+0x24f/0x5a0 [rtrs_client]
? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client]
rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client]
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/1682384563-2-4-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 00f8205ffc upstream.
Consider a scenario where cable disconnect happens when there is an active
usb reqest queued to the UDC. As part of the disconnect we would issue an
end transfer with no interrupt-on-completion before giving back this
request. Since we are giving back the request without skipping TRBs the
num_trbs field of dwc3_request still holds the stale value previously used.
Function drivers re-use same request for a given bind-unbind session and
hence their dwc3_request context gets preserved across cable
disconnect/connect. When such a request gets re-queued after cable connect,
we would increase the num_trbs field on top of the previous stale value
thus incorrectly representing the number of TRBs used. Fix this by
resetting num_trbs field before giving back the request.
Fixes: 09fe1f8d7e ("usb: dwc3: gadget: track number of TRBs per request")
Cc: stable <stable@kernel.org>
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Message-ID: <1685654850-8468-1-git-send-email-quic_eserrao@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2d6935422 upstream.
The Qualcomm dwc3 glue driver is currently accessing the driver data of
the child core device during suspend and on wakeup interrupts. This is
clearly a bad idea as the child may not have probed yet or could have
been unbound from its driver.
The first such layering violation was part of the initial version of the
driver, but this was later made worse when the hack that accesses the
driver data of the grand child xhci device to configure the wakeup
interrupts was added.
Fixing this properly is not that easily done, so add a sanity check to
make sure that the child driver data is non-NULL before dereferencing it
for now.
Note that this relies on subtleties like the fact that driver core is
making sure that the parent is not suspended while the child is probing.
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/all/20230325165217.31069-4-manivannan.sadhasivam@linaro.org/
Fixes: d9152161b4 ("usb: dwc3: Add Qualcomm DWC3 glue layer driver")
Fixes: 6895ea55c3 ("usb: dwc3: qcom: Configure wakeup interrupts during suspend")
Cc: stable@vger.kernel.org # 3.18: a872ab303d: "usb: dwc3: qcom: fix use-after-free on runtime-PM wakeup"
Cc: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
Cc: Krishna Kurapati <quic_kriskura@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Message-ID: <20230607100540.31045-2-johan+linaro@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 286d9975a8 upstream.
usb_udc_connect_control(), soft_connect_store() and
usb_gadget_deactivate() can potentially race against each other to invoke
usb_gadget_connect()/usb_gadget_disconnect(). To prevent this, guard
udc->started, gadget->allow_connect, gadget->deactivate and
gadget->connect with connect_lock so that ->pullup() is only invoked when
the gadget is bound, started and not deactivated. The routines
usb_gadget_connect_locked(), usb_gadget_disconnect_locked(),
usb_udc_connect_control_locked(), usb_gadget_udc_start_locked(),
usb_gadget_udc_stop_locked() are called with this lock held.
An earlier version of this commit was reverted due to the crash reported in
https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/.
commit 16737e78d190 ("usb: gadget: udc: core: Offload usb_udc_vbus_handler processing")
addresses the crash reported.
Cc: stable@vger.kernel.org
Fixes: 628ef0d273 ("usb: udc: add usb_udc_vbus_handler")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-2-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50966da807 upstream.
usb_udc_vbus_handler() can be invoked from interrupt context by irq
handlers of the gadget drivers, however, usb_udc_connect_control() has
to run in non-atomic context due to the following:
a. Some of the gadget driver implementations expect the ->pullup
callback to be invoked in non-atomic context.
b. usb_gadget_disconnect() acquires udc_lock which is a mutex.
Hence offload invocation of usb_udc_connect_control()
to workqueue.
UDC should not be pulled up unless gadget driver is bound. The new flag
"allow_connect" is now set by gadget_bind_driver() and cleared by
gadget_unbind_driver(). This prevents work item to pull up the gadget
even if queued when the gadget driver is already unbound.
Cc: stable@vger.kernel.org
Fixes: 1016fc0c09 ("USB: gadget: Fix obscure lockdep violation for udc_mutex")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-1-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70c2e03e9a upstream.
Both tb_xdomain_enable_paths() and tb_xdomain_disable_paths() expect -1,
not 0, if the corresponding ring is not needed. For this reason change
the driver to use correct value for the rings that are not needed.
Fixes: 180b068942 ("thunderbolt: Allow multiple DMA tunnels over a single XDomain connection")
Cc: stable@vger.kernel.org
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fe95742af upstream.
If the boot firmware has already established tunnels, especially ones
that have special requirements from the link such as DisplayPort, we
should not blindly enable CL states (nor change the TMU configuration).
Otherwise the existing tunnels may not work as expected.
For this reason, skip the CL state enabling when we go over the existing
topology. This will also keep the TMU settings untouched because we do
not change the TMU configuration when CL states are not enabled.
Reported-by: Koba Ko <koba.ko@canonical.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7831
Cc: stable@vger.kernel.org # v6.0+
Acked-By: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3eb1a3a040 upstream.
The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion
If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.
Explicitly check that an image has been written before letting a read
succeed.
Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8a5d5ea3b upstream.
This seems to have existed for ever but is now more apparant after
commit 9bff18d134 ("drm/ttm: use per BO cleanup workers")
My analysis: two threads are running, one in the irq signalling the
fence, in dma_fence_signal_timestamp_locked, it has done the
DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the
callbacks.
The second thread in nouveau_cli_work_ready, where it sees the fence is
signalled, so then puts the fence, cleanups the object and frees the
work item, which contains the callback.
Thread one goes again and tries to call the callback and causes the
use-after-free.
Proposed fix: lock the fence signalled check in nouveau_cli_work_ready,
so either the callbacks are done or the memory is freed.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Fixes: 11e451e740 ("drm/nouveau: remove fence wait code from deferred client work handler")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/dri-devel/20230615024008.1600281-1-airlied@gmail.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 722d908223 upstream.
issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.
Commit 3dba53a958 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:
Before commit 3dba53a958, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb65b282c9 upstream.
Must check pmd->fail_io before using pmd->data_sm since
pmd->data_sm may be destroyed by other processes.
P1(kworker) P2(message)
do_worker
process_prepared
process_prepared_discard_passdown_pt2
dm_pool_dec_data_range
pool_message
commit
dm_pool_commit_metadata
↓
// commit failed
metadata_operation_failed
abort_transaction
dm_pool_abort_metadata
__open_or_format_metadata
↓
dm_sm_disk_open
↓
// open failed
// pmd->data_sm is NULL
dm_sm_dec_blocks
↓
// try to access pmd->data_sm --> UAF
As shown above, if dm_pool_commit_metadata() and
dm_pool_abort_metadata() fail in pool_message process, kworker may
trigger UAF.
Fixes: be500ed721 ("dm space maps: improve performance with inc/dec on ranges of blocks")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 122e2cb7e1 upstream.
This commit adds new DEVICE_FLG with QUIRK_FLAG_DSD_RAW and Vendor Id for
HEM devices which supports native DSD. Prior to this change Linux kernel
was not enabling native DSD playback for HEM devices, and as a result,
DSD audio was being converted to PCM "on the fly". HEM devices,
when connected to the system, would only play audio in PCM format,
even if the source material was in DSD format. With the addition of new
VENDOR_FLG in the quircks.c file, the devices are now correctly
recognized, and raw DSD data is transmitted to the device,
allowing for native DSD playback.
Signed-off-by: Lukasz Tyl <ltyl@hem-e.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230614122524.30271-1-ltyl@hem-e.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ba61c9f6c upstream.
As reported in the bugzilla below, the PM resume of a UAC3 device may
fail due to the incomplete power state change, stuck at D1. The
reason is that the driver expects the full D0 power state change only
at hw_params, while the normal PCM resume procedure doesn't call
hw_params.
For fixing the bug, we add the same power state update to D0 at the
prepare callback, which is certainly called by the resume procedure.
Note that, with this change, the power state change in the hw_params
becomes almost redundant, since snd_usb_hw_params() doesn't touch the
parameters (at least it tires so). But dropping it is still a bit
risky (e.g. we have the media-driver binding), so I leave the D0 power
state change in snd_usb_hw_params() as is for now.
Fixes: a0a4959eb4 ("ALSA: usb-audio: Operate UAC3 Power Domains in PCM callbacks")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217539
Link: https://lore.kernel.org/r/20230612132818.29486-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit deccae40e4 upstream.
Commit 619104ba45 ("btrfs: move common NOCOW checks against a file
extent into a helper") changed our call to btrfs_cross_ref_exist() to
always pass false for the 'strict' parameter. We're passing this down
through the stack so that we can do a full check for cross references
during swapfile activation.
With strict always false, this test fails:
btrfs subvol create swappy
chattr +C swappy
fallocate -l1G swappy/swapfile
chmod 600 swappy/swapfile
mkswap swappy/swapfile
btrfs subvol snap swappy swapsnap
btrfs subvol del -C swapsnap
btrfs fi sync /
sync;sync;sync
swapon swappy/swapfile
The fix is to just use args->strict, and everyone except swapfile
activation is passing false.
Fixes: 619104ba45 ("btrfs: move common NOCOW checks against a file extent into a helper")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7833b86595 upstream.
can_nocow_extent can reduce the len passed in, which needs to be
propagated to btrfs_dio_iomap_begin so that iomap does not submit
more data then is mapped.
This problems exists since the btrfs_get_blocks_direct helper was added
in commit c5794e5178 ("btrfs: Factor out write portion of
btrfs_get_blocks_direct"), but the ordered_extent splitting added in
commit b73a6fd1b1 ("btrfs: split partial dio bios before submit")
added a WARN_ON that made a syzkaller test fail.
Reported-by: syzbot+ee90502d5c8fd1d0dd93@syzkaller.appspotmail.com
Fixes: c5794e5178 ("btrfs: Factor out write portion of btrfs_get_blocks_direct")
CC: stable@vger.kernel.org # 6.1+
Tested-by: syzbot+ee90502d5c8fd1d0dd93@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 745806fb45 upstream.
[BUG]
Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot
mount option on a corrupted fs.
The full report can be found here:
https://syzkaller.appspot.com/bug?extid=c4614eae20a166c25bf0
BTRFS error (device loop0: state C): failed to load root csum
assertion failed: !tmp, in fs/btrfs/disk-io.c:1103
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3664!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 3608 Comm: syz-executor356 Not tainted 6.0.0-rc7-syzkaller-00029-g3800a713b607 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
RIP: 0010:assertfail+0x1a/0x1c fs/btrfs/ctree.h:3663
RSP: 0018:ffffc90003aaf250 EFLAGS: 00010246
RAX: 0000000000000032 RBX: 0000000000000000 RCX: f21c13f886638400
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff888021c640a0 R08: ffffffff816bd38d R09: ffffed10173667f1
R10: ffffed10173667f1 R11: 1ffff110173667f0 R12: dffffc0000000000
R13: ffff8880229c21f7 R14: ffff888021c64060 R15: ffff8880226c0000
FS: 0000555556a73300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a2637d7a00 CR3: 00000000709c4000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_global_root_insert+0x1a7/0x1b0 fs/btrfs/disk-io.c:1103
load_global_roots_objectid+0x482/0x8c0 fs/btrfs/disk-io.c:2467
load_global_roots fs/btrfs/disk-io.c:2501 [inline]
btrfs_read_roots fs/btrfs/disk-io.c:2528 [inline]
init_tree_roots+0xccb/0x203c fs/btrfs/disk-io.c:2939
open_ctree+0x1e53/0x33df fs/btrfs/disk-io.c:3574
btrfs_fill_super+0x1c6/0x2d0 fs/btrfs/super.c:1456
btrfs_mount_root+0x885/0x9a0 fs/btrfs/super.c:1824
legacy_get_tree+0xea/0x180 fs/fs_context.c:610
vfs_get_tree+0x88/0x270 fs/super.c:1530
fc_mount fs/namespace.c:1043 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1073
btrfs_mount+0x3d3/0xbb0 fs/btrfs/super.c:1884
[CAUSE]
Since the introduction of global roots, we handle
csum/extent/free-space-tree roots as global roots, even if no
extent-tree-v2 feature is enabled.
So for regular csum/extent/fst roots, we load them into
fs_info::global_root_tree rb tree.
And we should not expect any conflicts in that rb tree, thus we have an
ASSERT() inside btrfs_global_root_insert().
But rescue=usebackuproot can break the assumption, as we will try to
load those trees again and again as long as we have bad roots and have
backup roots slot remaining.
So in that case we can have conflicting roots in the rb tree, and
triggering the ASSERT() crash.
[FIX]
We can safely remove that ASSERT(), as the caller will properly put the
offending root.
To make further debugging easier, also add two explicit error messages:
- Error message for conflicting global roots
- Error message when using backup roots slot
Reported-by: syzbot+a694851c6ab28cbcfb9c@syzkaller.appspotmail.com
Fixes: abed4aaae4 ("btrfs: track the csum, extent, and free space trees in a rb tree")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cac9e4418f upstream.
If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92c5d1b860 upstream.
The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.
This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.
Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.
Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fee5eaecca upstream.
Syzbot reports that in its stress test for resize ioctl, the log writing
function nilfs_segctor_do_construct hits a WARN_ON in
nilfs_segctor_truncate_segments().
It turned out that there is a problem with the current implementation of
the resize ioctl, which changes the writable range on the device (the
range of allocatable segments) at the end of the resize process.
This order is necessary for file system expansion to avoid corrupting the
superblock at trailing edge. However, in the case of a file system
shrink, if log writes occur after truncating out-of-bounds trailing
segments and before the resize is complete, segments may be allocated from
the truncated space.
The userspace resize tool was fine as it limits the range of allocatable
segments before performing the resize, but it can run into this issue if
the resize ioctl is called alone.
Fix this issue by changing nilfs_sufile_resize() to update the range of
allocatable segments immediately after successful truncation of segment
space in case of file system shrink.
Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com
Fixes: 4e33f9eab0 ("nilfs2: implement resize ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>