Clang static analysis reports this error
cdc-acm.c:409:3: warning: Use of memory after it is freed
acm_process_notification(acm, (unsigned char *)dr);
There are three problems, the first one is that dr is not reset
The variable dr is set with
if (acm->nb_index)
dr = (struct usb_cdc_notification *)acm->notification_buffer;
But if the notification_buffer is too small it is resized with
if (acm->nb_size) {
kfree(acm->notification_buffer);
acm->nb_size = 0;
}
alloc_size = roundup_pow_of_two(expected_size);
/*
* kmalloc ensures a valid notification_buffer after a
* use of kfree in case the previous allocation was too
* small. Final freeing is done on disconnect.
*/
acm->notification_buffer =
kmalloc(alloc_size, GFP_ATOMIC);
dr should point to the new acm->notification_buffer.
The second problem is any data in the notification_buffer is lost
when the pointer is freed. In the normal case, the current data
is accumulated in the notification_buffer here.
memcpy(&acm->notification_buffer[acm->nb_index],
urb->transfer_buffer, copy_size);
When a resize happens, anything before
notification_buffer[acm->nb_index] is garbage.
The third problem is the acm->nb_index is not reset on a
resizing buffer error.
So switch resizing to using krealloc and reassign dr and
reset nb_index.
Fixes: ea2583529c ("cdc-acm: reassemble fragmented notifications")
Signed-off-by: Tom Rix <trix@redhat.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/20200801152154.20683-1-trix@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As a part of device shutdown the smmu driver will be
stopped and henceforth any IOVA address translation
will not be done. The wlan driver, being one of the
smmu driver consumer, should stop all the dma related
activity as a part of shutdown, and thereby ensuring
that no dma activity is done once the smmu driver
shuts down.
During the device shutdown, the smmu calls shutdown
for all its consumers in order to indicate them to
stop all their dma activities.
Register the shutdown handler to stop the wlan
driver and avoid any dma operations.
Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1593193981-30161-1-git-send-email-pillair@codeaurora.org
For QCA6390, normal power up and power down can't bring MHI
to a workable state. This happens especially in warm reboot
and rmmod and insmod. Host needs to write a few registers to
bring MHI to normal state.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeauroro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-10-git-send-email-kvalo@codeaurora.org
For QCA6390, wbm2sw1 is used for other purpose rather than
tx completion ring. So use TCL_DATA_RING 0 only for QCA6390.
Add MISC_CAPS_TCL_0_ONLY to control it.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-9-git-send-email-kvalo@codeaurora.org
QCA6390 doesn't enable V2 map and ummap event, so the addr search
flags and type is different from IPQ8074. Assign correct search flags
and type for QCA6390.
Without this change, ping sometimes fails. With this change, now ping
is always successful.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-7-git-send-email-kvalo@codeaurora.org
For QCA6390, it processes the reg chan list event only for phy0,
and it goes to fallback if the phy_id is not valid. For a valid
phy_id but not 0, just discard the event.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-5-git-send-email-kvalo@codeaurora.org
For QCA6390, only one pdev is created and this pdev manages both lmacs, thus
both rxdmas. So host needs to initialize all rxdma related rings for one pdev.
Another difference is for QCA6390, host fills rxbuf to firmware and firmware
further fills the rxbuf to rxbuf ring for each rxdma.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-4-git-send-email-kvalo@codeaurora.org
For QCA6390, it has 2 lmacs and thus 2 rxdmas. However, each rxdma has rxdma0
only, and doesn't have rxdma1. So for QCA6390, don't initialize rxdma1 related
rings such as rx_mon_buf_ring, rx_mon_dst_ring and rx_mon_desc_ring.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-3-git-send-email-kvalo@codeaurora.org
QCA6390 uses MSI interrupt, so need to configure msi_add and
msi_data to dp srngs. As there are so many DP srngs, so need
to group them. Each group shares one MSI interrupt.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597555891-26112-2-git-send-email-kvalo@codeaurora.org
For QCA6390, host puts hardware to Dual Band Simultaneous (DBS) mode by default
so both 2G and 5G bands can be used. Otherwise only the 5G band can be used.
QCA6390 doesn't provide band_to_mac configuration and firmware will do the
band_to_mac map.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597576599-8857-12-git-send-email-kvalo@codeaurora.org
This macro is evil as it's accesses ab variable in a hidden way. It's better
for readibility to access ab->hw_params.ce_count directly.
This is done in a separate patch to keep the patches simple. No functional changes.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597576599-8857-6-git-send-email-kvalo@codeaurora.org
This macro is evil as it's accesses ab variable in a hidden way. It's better
for readibility to access ab->hw_params.host_ce_config directly.
This is done in a separate patch to keep the patches simple. No functional changes.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597576599-8857-5-git-send-email-kvalo@codeaurora.org
QCA6390 uses only 9 Copy Engines while IPQ8074 may use 12, make it possible to
change CE configuration dynamically via hw_params.
The defines for host_ce_config_wlan and CE_COUNT are temporary solutions, they
will be removed in the following patches to keep things simple.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597576599-8857-4-git-send-email-kvalo@codeaurora.org
Now some of the HAL register macros access ab variable in a hidden way, make ab
variable visible in the macro by adding it as an argument.
This is done in a separate patch to keep the patches simple. No functional changes.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.1.0.1-01238-QCAHKSWPL_SILICONZ-2
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597576599-8857-3-git-send-email-kvalo@codeaurora.org
There's no reason to have call for enable_pll_clk in ath10k_bmi_start(), move
it to ath10k_core_start() instead. This way it's possible to call
ath10k_bmi_start() from sdio.c during firmware dump creation. And also the
function call is more visible when it's in core.c.
No functional changes, compile tested only.
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1597421745-4329-1-git-send-email-kvalo@codeaurora.org
For a power9 KVM guest with XIVE enabled, running a test loop
where we hotplug 384 vcpus and then unplug them, the following traces
can be seen (generally within a few loops) either from the unplugged
vcpu:
cpu 65 (hwid 65) Ready to die...
Querying DEAD? cpu 66 (66) shows 2
list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000
...
NIP __list_del_entry_valid+0xac/0x100
LR __list_del_entry_valid+0xa8/0x100
Call Trace:
__list_del_entry_valid+0xa8/0x100 (unreliable)
free_pcppages_bulk+0x1f8/0x940
free_unref_page+0xd0/0x100
xive_spapr_cleanup_queue+0x148/0x1b0
xive_teardown_cpu+0x1bc/0x240
pseries_mach_cpu_die+0x78/0x2f0
cpu_die+0x48/0x70
arch_cpu_idle_dead+0x20/0x40
do_idle+0x2f4/0x4c0
cpu_startup_entry+0x38/0x40
start_secondary+0x7bc/0x8f0
start_secondary_prolog+0x10/0x14
or on the worker thread handling the unplug:
pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2
BUG: Bad page state in process kworker/u768:3 pfn:95de1
cpu 314 (hwid 314) Ready to die...
page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
flags: 0x5ffffc00000000()
raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
Workqueue: pseries hotplug workque pseries_hp_work_fn
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
bad_page+0x12c/0x1b0
free_pcppages_bulk+0x5bc/0x940
page_alloc_cpu_dead+0x118/0x120
cpuhp_invoke_callback.constprop.5+0xb8/0x760
_cpu_down+0x188/0x340
cpu_down+0x5c/0xa0
cpu_subsys_offline+0x24/0x40
device_offline+0xf0/0x130
dlpar_offline_cpu+0x1c4/0x2a0
dlpar_cpu_remove+0xb8/0x190
dlpar_cpu_remove_by_index+0x12c/0x150
dlpar_cpu+0x94/0x800
pseries_hp_work_fn+0x128/0x1e0
process_one_work+0x304/0x5d0
worker_thread+0xcc/0x7a0
kthread+0x1ac/0x1c0
ret_from_kernel_thread+0x5c/0x80
The latter trace is due to the following sequence:
page_alloc_cpu_dead
drain_pages
drain_pages_zone
free_pcppages_bulk
where drain_pages() in this case is called under the assumption that
the unplugged cpu is no longer executing. To ensure that is the case,
and early call is made to __cpu_die()->pseries_cpu_die(), which runs a
loop that waits for the cpu to reach a halted state by polling its
status via query-cpu-stopped-state RTAS calls. It only polls for 25
iterations before giving up, however, and in the trace above this
results in the following being printed only .1 seconds after the
hotplug worker thread begins processing the unplug request:
pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2
At that point the worker thread assumes the unplugged CPU is in some
unknown/dead state and procedes with the cleanup, causing the race
with the XIVE cleanup code executed by the unplugged CPU.
Fix this by waiting indefinitely, but also making an effort to avoid
spurious lockup messages by allowing for rescheduling after polling
the CPU status and printing a warning if we wait for longer than 120s.
Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[mpe: Trim oopses in change log slightly for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
Multipath errors were seen during failback due to login timeout. The
remote device sent LOGO, the local host tore down the session and did
relogin. The RSCN arrived indicates remote device is going through failover
after which the relogin is in a 20s timeout phase. At this point the
driver is stuck in the relogin process. Add a fix to delete the session as
part of abort/flush the login.
Link: https://lore.kernel.org/r/20200806111014.28434-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we pass in an offset which is larger than PAGE_SIZE, then
page_is_mergeable() thinks it's not mergeable with the previous bio_vec,
leading to a large number of bio_vecs being used. Use a slightly more
obvious test that the two pages are compatible with each other.
Fixes: 52d52d1c98 ("block: only allow contiguous page structs in a bio_vec")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).
The random=1 parameter setting was pivotal in finding this error. The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval. The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).
Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com
Reported-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Before v4.15 commit 75492a5156 ("s390/scsi: Convert timers to use
timer_setup()"), we intentionally only passed zfcp_adapter as context
argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
adapter recovery, it was unnecessary to sync against races between timeout
and (late) completion. Likewise, we only passed zfcp_erp_action as context
argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action,
it was unnecessary to sync against races between timeout and (late)
completion.
Meanwhile the timeout handlers get timer_list as context argument and do a
timer-specific container-of to zfcp_fsf_req which can have been freed.
Fix it by making sure that any request timeout handlers, that might just
have started before del_timer(), are completed by using del_timer_sync()
instead. This ensures the request free happens afterwards.
Space time diagram of potential use-after-free:
Basic idea is to have 2 or more pending requests whose timeouts run out at
almost the same time.
req 1 timeout ERP thread req 2 timeout
---------------- ---------------- ---------------------------------------
zfcp_fsf_request_timeout_handler
fsf_req = from_timer(fsf_req, t, timer)
adapter = fsf_req->adapter
zfcp_qdio_siosl(adapter)
zfcp_erp_adapter_reopen(adapter,...)
zfcp_erp_strategy
...
zfcp_fsf_req_dismiss_all
list_for_each_entry_safe
zfcp_fsf_req_complete 1
del_timer 1
zfcp_fsf_req_free 1
zfcp_fsf_req_complete 2
zfcp_fsf_request_timeout_handler
del_timer 2
fsf_req = from_timer(fsf_req, t, timer)
zfcp_fsf_req_free 2
adapter = fsf_req->adapter
^^^^^^^ already freed
Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com
Fixes: 75492a5156 ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>