When running suspend test, kernel crash happened in ath10k, and it is
fixed by commit b72a4aff94 ("ath10k: skip ath10k_halt during suspend
for driver state RESTARTING").
Currently the crash is fixed, but as a common code style, it is better
to set the pointer to NULL after memory is free.
This is to address the code style and it will avoid potential bug of
use-after-free.
Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00110-QCARMSWP-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220505092248.787-1-quic_wgong@quicinc.com
It has a fail log which is ath11k_dbg in ath11k_dp_rx_process_mon_status(),
as below, it will not print when debug_mask is not set ATH11K_DBG_DATA.
ath11k_dbg(ab, ATH11K_DBG_DATA,
"failed to find the peer with peer_id %d\n",
ppdu_info.peer_id);
When run scan with station disconnected, the peer_id is 0 for case
HAL_RX_MPDU_START in ath11k_hal_rx_parse_mon_status_tlv() which called
from ath11k_dp_rx_process_mon_status(), and the peer_id of ppdu_info is
reset to 0 in the while loop, so it does not match condition of the
check "if (ppdu_info->peer_id == HAL_INVALID_PEERID" in the loop, and
then the log "failed to find the peer with peer_id 0" print after the
check in the loop, it is below call stack when debug_mask is set
ATH11K_DBG_DATA.
The reason is this commit 01d2f285e3 ("ath11k: decode HE status tlv")
add "memset(ppdu_info, 0, sizeof(struct hal_rx_mon_ppdu_info))" in
ath11k_dp_rx_process_mon_status(), but the commit does not initialize
the peer_id to HAL_INVALID_PEERID, then lead the check mis-match.
Callstack of the failed log:
[12335.689072] RIP: 0010:ath11k_dp_rx_process_mon_status+0x9ea/0x1020 [ath11k]
[12335.689157] Code: 89 ff e8 f9 10 00 00 be 01 00 00 00 4c 89 f7 e8 dc 4b 4e de 48 8b 85 38 ff ff ff c7 80 e4 07 00 00 01 00 00 00 e9 20 f8 ff ff <0f> 0b 41 0f b7 96 be 06 00 00 48 c7 c6 b8 50 44 c1 4c 89 ff e8 fd
[12335.689180] RSP: 0018:ffffb874001a4ca0 EFLAGS: 00010246
[12335.689210] RAX: 0000000000000000 RBX: ffff995642cbd100 RCX: 0000000000000000
[12335.689229] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99564212cd18
[12335.689248] RBP: ffffb874001a4dc0 R08: 0000000000000001 R09: 0000000000000000
[12335.689268] R10: 0000000000000220 R11: ffffb874001a48e8 R12: ffff995642473d40
[12335.689286] R13: ffff99564212c5b8 R14: ffff9956424736a0 R15: ffff995642120000
[12335.689303] FS: 0000000000000000(0000) GS:ffff995739000000(0000) knlGS:0000000000000000
[12335.689323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12335.689341] CR2: 00007f43c5d5e039 CR3: 000000011c012005 CR4: 00000000000606e0
[12335.689360] Call Trace:
[12335.689377] <IRQ>
[12335.689418] ? rcu_read_lock_held_common+0x12/0x50
[12335.689447] ? rcu_read_lock_sched_held+0x25/0x80
[12335.689471] ? rcu_read_lock_held_common+0x12/0x50
[12335.689504] ath11k_dp_rx_process_mon_rings+0x8d/0x4f0 [ath11k]
[12335.689578] ? ath11k_dp_rx_process_mon_rings+0x8d/0x4f0 [ath11k]
[12335.689653] ? lock_acquire+0xef/0x360
[12335.689681] ? rcu_read_lock_sched_held+0x25/0x80
[12335.689713] ath11k_dp_service_mon_ring+0x38/0x60 [ath11k]
[12335.689784] ? ath11k_dp_rx_process_mon_rings+0x4f0/0x4f0 [ath11k]
[12335.689860] call_timer_fn+0xb2/0x2f0
[12335.689897] ? ath11k_dp_rx_process_mon_rings+0x4f0/0x4f0 [ath11k]
[12335.689970] run_timer_softirq+0x21f/0x540
[12335.689999] ? ktime_get+0xad/0x160
[12335.690025] ? lapic_next_deadline+0x2c/0x40
[12335.690053] ? clockevents_program_event+0x82/0x100
[12335.690093] __do_softirq+0x151/0x4a8
[12335.690135] irq_exit_rcu+0xc9/0x100
[12335.690165] sysvec_apic_timer_interrupt+0xa8/0xd0
[12335.690189] </IRQ>
[12335.690204] <TASK>
[12335.690225] asm_sysvec_apic_timer_interrupt+0x12/0x20
Reset the default value to HAL_INVALID_PEERID each time after memset
of ppdu_info as well as others memset which existed in function
ath11k_dp_rx_process_mon_status(), then the failed log disappeared.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: 01d2f285e3 ("ath11k: decode HE status tlv")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220518033556.31940-1-quic_wgong@quicinc.com
Currently commit 1f682dc9fb ("ath11k: reduce the wait time of 11d scan
and hw scan while add interface") introduced a wait_for_completion_timeout
operation for ar->scan.completed, another one is existed in ath11k_scan_stop(),
then ath11k has two places to wait for the ar->scan.completed and they
run in different thread, thus it is possible to happend that the two
thread both enter wait status. To handle this scenario, ath11k should
change the complete() to complete_all() for the ar->scan.completed. This
also work well when it is only one thread wait for ar->scan.completed.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220919024413.25083-1-quic_wgong@quicinc.com
If Wi-Fi driver send Wi-Fi status to BT via scoreboard too frequent in a
short moment, BT will loss some of them because of hardware response time.
To avoid this issue, change the code flow. Summarize the scoreboard changes
and if the value have changed, send the scoreboard to BT only once in
a coexistence processing cycle. It also can help to reduce driver I/O.
Signed-off-by: Ching-Te Ku <ku920601@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220920010939.12173-8-pkshih@realtek.com
WiFi/BT combo module could only have two antenna, namely WL_S0 and WL_S1.
WiFi can use two antenna to TX/RX 2SS data, and BT can share one of the
antenna. This patch is to allow WiFi to TX/RX 1SS data like ACK/RTS/CTS to
improve Wi-Fi performance while coexisting with Bluetooth.
Signed-off-by: Ching-Te Ku <ku920601@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220920010939.12173-6-pkshih@realtek.com
This debug entry is to summarize important messages to quickly address
problem types, such as firmware hang, C2H/H2C stuck, or too much
occupation of BT A2DP. If unexpected something is addressed, we can dig
the problem via other debug messages that provide more detail information.
Signed-off-by: Ching-Te Ku <ku920601@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220920010939.12173-4-pkshih@realtek.com
There are sleep in atomic context bugs when uploading device dump
data in mwifiex. The root cause is that dev_coredumpv could not
be used in atomic contexts, because it calls dev_set_name which
include operations that may sleep. The call tree shows execution
paths that could lead to bugs:
(Interrupt context)
fw_dump_timer_fn
mwifiex_upload_device_dump
dev_coredumpv(..., GFP_KERNEL)
dev_coredumpm()
kzalloc(sizeof(*devcd), gfp); //may sleep
dev_set_name
kobject_set_name_vargs
kvasprintf_const(GFP_KERNEL, ...); //may sleep
kstrdup(s, GFP_KERNEL); //may sleep
The corresponding fail log is shown below:
[ 135.275938] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start
[ 135.281029] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
...
[ 135.293613] Call Trace:
[ 135.293613] <IRQ>
[ 135.293613] dump_stack_lvl+0x57/0x7d
[ 135.293613] __might_resched.cold+0x138/0x173
[ 135.293613] ? dev_coredumpm+0xca/0x2e0
[ 135.293613] kmem_cache_alloc_trace+0x189/0x1f0
[ 135.293613] ? devcd_match_failing+0x30/0x30
[ 135.293613] dev_coredumpm+0xca/0x2e0
[ 135.293613] ? devcd_freev+0x10/0x10
[ 135.293613] dev_coredumpv+0x1c/0x20
[ 135.293613] ? devcd_match_failing+0x30/0x30
[ 135.293613] mwifiex_upload_device_dump+0x65/0xb0
[ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0
[ 135.293613] call_timer_fn+0x122/0x3d0
[ 135.293613] ? msleep_interruptible+0xb0/0xb0
[ 135.293613] ? lock_downgrade+0x3c0/0x3c0
[ 135.293613] ? __next_timer_interrupt+0x13c/0x160
[ 135.293613] ? lockdep_hardirqs_on_prepare+0xe/0x220
[ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0
[ 135.293613] __run_timers.part.0+0x3f8/0x540
[ 135.293613] ? call_timer_fn+0x3d0/0x3d0
[ 135.293613] ? arch_restore_msi_irqs+0x10/0x10
[ 135.293613] ? lapic_next_event+0x31/0x40
[ 135.293613] run_timer_softirq+0x4f/0xb0
[ 135.293613] __do_softirq+0x1c2/0x651
...
[ 135.293613] RIP: 0010:default_idle+0xb/0x10
[ 135.293613] RSP: 0018:ffff888006317e68 EFLAGS: 00000246
[ 135.293613] RAX: ffffffff82ad8d10 RBX: ffff888006301cc0 RCX: ffffffff82ac90e1
[ 135.293613] RDX: ffffed100d9ff1b4 RSI: ffffffff831ad140 RDI: ffffffff82ad8f20
[ 135.293613] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff88806cff8d9b
[ 135.293613] R10: ffffed100d9ff1b3 R11: 0000000000000001 R12: ffffffff84593410
[ 135.293613] R13: 0000000000000000 R14: 0000000000000000 R15: 1ffff11000c62fd2
...
[ 135.389205] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
This patch uses delayed work to replace timer and moves the operations
that may sleep into a delayed work in order to mitigate bugs, it was
tested on Marvell 88W8801 chip whose port is usb and the firmware is
usb8801_uapsta.bin. The following is the result after using delayed
work to replace timer.
[ 134.936453] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start
[ 135.043344] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
As we can see, there is no bug now.
Fixes: f5ecd02a8b ("mwifiex: device dump support for usb interface")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/5cfa5c473ff6d069cb67760ffa04a2f84ef450a8.1661252818.git.duoming@zju.edu.cn
Add support for WoW on WCN6750 chipset.
Unlike other chips where WoW exit happens after sending WoW wakeup
WMI command, exit from WoW suspend in the case of WCN6750 happens
upon sending a WoW exit SMP2P (Shared memory point to point) message
to the firmware.
Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220902112520.24804-3-quic_mpubbise@quicinc.com
In current code STA_KEEPALIVE_ARP_RESPONSE TLV header is included only
when ARP method is used, this causes firmware always to crash when wowlan
is enabled because firmware needs it to be present no matter ARP method
is used or not.
Fix this issue by including STA_KEEPALIVE_ARP_RESPONSE TLV header by
default.
Also fix below typo:
s/WMI_TAG_STA_KEEPALVE_ARP_RESPONSE/WMI_TAG_STA_KEEPALIVE_ARP_RESPONSE/
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: 0f84a156aa ("ath11k: Handle keepalive during WoWLAN suspend and resume")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220913044358.2037-1-quic_bqiang@quicinc.com
The signal-to-noise-ratio SNR is returned by the wcn36xx firmware for each
received frame. SNR represents all of the unwanted interference signal
after filtering out the fundamental frequency and harmonics of the
frequency.
Noise can come from various electromagnetic sources, from temperature
affecting the performance hardware components or quantization effects
converting from analog to digital domains.
The SNR value returned by the WiFi firmware then is a good source of
entropy.
Other WiFi drivers offer up the noise component of the FFT as an entropy
source for the random pool e.g.
commit 2aa56cca35 ("ath9k: Mix the received FFT bins to the random pool")
I attended Jason's talk on sources of randomness at Plumbers and it
occurred to me that SNR is a reasonable candidate to add.
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220915004117.1562703-2-bryan.odonoghue@linaro.org
Even though ieee80211_hw.queues is set to 2, the ralink rt2x00 driver
is seeing tx skbs submitted to it with the queue-id set to 2 / set to
IEEE80211_AC_BE on a rt2500 card when associating with an access-point.
This causes rt2x00queue_get_tx_queue() to return NULL and the following
error to be logged: "ieee80211 phy0: rt2x00mac_tx: Error - Attempt to
send packet over invalid queue 2", after which association with the AP
fails.
This patch works around this by mapping QID_AC_BE and QID_AC_BK
to QID_AC_VI when there are only 2 tx_queues.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220908173618.155291-2-hdegoede@redhat.com
SER (system error recovery) can deal with different crash types by
different levels of processes. Previous FW crash simulation triggers
a CPU exception which is one kind of SER L2 type. It can verify SER L2
flow which includes HW/FW restart.
Now, we want to increase crash simulation types. A debug function is added
to trigger control error in purpose for SER L1 simulation/verification.
And, debugfs fw_crash is extended to accept different parameters.
echo 1 > fw_crash:
simulate CPU exception as before
(keep 1 for compatibility with previous)
It will be catched and handled by SER L2.
(this requires HW/FW restart)
echo 2 > fw_crash:
simulate control error
It will be catched and handled by SER L1.
(driver and FW cooperate to recover this)
Besides, in order to apply to the above two cases,
rename RTW89_FLAG_RESTART_TRIGGER to RTW89_FLAG_CRASH_SIMULATING
and adjust where SER flow clears this bit for both L1 and L2.
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220914035034.14521-5-pkshih@realtek.com
The variable cnt_connecting is to indicate if we are connecting to an AP.
This is an important clue for coexistence to assign more time slot to WiFi
side in this situation to ensure WiFi can establish connection.
Without this patch, compiler warns:
drivers/net/wireless/realtek/rtw89/coex.c:3244:25: warning: variable
'cnt_connecting' set but not used [-Wunused-but-set-variable]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220912021009.6011-1-pkshih@realtek.com
Show TDMA information containing TDMA policy and time slot of Wi-Fi/BT in
debug message to check things are in expected. The v1 format contains
additional header, and remaining part is the same as original. So 8852CE
selects v1 version, and then everything like original.
Signed-off-by: Ching-Te Ku <ku920601@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220913092546.43722-6-pkshih@realtek.com
Parsing firmware error message from original version and v1 reports to
show up exception counter of commands from firmware in debug message.
Then, we can make sure exchange commands are correct totally.
In the later version Wi-Fi firmware(v1), the report format was changed.
With this update, we can yield correct report from proper struct.
Signed-off-by: Ching-Te Ku <ku920601@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220913092546.43722-5-pkshih@realtek.com