Commit Graph

1060530 Commits

Author SHA1 Message Date
Bernard Zhao
8b11e14bd5 drm/amd/amdgpu: cleanup the code style a bit
This change is to cleanup the code style a bit.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Bernard Zhao
7b755d6510 drm/amd/amdgpu: remove useless break after return
This change is to remove useless break after return.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Ye Guojin
88ac6df8af drm/amd/display: fix cond_no_effect.cocci warnings
This was found by coccicheck:
./drivers/gpu/drm/amd/display/dc/core/dc_resource.c, 2516, 7-9, WARNING
possible condition with no effect (if == else)

hdmi_info.bits.YQ0_YQ1 is always YYC_QUANTIZATION_LIMITED_RANGE.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Jiapeng Chong
1da2fcc435 drm/amd/display: Clean up some inconsistent indenting
Eliminate the follow smatch warning:

drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_srv.c:622
dmub_srv_cmd_execute() warn: inconsistent indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
hongao
6c5af7d2f8 drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
which assign amdgpu_encoder->native_mode with *preferred_mode result in
amdgpu_encoder->native_mode.clock always be 0. That will cause
amdgpu_connector_set_property returned early on:
if ((rmx_type != DRM_MODE_SCALE_NONE) &&
	(amdgpu_encoder->native_mode.clock == 0))
when we try to set scaling mode Full/Full aspect/Center.
Add the missing function to amdgpu_connector_vga_get_mode can fix this.
It also works on dvi connectors because
amdgpu_connector_dvi_helper_funcs.get_mode use the same method.

Signed-off-by: hongao <hongao@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Roman Li
b295ce3991 drm/amd/display: Fix OLED brightness control on eDP
[Why]
After commit ("drm/amdgpu/display: add support for multiple backlights")
number of eDPs is defined while registering backlight device.
However the panel's extended caps get updated once before register call.
That leads to regression with extended caps like oled brightness control.

[How]
Update connector ext caps after register_backlight_device

Fixes: 7fd13baeb7 ("drm/amdgpu/display: add support for multiple backlights")
Link: https://www.reddit.com/r/AMDLaptops/comments/qst0fm/after_updating_to_linux_515_my_brightness/

Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Samuel Čavoj <samuel@cavoj.net>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jasdeep Dhillon <Jasdeep.Dhillon@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Candice Li
d9a69fe512 drm/amdgpu: Add recovery_lock to save bad pages function
Fix race condition failure during UMC UE injection.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Tao Zhou
3ebd8bf023 drm/amdgpu: support new mode-1 reset interface (v2)
If gpu reset is triggered by ras fatal error, tell it to smu in mode-1
reset message.

v2: move mode-1 reset function to aldebaran_ppt.c since it's aldebaran
specific currently.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
shaoyunl
c96cb65989 drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Lijo Lazar
33155ce6e1 drm/amd/pm: Remove artificial freq level on Navi1x
Print Navi1x fine grained clocks in a consistent manner with other SOCs.
Don't show aritificial DPM level when the current clock equals min or max.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Evan Quan
6c08e0ef87 drm/amd/pm: avoid duplicate powergate/ungate setting
Just bail out if the target IP block is already in the desired
powergate/ungate state. This can avoid some duplicate settings
which sometimes may cause unexpected issues.

Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Sung Joon Kim
ed12f3f198 drm/amd/display: Revert "retain/release stream pointer in link enc table"
[why]
Change causing issue. Need to revert the change.

Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Sung Joon Kim <sungkim@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Aric Cyr
e90f0bb0c7 drm/amd/display: 3.2.162
This version brings along following fixes:
- Fix issue that secondary display goes blank on Non DCN31.
- Adjust flushing data in DMCUB
- Revert patches which cause regression in hadnling MPO/Link encoder assignment
- Correct the setting within MSA of DP2.0
- Adjustment for DML isolation
- Fix FIFO erro in fast boot sequence
- Enable DSC over eDP
- Adjust the DSC power off sequence

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Yi-Ling Chen
8fa6f4c571 drm/amd/display: fixed the DSC power off sequence during Driver PnP
[WHY]
After unloading driver, driver would not disable DSC function.
At next loading driver, driver would power all DSC engines off.
When driver powered the active DSC off, the screen would be gray
until reprograming DSC relatived register correcntly.

[HOW]
1. Remove DSC Power down code into init_pipes()
2. Depend on  the OTG mapping information and DSC status to skip
power off for the working DSC.

Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Yi-Ling Chen <Yi-Ling.Chen2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Anthony Koo
3f232a0fdb drm/amd/display: [FW Promotion] Release 0.0.93
- Fix ARR39-C issue with scaled integer addition in rb func

Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:02 -05:00
Anthony Koo
1f49355c4c drm/amd/display: [FW Promotion] Release 0.0.92
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
hvanzyll
21f45a2363 drm/amd/display: Visual Confirm Bar Height Adjust
[What]
This change allows adjustment to the Visual Confirm
height border.

[Why]
Aids debugging and testing

[How]
Use the existing infrastructure to implement logic to
draw borders

Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Brandon Syu
189789a15f drm/amd/display: Fix eDP will flash when boot to OS
[WHY]
With eDP DSC enabled and set 4K 60Hz, there would be screen
corruption when booting to OS or enabling the driver.

[HOW]
Avoid powering down VDD when we cannot apply eDP fast boot.

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Brandon Syu <Brandon.Syu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Mikita Lipski
2665f63a73 drm/amd/display: Enable DSC over eDP
[why]
- Adding a DM interface to enable DSC over eDP on Linux
- DSC over eDP will allow to power savings by reducing
the bandwidth required to support panel's modes
- Apply link optimization algorithm to reduce link bandwidth
when DSC is enabled

[how]
- Read eDP panel's DSC capabilities
- Apply DSC policy on eDP panel based on its DSC capabilities
- Enable DSC encoder's on the pipe
- Enable DSC on panel's side by setting DSC_ENABLE DPCD register
- Adding link optimization algorithm to reduce link rate or lane
count based

Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Angus Wang
2430be71c0 drm/amd/display: Fix LTTPR not Enabled
[WHY]
Previous LTTPR change has caused a regression that led to an
issue where LTTPR is disabled

[HOW]
Extended changes from previous fix to DCN30X

Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Angus Wang <angus.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Xu, Jinze
430bb83dbd drm/amd/display: Reset fifo after enable otg
[Why]
In fast boot sequence, when change dispclk, otg is disabled but digfe
is enabled. This may cause dig fifo error.

[How]
Reset dig fifo after enable otg.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: JinZe.Xu <JinZe.Xu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Jun Lei
d26c4ffba6 drm/amd/display: Code change for DML isolation
[why]
DML itself is SW only, putting the logic as part of resource makes it
hw dependent and thus impossible to compile separately from dc.
Separate compilation is critical for unit testing as well as bbox tool
development

[how]
create new dml wrapper.
Copy logic from the validation functions into dml wrapper as base
implementation. Dml wrapper has internal/static implementations
for all helpers, and does not reference other functions.
It may reference dc structures/types for convenience.

This change now has all the changes for DML isolation squashed into
one.

Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Wenjing Liu
ef9d5a54da drm/amd/display: set MSA vsp/hsp to 0 for positive polarity for DP 128b/132b
[why]
There is a bug in MSA programming sequence that mistakenly set
MSA vsp/hsp to 1 for positive polarity. This is incorrect.

Reviewed-by: Ariel Bernstein <Eric.Bernstein@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Angus Wang
fd3b2e21b8 drm/amd/display: Revert changes for MPO underflow
[WHY]
The previous changes for fixing MPO underflow with multiple
display connected caused a regression where the machine runs
into a hang when doing multiple driver pnp with multiple displays
connected

[HOW]
Reverted offending change

Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Angus Wang <angus.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Nicholas Kazlauskas
a53b554b56 drm/amd/display: Only flush delta from last command execution
[Why]
We're currently flushing commands that had been previously been
flushed or are currently being processed by the DMCUB when we don't
immediately wait for idle after command execution.

[How]
Avoiding reflushing the data by keeping track of the last wptr.

We'll treat this as the actual rptr by creating a copy of the inbox
and modifying the copy's rptr.

Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Ahmad Othman
c09bb36dd1 drm/amd/display: Secondary display goes blank on Non DCN31
[Why]
Due to integration issues with branch merging,
a regression happened that prevented secondary
displays from lighting up or enabling certain features

[How]
Separated the new logic to be for DCN31 only and retained
pre DCN31 logic for all other ASICs

Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Ahmad Othman <Ahmad.Othman@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Ramesh Errabolu
d25e35bc26 drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain
MMIO/DOORBELL BOs encode control data and should be pinned in GTT
domain before enabling PCIe connected peer devices in accessing it

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Ramesh Errabolu
f441dd33db drm/amdgpu: Update BO memory accounting to rely on allocation flag
Accounting system to track amount of available memory (system, TTM
and VRAM of a device) relies on BO's domain. The change is to rely
instead on allocation flag indicating BO type - VRAM, GTT, USERPTR,
MMIO or DOORBELL

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Leo (Hanghong) Ma
1d925758ba drm/amd/display: Reduce dmesg error to a debug print
[Why & How]
Dmesg errors are found on dcn3.1 during reset test, but it's not
a really failure. So reduce it to a debug print.

Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:45:01 -05:00
Leo (Hanghong) Ma
625097a9e0 drm/amd/display: Drop config guard for DC_LOG_DP2
[Why & How]
It doesn't make sense to guard DC_LOG_DP2 by CONFIG_DRM_AMD_DCN, and
this also caused build failure for allmodconfig; So drop the guard
to fix the compile failure;

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22 14:44:40 -05:00
Nadav Amit
13e4ad2ce8 hugetlbfs: flush before unlock on move_hugetlb_page_tables()
We must flush the TLB before releasing i_mmap_rwsem to avoid the
potential reuse of an unshared PMDs page.  This is not true in the case
of move_hugetlb_page_tables().  The last reference on the page table can
therefore be dropped before the TLB flush took place.

Prevent it by reordering the operations and flushing the TLB before
releasing i_mmap_rwsem.

Fixes: 550a7d60bd ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-22 11:36:46 -08:00
Nadav Amit
a4a118f2ee hugetlbfs: flush TLBs correctly after huge_pmd_unshare
When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB
flush is missing.  This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMDs page from being
released and reused before the TLB flush took place.

Arguably, a comprehensive solution would use mmu_gather interface to
batch the TLB flushes and the PMDs page release, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMDs page until
after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.

Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.

Fixes: 24669e5847 ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-22 11:36:46 -08:00
Rob Clark
e4840d537c drm/msm: Do hw_init() before capturing GPU state
In particular, we need to ensure all the necessary blocks are switched
to 64b mode (a5xx+) otherwise the high bits of the address of the BO to
snapshot state into will be ignored, resulting in:

  *** gpu fault: ttbr0=0000000000000000 iova=0000000000012000 dir=READ type=TRANSLATION source=CP (0,0,0,0)
  platform 506a000.gmu: [drm:a6xx_gmu_set_oob] *ERROR* Timeout waiting for GMU OOB set BOOT_SLUMBER: 0x0

Fixes: 4f776f4511 ("drm/msm/gpu: Convert the GPU show function to use the GPU state")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211108180122.487859-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-22 10:45:55 -08:00
Lorenzo Bianconi
6e53d6d269 mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
Fix the following NULL pointer dereference in mt7915_get_phy_mode
routine adding an ibss interface to the mt7915 driver.

[  101.137097] wlan0: Trigger new scan to find an IBSS to join
[  102.827039] wlan0: Creating new IBSS network, BSSID 26:a4:50:1a:6e:69
[  103.064756] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  103.073670] Mem abort info:
[  103.076520]   ESR = 0x96000005
[  103.079614]   EC = 0x25: DABT (current EL), IL = 32 bits
[  103.084934]   SET = 0, FnV = 0
[  103.088042]   EA = 0, S1PTW = 0
[  103.091215] Data abort info:
[  103.094104]   ISV = 0, ISS = 0x00000005
[  103.098041]   CM = 0, WnR = 0
[  103.101044] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000460b1000
[  103.107565] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  103.116590] Internal error: Oops: 96000005 [#1] SMP
[  103.189066] CPU: 1 PID: 333 Comm: kworker/u4:3 Not tainted 5.10.75 #0
[  103.195498] Hardware name: MediaTek MT7622 RFB1 board (DT)
[  103.201124] Workqueue: phy0 ieee80211_iface_work [mac80211]
[  103.206695] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[  103.212705] pc : mt7915_get_phy_mode+0x68/0x120 [mt7915e]
[  103.218103] lr : mt7915_mcu_add_bss_info+0x11c/0x760 [mt7915e]
[  103.223927] sp : ffffffc011cdb9e0
[  103.227235] x29: ffffffc011cdb9e0 x28: ffffff8006563098
[  103.232545] x27: ffffff8005f4da22 x26: ffffff800685ac40
[  103.237855] x25: 0000000000000001 x24: 000000000000011f
[  103.243165] x23: ffffff8005f4e260 x22: ffffff8006567918
[  103.248475] x21: ffffff8005f4df80 x20: ffffff800685ac58
[  103.253785] x19: ffffff8006744400 x18: 0000000000000000
[  103.259094] x17: 0000000000000000 x16: 0000000000000001
[  103.264403] x15: 000899c3a2d9d2e4 x14: 000899bdc3c3a1c8
[  103.269713] x13: 0000000000000000 x12: 0000000000000000
[  103.275024] x11: ffffffc010e30c20 x10: 0000000000000000
[  103.280333] x9 : 0000000000000050 x8 : ffffff8006567d88
[  103.285642] x7 : ffffff8006563b5c x6 : ffffff8006563b44
[  103.290952] x5 : 0000000000000002 x4 : 0000000000000001
[  103.296262] x3 : 0000000000000001 x2 : 0000000000000001
[  103.301572] x1 : 0000000000000000 x0 : 0000000000000011
[  103.306882] Call trace:
[  103.309328]  mt7915_get_phy_mode+0x68/0x120 [mt7915e]
[  103.314378]  mt7915_bss_info_changed+0x198/0x200 [mt7915e]
[  103.319941]  ieee80211_bss_info_change_notify+0x128/0x290 [mac80211]
[  103.326360]  __ieee80211_sta_join_ibss+0x308/0x6c4 [mac80211]
[  103.332171]  ieee80211_sta_create_ibss+0x8c/0x10c [mac80211]
[  103.337895]  ieee80211_ibss_work+0x3dc/0x614 [mac80211]
[  103.343185]  ieee80211_iface_work+0x388/0x3f0 [mac80211]
[  103.348495]  process_one_work+0x288/0x690
[  103.352499]  worker_thread+0x70/0x464
[  103.356157]  kthread+0x144/0x150
[  103.359380]  ret_from_fork+0x10/0x18
[  103.362952] Code: 394008c3 52800220 394000e4 7100007f (39400023)

Fixes: 37f4ca907c ("mt76: mt7915: register per-phy HE capabilities for each interface")
Fixes: e57b790146 ("mt76: add mac80211 driver for MT7915 PCIe-based chipsets")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/ddae419a740f1fb9e48afd432035e9f394f512ee.1637239456.git.lorenzo@kernel.org
2021-11-22 20:43:09 +02:00
Ping-Ke Shih
5737b4515d rtw89: update partition size of firmware header on skb->data
The partition size is used to tell hardware the size of piece we are going
to send a firmware. The old code updates the size in constant buffer of
firmware, and leads system crash.

To fix this, update the size on skb->data after we copy the firmware data
into skb.

Buglink: https://bugzilla.opensuse.org/show_bug.cgi?id=1188303
Fixes: e3ec7017f6 ("rtw89: add Realtek 802.11ax driver")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211119054512.10620-2-pkshih@realtek.com
2021-11-22 20:42:15 +02:00
Christophe JAILLET
a571bc2832 iwlwifi: Fix memory leaks in error handling path
Should an error occur (invalid TLV len or memory allocation failure), the
memory already allocated in 'reduce_power_data' should be freed before
returning, otherwise it is leaking.

Fixes: 9dad325f9d ("iwlwifi: support loading the reduced power table from UEFI")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1504cd7d842d13ddb8244e18004523128d5c9523.1636615284.git.christophe.jaillet@wanadoo.fr
2021-11-22 20:39:16 +02:00
Łukasz Bartosik
f5cecf1d4c iwlwifi: fix warnings produced by kernel debug options
Fix warnings produced by:
- lockdep_assert_wiphy() in function reg_process_self_managed_hint(),
- wiphy_dereference() in function iwl_mvm_init_fw_regd().
Both function are expected to be called in critical section.

The warnings were discovered when running v5.15 kernel
with debug options enabled:

1)
Hardware name: Google Delbin/Delbin
RIP: 0010:reg_process_self_managed_hint+0x254/0x347 [cfg80211]
...
Call Trace:
regulatory_set_wiphy_regd_sync+0x3d/0xb0
iwl_mvm_init_mcc+0x49d/0x5a2
iwl_op_mode_mvm_start+0x1b58/0x2507
? iwl_mvm_reprobe_wk+0x94/0x94
_iwl_op_mode_start+0x146/0x1a3
iwl_opmode_register+0xda/0x13d
init_module+0x28/0x1000

2)
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c:263 suspicious rcu_dereference_protected() usage!
...
Hardware name: Google Delbin/Delbin, BIOS Google_Delbin
Call Trace:
dump_stack_lvl+0xb1/0xe6
iwl_mvm_init_fw_regd+0x2e7/0x379
iwl_mvm_init_mcc+0x2c6/0x5a2
iwl_op_mode_mvm_start+0x1b58/0x2507
? iwl_mvm_reprobe_wk+0x94/0x94
_iwl_op_mode_start+0x146/0x1a3
iwl_opmode_register+0xda/0x13d
init_module+0x28/0x100

Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Łukasz Bartosik <lb@semihalf.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211110215744.5487-1-lukasz.bartosik@semihalf.com
2021-11-22 20:38:46 +02:00
Mordechay Goodstein
5283dd677e iwlwifi: mvm: retry init flow if failed
In some very rare cases the init flow may fail.  In many cases, this is
recoverable, so we can retry.  Implement a loop to retry two more times
after the first attempt failed.

This can happen in two different situations, namely during probe and
during mac80211 start.  For the first case, a simple loop is enough.
For the second case, we need to add a flag to prevent mac80211 from
trying to restart it as well, leaving full control with the driver.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20211110150132.57514296ecab.I52a0411774b700bdc7dedb124d8b59bf99456eb2@changeid
2021-11-22 20:38:17 +02:00
chongjiapeng
1b54403c9c iwlwifi: Fix missing error code in iwl_pci_probe()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'ret'.

Eliminate the follow smatch warning:

drivers/net/wireless/intel/iwlwifi/pcie/drv.c:1376 iwl_pci_probe() warn:
missing error code 'ret'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 1f171f4f14 ("iwlwifi: Add support for getting rf id with blank otp")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635838727-128735-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-11-22 20:37:25 +02:00
Arnd Bergmann
fe785f56ad iwlwifi: pcie: fix constant-conversion warning
Both gcc-11 and clang point out a potential issue with integer overflow when
the iwl_dev_info_table[] array is empty. This is what clang warns:

drivers/net/wireless/intel/iwlwifi/pcie/drv.c:1344:42: error: implicit conversion from 'unsigned long' to 'int' changes value from 18446744073709551615 to -1 [-Werror,-Wconstant-conversion]
        for (i = ARRAY_SIZE(iwl_dev_info_table) - 1; i >= 0; i--) {
               ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~

This is still harmless, as the loop correctly terminates, but adding
an extra range check makes that obvious to both readers and to the
compiler.

Fixes: 3f7320428f ("iwlwifi: pcie: simplify iwl_pci_find_dev_info()")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118142124.526901-1-arnd@kernel.org
2021-11-22 20:33:27 +02:00
Douglas Anderson
d03fcc1de0 drm/msm/dp: Avoid unpowered AUX xfers that caused crashes
If you happened to try to access `/dev/drm_dp_aux` devices provided by
the MSM DP AUX driver too early at bootup you could go boom. Let's
avoid that by only allowing AUX transfers when the controller is
powered up.

Specifically the crash that was seen (on Chrome OS 5.4 tree with
relevant backports):
  Kernel panic - not syncing: Asynchronous SError Interrupt
  CPU: 0 PID: 3131 Comm: fwupd Not tainted 5.4.144-16620-g28af11b73efb #1
  Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
  Call trace:
   dump_backtrace+0x0/0x14c
   show_stack+0x20/0x2c
   dump_stack+0xac/0x124
   panic+0x150/0x390
   nmi_panic+0x80/0x94
   arm64_serror_panic+0x78/0x84
   do_serror+0x0/0x118
   do_serror+0xa4/0x118
   el1_error+0xbc/0x160
   dp_catalog_aux_write_data+0x1c/0x3c
   dp_aux_cmd_fifo_tx+0xf0/0x1b0
   dp_aux_transfer+0x1b0/0x2bc
   drm_dp_dpcd_access+0x8c/0x11c
   drm_dp_dpcd_read+0x64/0x10c
   auxdev_read_iter+0xd4/0x1c4

I did a little bit of tracing and found that:
* We register the AUX device very early at bootup.
* Power isn't actually turned on for my system until
  hpd_event_thread() -> dp_display_host_init() -> dp_power_init()
* You can see that dp_power_init() calls dp_aux_init() which is where
  we start allowing AUX channel requests to go through.

In general this patch is a bit of a bandaid but at least it gets us
out of the current state where userspace acting at the wrong time can
fully crash the system.
* I think the more proper fix (which requires quite a bit more
  changes) is to power stuff on while an AUX transfer is
  happening. This is like the solution we did for ti-sn65dsi86. This
  might be required for us to move to populating the panel via the
  DP-AUX bus.
* Another fix considered was to dynamically register / unregister. I
  tried that at <https://crrev.com/c/3169431/3> but it got
  ugly. Currently there's a bug where the pm_runtime() state isn't
  tracked properly and that causes us to just keep registering more
  and more.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Link: https://lore.kernel.org/r/20211109100403.1.I4e23470d681f7efe37e2e7f1a6466e15e9bb1d72@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-22 10:16:05 -08:00
Philip Chen
cd92cc187c drm/msm/dsi: set default num_data_lanes
If "data_lanes" property of the dsi output endpoint is missing in
the DT, num_data_lanes would be 0 by default, which could cause
dsi_host_attach() to fail if dsi->lanes is set to a non-zero value
by the bridge driver.

According to the binding document of msm dsi controller, the
input/output endpoint of the controller is expected to have 4 lanes.
So let's set num_data_lanes to 4 by default.

Signed-off-by: Philip Chen <philipchen@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20211030100812.1.I6cd9af36b723fed277d34539d3b2ba4ca233ad2d@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-22 10:16:04 -08:00
Marta Plantykow
f65ee535df ice: avoid bpf_prog refcount underflow
Ice driver has the routines for managing XDP resources that are shared
between ndo_bpf op and VSI rebuild flow. The latter takes place for
example when user changes queue count on an interface via ethtool's
set_channels().

There is an issue around the bpf_prog refcounting when VSI is being
rebuilt - since ice_prepare_xdp_rings() is called with vsi->xdp_prog as
an argument that is used later on by ice_vsi_assign_bpf_prog(), same
bpf_prog pointers are swapped with each other. Then it is also
interpreted as an 'old_prog' which in turn causes us to call
bpf_prog_put on it that will decrement its refcount.

Below splat can be interpreted in a way that due to zero refcount of a
bpf_prog it is wiped out from the system while kernel still tries to
refer to it:

[  481.069429] BUG: unable to handle page fault for address: ffffc9000640f038
[  481.077390] #PF: supervisor read access in kernel mode
[  481.083335] #PF: error_code(0x0000) - not-present page
[  481.089276] PGD 100000067 P4D 100000067 PUD 1001cb067 PMD 106d2b067 PTE 0
[  481.097141] Oops: 0000 [#1] PREEMPT SMP PTI
[  481.101980] CPU: 12 PID: 3339 Comm: sudo Tainted: G           OE     5.15.0-rc5+ #1
[  481.110840] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
[  481.122021] RIP: 0010:dev_xdp_prog_id+0x25/0x40
[  481.127265] Code: 80 00 00 00 00 0f 1f 44 00 00 89 f6 48 c1 e6 04 48 01 fe 48 8b 86 98 08 00 00 48 85 c0 74 13 48 8b 50 18 31 c0 48 85 d2 74 07 <48> 8b 42 38 8b 40 20 c3 48 8b 96 90 08 00 00 eb e8 66 2e 0f 1f 84
[  481.148991] RSP: 0018:ffffc90007b63868 EFLAGS: 00010286
[  481.155034] RAX: 0000000000000000 RBX: ffff889080824000 RCX: 0000000000000000
[  481.163278] RDX: ffffc9000640f000 RSI: ffff889080824010 RDI: ffff889080824000
[  481.171527] RBP: ffff888107af7d00 R08: 0000000000000000 R09: ffff88810db5f6e0
[  481.179776] R10: 0000000000000000 R11: ffff8890885b9988 R12: ffff88810db5f4bc
[  481.188026] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  481.196276] FS:  00007f5466d5bec0(0000) GS:ffff88903fb00000(0000) knlGS:0000000000000000
[  481.205633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  481.212279] CR2: ffffc9000640f038 CR3: 000000014429c006 CR4: 00000000003706e0
[  481.220530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  481.228771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  481.237029] Call Trace:
[  481.239856]  rtnl_fill_ifinfo+0x768/0x12e0
[  481.244602]  rtnl_dump_ifinfo+0x525/0x650
[  481.249246]  ? __alloc_skb+0xa5/0x280
[  481.253484]  netlink_dump+0x168/0x3c0
[  481.257725]  netlink_recvmsg+0x21e/0x3e0
[  481.262263]  ____sys_recvmsg+0x87/0x170
[  481.266707]  ? __might_fault+0x20/0x30
[  481.271046]  ? _copy_from_user+0x66/0xa0
[  481.275591]  ? iovec_from_user+0xf6/0x1c0
[  481.280226]  ___sys_recvmsg+0x82/0x100
[  481.284566]  ? sock_sendmsg+0x5e/0x60
[  481.288791]  ? __sys_sendto+0xee/0x150
[  481.293129]  __sys_recvmsg+0x56/0xa0
[  481.297267]  do_syscall_64+0x3b/0xc0
[  481.301395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  481.307238] RIP: 0033:0x7f5466f39617
[  481.311373] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bd 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  481.342944] RSP: 002b:00007ffedc7f4308 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
[  481.361783] RAX: ffffffffffffffda RBX: 00007ffedc7f5460 RCX: 00007f5466f39617
[  481.380278] RDX: 0000000000000000 RSI: 00007ffedc7f5360 RDI: 0000000000000003
[  481.398500] RBP: 00007ffedc7f53f0 R08: 0000000000000000 R09: 000055d556f04d50
[  481.416463] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffedc7f5360
[  481.434131] R13: 00007ffedc7f5350 R14: 00007ffedc7f5344 R15: 0000000000000e98
[  481.451520] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mxm_wmi mei_me coretemp mei ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ahci crypto_simd cryptd libahci lpc_ich [last unloaded: ice]
[  481.528558] CR2: ffffc9000640f038
[  481.542041] ---[ end trace d1f24c9ecf5b61c1 ]---

Fix this by only calling ice_vsi_assign_bpf_prog() inside
ice_prepare_xdp_rings() when current vsi->xdp_prog pointer is NULL.
This way set_channels() flow will not attempt to swap the vsi->xdp_prog
pointers with itself.

Also, sprinkle around some comments that provide a reasoning about
correlation between driver and kernel in terms of bpf_prog refcount.

Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-22 08:35:36 -08:00
Maciej Fijalkowski
792b208658 ice: fix vsi->txq_map sizing
The approach of having XDP queue per CPU regardless of user's setting
exposed a hidden bug that could occur in case when Rx queue count differ
from Tx queue count. Currently vsi->txq_map's size is equal to the
doubled vsi->alloc_txq, which is not correct due to the fact that XDP
rings were previously based on the Rx queue count. Below splat can be
seen when ethtool -L is used and XDP rings are configured:

[  682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f
[  682.883403] #PF: supervisor read access in kernel mode
[  682.889345] #PF: error_code(0x0000) - not-present page
[  682.895289] PGD 0 P4D 0
[  682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
[  682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G           OE     5.15.0-rc5+ #1
[  682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
[  682.923380] RIP: 0010:devres_remove+0x44/0x130
[  682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
[  682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002
[  682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370
[  682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000
[  682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000
[  682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60
[  682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c
[  682.997535] FS:  00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
[  683.006910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0
[  683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  683.038336] Call Trace:
[  683.041167]  devm_kfree+0x33/0x50
[  683.045004]  ice_vsi_free_arrays+0x5e/0xc0 [ice]
[  683.050380]  ice_vsi_rebuild+0x4c8/0x750 [ice]
[  683.055543]  ice_vsi_recfg_qs+0x9a/0x110 [ice]
[  683.060697]  ice_set_channels+0x14f/0x290 [ice]
[  683.065962]  ethnl_set_channels+0x333/0x3f0
[  683.070807]  genl_family_rcv_msg_doit+0xea/0x150
[  683.076152]  genl_rcv_msg+0xde/0x1d0
[  683.080289]  ? channels_prepare_data+0x60/0x60
[  683.085432]  ? genl_get_cmd+0xd0/0xd0
[  683.089667]  netlink_rcv_skb+0x50/0xf0
[  683.094006]  genl_rcv+0x24/0x40
[  683.097638]  netlink_unicast+0x239/0x340
[  683.102177]  netlink_sendmsg+0x22e/0x470
[  683.106717]  sock_sendmsg+0x5e/0x60
[  683.110756]  __sys_sendto+0xee/0x150
[  683.114894]  ? handle_mm_fault+0xd0/0x2a0
[  683.119535]  ? do_user_addr_fault+0x1f3/0x690
[  683.134173]  __x64_sys_sendto+0x25/0x30
[  683.148231]  do_syscall_64+0x3b/0xc0
[  683.161992]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix this by taking into account the value that num_possible_cpus()
yields in addition to vsi->alloc_txq instead of doubling the latter.

Fixes: efc2214b60 ("ice: Add support for XDP")
Fixes: 22bf877e52 ("ice: introduce XDP_TX fallback path")
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-22 08:33:54 -08:00
David S. Miller
03a000bfd7 Merge branch 'nh-group-refcnt'
Nikolay Aleksandrov says:

====================
net: nexthop: fix refcount issues when replacing groups

This set fixes a refcount bug when replacing nexthop groups and
modifying routes. It is complex because the objects look valid when
debugging memory dumps, but we end up having refcount dependency between
unlinked objects which can never be released, so in turn they cannot
free their resources and refcounts. The problem happens because we can
have stale IPv6 per-cpu dsts in nexthops which were removed from a
group. Even though the IPv6 gen is bumped, the dsts won't be released
until traffic passes through them or the nexthop is freed, that can take
arbitrarily long time, and even worse we can create a scenario[1] where it
can never be released. The fix is to release the IPv6 per-cpu dsts of
replaced nexthops after an RCU grace period so no new ones can be
created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which
is used by the nexthop code only when necessary. We can further optimize
group replacement, but that is more suited for net-next as these patches
would have to be backported to stable releases.

v2: patch 02: update commit msg
    patch 03: check for mausezahn before testing and make a few comments
              more verbose

[1]
This info is also present in patch 02's commit message.
Initial state:
 $ ip nexthop list
  id 200 via 2002:db8::2 dev bridge.10 scope link onlink
  id 201 via 2002:db8::3 dev bridge scope link onlink
  id 203 group 201/200
 $ ip -6 route
  2001:db8::10 nhid 203 metric 1024 pref medium
     nexthop via 2002:db8::3 dev bridge weight 1 onlink
     nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink

Create rt6_info through one of the multipath legs, e.g.:
 $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
 (pkt_inj is just a custom packet generator, nothing special)

Then remove that leg from the group by replace (let's assume it is id
200 in this case):
 $ ip nexthop replace id 203 group 201

Now remove the IPv6 route:
 $ ip -6 route del 2001:db8::10/128

The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
 (deleted) IPv6 route holds 1 reference over nhid 203
 nh 203 holds 1 ref over id 201
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
 $ ip nexthop replace id 203 group 201/200

And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.

To trigger the bug just delete the group (nhid 203):
 $ ip nexthop del id 203

It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.

At this point the dependencies are:
 (deleted, only unlinked) IPv6 route holds reference over group nh 203
 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.

If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
 $ ip nexthop del id 200
 $ ip nexthop
 $

Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.

 Message from syslogd@dev at Nov 19 14:04:10 ...
  kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
 Message from syslogd@dev at Nov 19 14:04:20 ...
  kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 15:44:49 +00:00
Nikolay Aleksandrov
02ebe49ab0 selftests: net: fib_nexthops: add test for group refcount imbalance bug
The new selftest runs a sequence which causes circular refcount
dependency between deleted objects which cannot be released and results
in a netdevice refcount imbalance.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 15:44:49 +00:00
Nikolay Aleksandrov
1005f19b93 net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
the removed nexthop entries after an RCU grace period because they
contain references to the nexthop's net device and to the fib6 info.
With specific series of events[1] we can reach net device refcount
imbalance which is unrecoverable. IPv4 is not affected because dsts
don't take a refcount on the route.

[1]
 $ ip nexthop list
  id 200 via 2002:db8::2 dev bridge.10 scope link onlink
  id 201 via 2002:db8::3 dev bridge scope link onlink
  id 203 group 201/200
 $ ip -6 route
  2001:db8::10 nhid 203 metric 1024 pref medium
     nexthop via 2002:db8::3 dev bridge weight 1 onlink
     nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink

Create rt6_info through one of the multipath legs, e.g.:
 $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
 (pkt_inj is just a custom packet generator, nothing special)

Then remove that leg from the group by replace (let's assume it is id
200 in this case):
 $ ip nexthop replace id 203 group 201

Now remove the IPv6 route:
 $ ip -6 route del 2001:db8::10/128

The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
 (deleted) IPv6 route holds 1 reference over nhid 203
 nh 203 holds 1 ref over id 201
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
 $ ip nexthop replace id 203 group 201/200

And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.

To trigger the bug just delete the group (nhid 203):
 $ ip nexthop del id 203

It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.

At this point the dependencies are:
 (deleted, only unlinked) IPv6 route holds reference over group nh 203
 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.

If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
 $ ip nexthop del id 200
 $ ip nexthop
 $

Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.

 Message from syslogd@dev at Nov 19 14:04:10 ...
  kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
 Message from syslogd@dev at Nov 19 14:04:20 ...
  kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3

Fixes: 7bf4796dd0 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 15:44:49 +00:00
Nikolay Aleksandrov
8837cbbf85 net: ipv6: add fib6_nh_release_dsts stub
We need a way to release a fib6_nh's per-cpu dsts when replacing
nexthops otherwise we can end up with stale per-cpu dsts which hold net
device references, so add a new IPv6 stub called fib6_nh_release_dsts.
It must be used after an RCU grace period, so no new dsts can be created
through a group's nexthop entry.
Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed
so it doesn't need a dummy stub when IPv6 is not enabled.

Fixes: 7bf4796dd0 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22 15:44:49 +00:00
Nicolas Frattaroli
428ee30a05 ASoC: rk817: Add module alias for rk817-codec
Without a module alias, autoloading the driver does not occurr
when it is built as a module.

By adding a module alias, the driver now probes fine automatically
and therefore analog audio output works as it should.

Fixes: 0d6a04da9b ("ASoC: Add Rockchip rk817 audio CODEC support")
Signed-off-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com>
Link: https://lore.kernel.org/r/20211121150521.159543-1-frattaroli.nicolas@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-11-22 15:40:03 +00:00
Hans de Goede
28c916ade1 ASoC: soc-acpi: Set mach->id field on comp_ids matches
Commit dac7cbd55d ("ASoC: Intel: soc-acpi-byt: shrink tables using
compatible IDs") and commit 959ae8215a ("ASoC: Intel: soc-acpi-cht:
shrink tables using compatible IDs") simplified the match tables in
soc-acpi-intel-byt-match.c and soc-acpi-intel-cht-match.c by merging
identical entries using the new .comp_ids snd_soc_acpi_mach field to
point a single entry to multiple ACPI HIDs and clearing the previously
unique per entry .id field.

But various machine drivers from sound/soc/intel/boards rely on mach->id
in one or more ways, e.g. some drivers contain the following snippets:

	adev = acpi_dev_get_first_match_dev(mach->id, NULL, -1);

	pkg_found = snd_soc_acpi_find_package_from_hid(mach->id, ...

	if (!strncmp(snd_soc_cards[i].codec_id, mach->id, 8)) { ...

All of which are broken by the match table shrinking.

Make the snd_soc_acpi_mach.id field non const (the storage for the tables
already is non const) and on a comps_ids match copy the matching HID to
the id field to fix this.

Fixes: dac7cbd55d ("ASoC: Intel: soc-acpi-byt: shrink tables using compatible IDs")
Fixes: 959ae8215a ("ASoC: Intel: soc-acpi-cht: shrink tables using compatible IDs")
Suggested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Brent Lu <brent.lu@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20211118153014.349222-1-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2021-11-22 15:40:01 +00:00