net/mlx5: Always drain health in shutdown callback

[ Upstream commit 1b75da22ed1e6171e261bc9265370162553d5393 ]

There is no point in recovery during device shutdown. if health
work started need to wait for it to avoid races and NULL pointer
access.

Hence, drain health WQ on shutdown callback.

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Fixes: d2aa060d40 ("net/mlx5: Cancel health poll before sending panic teardown command")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[Xiangyu: Modified to apply on 6.1.y to fix CVE-2024-43866]
Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit is contained in:
Shay Drory
2024-10-10 15:01:07 +08:00
committed by Greg Kroah-Hartman
parent af017667ad
commit 5005e2e159
2 changed files with 2 additions and 1 deletions

View File

@@ -1950,7 +1950,6 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
/* Panic tear down fw command will stop the PCI bus communication
* with the HCA, so the health poll is no longer needed.
*/
mlx5_drain_health_wq(dev);
mlx5_stop_health_poll(dev, false);
ret = mlx5_cmd_fast_teardown_hca(dev);
@@ -1985,6 +1984,7 @@ static void shutdown(struct pci_dev *pdev)
mlx5_core_info(dev, "Shutdown was called\n");
set_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state);
mlx5_drain_health_wq(dev);
err = mlx5_try_fast_unload(dev);
if (err)
mlx5_unload_one(dev, false);

View File

@@ -75,6 +75,7 @@ static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
{
struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
mlx5_drain_health_wq(sf_dev->mdev);
mlx5_unload_one(sf_dev->mdev, false);
}