Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add the option to mitigate using IBPB on a kernel entry. Pull in the
Retbleed alternative so that the IBPB call from there can be used. Also,
if Retbleed mitigation is done using IBPB, the same mitigation can and
must be used here.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add support for the synthetic CPUID flag which "if this bit is 1,
it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch
type predictions from the CPU branch predictor."
This flag is there so that this capability in guests can be detected
easily (otherwise one would have to track microcode revisions which is
impossible for guests).
It is also needed only for Zen3 and -4. The other two (Zen1 and -2)
always flush branch type predictions by default.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Hardware based on the Bay Trail / BYT SoCs require an external ULPI phy for
USB device-mode. The phy chip usually has its 'reset' and 'chip select'
lines connected to GPIOs described by ACPI fwnodes in the DSDT table.
Because of hardware with missing ACPI resources for the 'reset' and 'chip
select' GPIOs commit 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table
on platforms without ACPI GPIO resources") introduced a fallback
gpiod_lookup_table with hard-coded mappings for Bay Trail devices.
However there are existing Bay Trail based devices, like the National
Instruments cRIO-903x series, where the phy chip has its 'reset' and
'chip-select' lines always asserted in hardware via resistor pull-ups. On
this hardware the phy chip is always enabled and the ACPI dsdt table is
missing information not only for the 'chip-select' and 'reset' lines but
also for the BYT GPIO controller itself "INT33FC".
With the introduction of the gpiod_lookup_table initializing the USB
device-mode on these hardware now errors out. The error comes from the
gpiod_get_optional() calls in dwc3_pci_quirks() which will now return an
-ENOENT error due to the missing ACPI entry for the INT33FC gpio controller
used in the aforementioned table.
This hardware used to work before because gpiod_get_optional() will return
NULL instead of -ENOENT if no GPIO has been assigned to the requested
function. The dwc3_pci_quirks() code for setting the 'cs' and 'reset' GPIOs
was then skipped (due to the NULL return). This is the correct behavior in
cases where the phy chip is hardwired and there are no GPIOs to control.
Since the gpiod_lookup_table relies on the presence of INT33FC fwnode
in ACPI tables only add the table if we know the entry for the INT33FC
gpio controller is present. This allows Bay Trail based devices with
hardwired dwc3 ULPI phys to continue working.
Fixes: 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table on platforms without ACPI GPIO resources")
Cc: stable <stable@kernel.org>
Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230726184555.218091-2-gratian.crisan@ni.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list. Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first. However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first. As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.
To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable. The default is still
10, but it can be overridden via a module parameter.
This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Florian Westphal says:
====================
netfilter fixes for net
1. On-demand overlap detection in 'rbtree' set can cause memory leaks.
This is broken since 6.2.
2. An earlier fix in 6.4 to address an imbalance in refcounts during
transaction error unwinding was incomplete, from Pablo Neira.
3. Disallow adding a rule to a deleted chain, also from Pablo.
Broken since 5.9.
* tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
====================
Link: https://lore.kernel.org/r/20230726152524.26268-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nla_for_each_nested parsing in function mqprio_parse_nlattr() does
not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as 8 byte integer and passed to priv->max_rate/min_rate.
This patch adds the check based on nla_len() when check the nla_type(),
which ensures that the length of these two attribute must equals
sizeof(u64).
Fixes: 4e8b86c062 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle says:
====================
net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC
The MediaTek MT7988 SoC introduces a new version (3) of the NETSYS
block and comes with three instead of two MACs.
The first MAC can be internally connected to a built-in Gigabit
Ethernet switch with four 1000M/100M/10M twisted pair user ports.
The second MAC can be internally connected to a built-in 2500Base-T
Ethernet PHY.
There are two SerDes units which can be operated in USXGMII, 10GBase-(K)R,
5GBase-R, 2500Base-X, 1000Base-X or SGMII interface mode.
This series adds initial support for NETSYS v3 and the first MAC of the
MT7988 SoC connecting the built-in DSA switch.
The switch is supported since commit 110c18bfed ("net: dsa: mt7530:
introduce driver for MT7988 built-in switch").
Basic support for the 1000M/100M/10M built-in PHYs connected to the
switch ports is present since commit ("98c485eaf509b net: phy: add
driver for MediaTek SoC built-in GE PHYs").
====================
Link: https://lore.kernel.org/r/cover.1690246066.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce DT bindings for the MT7988 SoC to mediatek,net.yaml.
The MT7988 SoC got 3 Ethernet MACs operating at a maximum of
10 Gigabit/sec supported by 2 packet processor engines for
offloading tasks.
The first MAC is hard-wired to a built-in switch which exposes
four 1000Base-T PHYs as user ports.
It also comes with built-in 2500Base-T PHY which can be used
with the 2nd GMAC.
The 2nd and 3rd GMAC can be connected to external PHYs or provide
SFP(+) cages attached via SGMII, 1000Base-X, 2500Base-X, USXGMII,
5GBase-KR or 10GBase-KR.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/3c83d2c0d629dac064ec4396132538c52e77a57f.1690246066.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230724211905.805665-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230724211859.805481-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
mlxsw: Speed up transceiver module EEPROM dump
Ido Schimmel writes:
Old firmware versions could only read up to 48 bytes from a transceiver
module's EEPROM in one go. Newer versions can read up to 128 bytes,
resulting in fewer transactions.
Query support for the new capability during driver initialization and if
supported, read up to 128 bytes in one go.
This is going to be especially useful for upcoming transceiver module
firmware flashing support.
Before:
# perf stat -e devlink:devlink_hwmsg -- ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50
[...]
Performance counter stats for 'ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50':
3 devlink:devlink_hwmsg
After:
# perf stat -e devlink:devlink_hwmsg -- ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50
[...]
Performance counter stats for 'ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50':
1 devlink:devlink_hwmsg
Patches #1-#4 are preparations / cleanups.
Patch #5 adds support for the new read size.
====================
Link: https://lore.kernel.org/r/cover.1690281940.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Old firmware versions could only read up to 48 bytes from a transceiver
module's EEPROM in one go. Newer versions can read up to 128 bytes,
resulting in fewer transactions.
Query support for the new capability during driver initialization and if
supported, read up to 128 bytes in one go.
This is going to be especially useful for upcoming transceiver module
firmware flashing support.
Before:
# perf stat -e devlink:devlink_hwmsg -- ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50
[...]
Performance counter stats for 'ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50':
3 devlink:devlink_hwmsg
After:
# perf stat -e devlink:devlink_hwmsg -- ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50
[...]
Performance counter stats for 'ethtool -m swp11 page 0x1 offset 128 length 128 i2c 0x50':
1 devlink:devlink_hwmsg
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/99d1618e8cd5acefb2f795dfde1a5b41caa07dcb.1690281940.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MCAM register reports the device supported management features. Querying
this register exposes if features are supported with the current
firmware version in the current ASIC. Then, the driver can separate
between different implementations dynamically.
MCAM register supports querying whether the MCIA register supports 128
bytes payloads or only 48 bytes. Add support for the register as
preparation for allowing larger MCIA transactions.
Note that the access to the bits in the field 'mng_feature_cap_mask' is
not same to other mask fields in other registers. In most of the cases
bit #0 is the first one in the last dword, in MCAM register, bits #0-#31
are in the first dword and so on. Declare the mask field using bits
arrays per dword to simplify the access.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/1427a3f57ba93db1c5dd4f982bfb31dd5c82356e.1690281940.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The array 'mlxsw_reg_infos' is ordered by registers' IDs. The ID of MPSC
register is 0x9080, so it should be after MCDA (register ID 0x9063) and
not after MTUTC (register ID 0x9055). Note that the register's fields are
defined in the correct place in the file, only the definition in
'mlxsw_reg_infos' is wrong. This issue was found while adding new
register which supposed to be before mpsc.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/c5e270cd5769f301fe81235622215143506e1b48.1690281940.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau says:
====================
mptcp: More fixes for 6.5
Patch 1: Better detection of ip6tables vs ip6tables-legacy tools for
self tests. Fix for 6.4 and newer.
Patch 2: Only generate "new listener" event if listen operation
succeeds. Fix for 6.2 and newer.
====================
Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-0-6f60fe7137a9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The offending patch is based on the assumption that for PFs,
mlx5_get_dev_index() is the same as vhca_id. However, this assumption
is wrong in case of DPU (ECPF).
Fix it by using vhca_id directly, and switch the array of peers to
xarray.
Fixes: 6d5b7321d8 ("net/mlx5: DR, handle more than one peer domain")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit sets ft prio to fs_base_prio. But if
ignore_flow_level it not supported, ft prio must be set based on
tc filter prio. Otherwise, all the ft prio are the same on the same
chain. It is invalid if ignore_flow_level is not supported.
Fix it by setting ft prio based on tc filter prio and setting
fs_base_prio to 0 for fdb.
Fixes: 8e80e56480 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>