Commit Graph

198010 Commits

Author SHA1 Message Date
Linus Torvalds
05c6ca8512 Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Make RESERVE_BRK() work again with older binutils. The recent
   'simplification' broke that.

 - Make early #VE handling increment RIP when successful.

 - Make the #VE code consistent vs. the RIP adjustments and add
   comments.

 - Handle load_unaligned_zeropad() across page boundaries correctly in
   #VE when the second page is shared.

* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  x86/tdx: Clarify RIP adjustments in #VE handler
  x86/tdx: Fix early #VE handling
  x86/mm: Fix RESERVE_BRK() for older binutils
2022-06-19 09:58:28 -05:00
Linus Torvalds
5d770f11a1 Merge tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull build tooling updates from Thomas Gleixner:

 - Remove obsolete CONFIG_X86_SMAP reference from objtool

 - Fix overlapping text section failures in faddr2line for real

 - Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
   with finegrained annotations so objtool can validate that code
   correctly.

* tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
  faddr2line: Fix overlapping text section failures, the sequel
  objtool: Fix obsolete reference to CONFIG_X86_SMAP
2022-06-19 09:54:16 -05:00
Kirill A. Shutemov
1e7769653b x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.

In TDX guests, the second page can be shared page and a VMM may configure
it to trigger #VE.

The kernel assumes that #VE on a shared page is an MMIO access and tries to
decode instruction to handle it. In case of load_unaligned_zeropad() it
may result in confusion as it is not MMIO access.

Fix it by detecting split page MMIO accesses and failing them.
load_unaligned_zeropad() will recover using exception fixups.

The issue was discovered by analysis and reproduced artificially. It was
not triggered during testing.

[ dhansen: fix up changelogs and comments for grammar and clarity,
	   plus incorporate Kirill's off-by-one fix]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
2022-06-17 15:37:33 -07:00
Linus Torvalds
32efdbffff Merge tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
 "Revert clipping of PCI host bridge windows to avoid E820 regions,
  which broke several machines by forcing unnecessary BAR reassignments
  (Hans de Goede)"

* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
2022-06-17 15:12:20 -05:00
Hans de Goede
a2b36ffbf5 x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
This reverts commit 4c5e242d3e.

Prior to 4c5e242d3e ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows.  We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.

After 4c5e242d3e, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions.  This unnecessary BAR
reassignment broke several machines.

Guilherme reported that Steam Deck fails to boot after 4c5e242d3e.  We
clipped the window that contained most 32-bit BARs:

  BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
  acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]

which forced us to reassign all those BARs, for example, this NVMe BAR:

  pci 0000:00:01.2: PCI bridge to [bus 01]
  pci 0000:00:01.2:   bridge window [mem 0x80600000-0x806fffff]
  pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
  pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
  pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window

  pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
  pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]

All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.

Andy reported a similar failure on an Intel MID platform.  Benjamin
reported a similar failure on a VMWare Fusion VM.

Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().

[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-06-17 14:24:14 -05:00
Linus Torvalds
ef06e68290 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Revert the moving of the jump labels initialisation before
   setup_machine_fdt(). The bug was fixed in drivers/char/random.c.

 - Ftrace fixes: branch range check and consistent handling of PLTs.

 - Clean rather than invalidate FROM_DEVICE buffers at start of DMA
   transfer (safer if such buffer is mapped in user space). A cache
   invalidation is done already at the end of the transfer.

 - A couple of clean-ups (unexport symbol, remove unused label).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
  arm64/cpufeature: Unexport set_cpu_feature()
  arm64: ftrace: remove redundant label
  arm64: ftrace: consistently handle PLTs.
  arm64: ftrace: fix branch range checks
  Revert "arm64: Initialize jump labels before setup_machine_fdt()"
2022-06-17 13:55:19 -05:00
Linus Torvalds
cc2fb31d49 Merge tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
 "Add missing ELF_DETAILS in vmlinux.lds.S and fix document rendering"

* tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  docs/zh_CN/LoongArch: Fix notes rendering by using reST directives
  docs/LoongArch: Fix notes rendering by using reST directives
  LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
2022-06-17 13:50:24 -05:00
Linus Torvalds
f10516322d Merge tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for the PolarFire SOC's device tree

 - A handful of fixes for the recently added Svpmbt support

 - An improvement to the Kconfig text for Svpbmt

* tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
  riscv: drop cpufeature_apply_feature tracking variable
  riscv: fix dependency for t-head errata
  riscv: dts: microchip: re-add pdma to mpfs device tree
2022-06-17 13:45:47 -05:00
Linus Torvalds
2d806a688f Merge tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:

 - Fix hv_init_clocksource annotation (Masahiro Yamada)

 - Two bug fixes for vmbus driver (Saurabh Sengar)

 - Fix SEV negotiation (Tianyu Lan)

 - Fix comments in code (Xiang Wang)

 - One minor fix to HID driver (Michael Kelley)

* tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
  Drivers: hv: vmbus: Release cpu lock in error case
  HID: hyperv: Correctly access fields declared as __le16
  clocksource: hyper-v: unexport __init-annotated hv_init_clocksource()
  Drivers: hv: Fix syntax errors in comments
  Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
2022-06-17 13:39:12 -05:00
Will Deacon
c50f11c619 arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers

When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:

  (1) If the DMA transfer does not write to every byte in the buffer,
      then the unwritten bytes will contain stale data once the transfer
      has completed.

  (2) If the buffer has a virtual alias in userspace, then stale data
      may be visible via this alias during the period between performing
      the cache invalidation and the DMA writes landing in memory.

Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-17 19:06:06 +01:00
Youling Tang
b672332ef9 LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
Commit c604abc3f6 ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
splits ELF_DETAILS from STABS_DEBUG, resulting in missing ELF_DETAILS
information in LoongArch architecture, so add it.

Fixes: c604abc3f6 ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-06-17 22:09:05 +08:00
Palmer Dabbelt
c836d9d17a RISC-V: Some Svpbmt fixes
Some additionals comments and notes from autobuilders received after the
series got applied, warranted some changes.

* commit '924cbb8cbe3460ea192e6243017ceb0ceb255b1b':
  riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
  riscv: drop cpufeature_apply_feature tracking variable
  riscv: fix dependency for t-head errata
2022-06-16 15:48:39 -07:00
Heiko Stuebner
924cbb8cbe riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
This improves the symbol's description to make it easier for
people to understand what it is about.

Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220526205646.258337-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-06-16 15:47:39 -07:00
Heiko Stuebner
237c0ee474 riscv: drop cpufeature_apply_feature tracking variable
The variable was tracking which feature patches got applied
but that information was never actually used - and thus resulted
in a warning as well.

Drop the variable.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220526205646.258337-2-heiko@sntech.de
Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-06-16 15:47:31 -07:00
Heiko Stuebner
21f356f990 riscv: fix dependency for t-head errata
alternatives only work correctly on non-xip-kernels and while the
selected alternative-symbol has the correct dependency the symbol
selecting it also needs that dependency.

So add the missing dependency to the T-Head errata Kconfig symbol.

Reported-by: kernel test robot <yujie.liu@intel.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220526205646.258337-5-heiko@sntech.de
Fixes: a35707c3d8 ("riscv: add memory-type errata for T-Head")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-06-16 15:42:55 -07:00
Linus Torvalds
48a23ec6ff Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Mostly driver fixes.

  Current release - regressions:

   - Revert "net: Add a second bind table hashed by port and address",
     needs more work

   - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
     had been removed from DT core

   - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation

  Current release - new code bugs:

   - hns3: modify the ring param print info

  Previous releases - always broken:

   - axienet: make the 64b addressable DMA depends on 64b architectures

   - iavf: fix issue with MAC address of VF shown as zero

   - ice: fix PTP TX timestamp offset calculation

   - usb: ax88179_178a needs FLAG_SEND_ZLP

  Misc:

   - document some net.sctp.* sysctls"

* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  net: axienet: add missing error return code in axienet_probe()
  Revert "net: Add a second bind table hashed by port and address"
  net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
  ARM: dts: at91: ksz9477_evb: fix port/phy validation
  net: bgmac: Fix an erroneous kfree() in bgmac_remove()
  ice: Fix memory corruption in VF driver
  ice: Fix queue config fail handling
  ice: Sync VLAN filtering features for DVM
  ice: Fix PTP TX timestamp offset calculation
  mlxsw: spectrum_cnt: Reorder counter pools
  docs: networking: phy: Fix a typo
  amd-xgbe: Use platform_irq_count()
  octeontx2-vf: Add support for adaptive interrupt coalescing
  xilinx:  Fix build on x86.
  net: axienet: Use iowrite64 to write all 64b descriptor pointers
  net: axienet: make the 64b addresable DMA depends on 64b archectures
  net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
  net: hns3: fix PF rss size initialization bug
  ...
2022-06-16 11:51:32 -07:00
Mark Brown
3f77a1d057 arm64/cpufeature: Unexport set_cpu_feature()
We currently export set_cpu_feature() to modules but there are no in tree
users that can be built as modules and it is hard to see cases where it
would make sense for there to be any such users. Remove the export to avoid
anyone else having to worry about why it is there and ensure that any users
that do get added get a bit more visiblity.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220615191504.626604-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-16 18:42:26 +01:00
Tianyu Lan
49d6a3c062 x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
Hyper-V Isolation VM current code uses sev_es_ghcb_hv_call()
to read/write MSR via GHCB page and depends on the sev code.
This may cause regression when sev code changes interface
design.

The latest SEV-ES code requires to negotiate GHCB version before
reading/writing MSR via GHCB page and sev_es_ghcb_hv_call() doesn't
work for Hyper-V Isolation VM. Add Hyper-V ghcb related implementation
to decouple SEV and Hyper-V code. Negotiate GHCB version in the
hyperv_init() and use the version to communicate with Hyper-V
in the ghcb hv call function.

Fixes: 2ea29c5abb ("x86/sev: Save the negotiated GHCB version")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-06-15 18:27:40 +00:00
Kirill A. Shutemov
cdd85786f4 x86/tdx: Clarify RIP adjustments in #VE handler
After successful #VE handling, tdx_handle_virt_exception() has to move
RIP to the next instruction. The handler needs to know the length of the
instruction.

If the #VE happened due to instruction execution, the GET_VEINFO TDX
module call provides info on the instruction in R10, including its length.

For #VE due to EPT violation, the info in R10 is not populand and the
kernel must decode the instruction manually to find out its length.

Restructure the code to make it explicit that the instruction length
depends on the type of #VE. Make individual #VE handlers return
the instruction length on success or -errno on failure.

[ dhansen: fix up changelog and comments ]

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-3-kirill.shutemov@linux.intel.com
2022-06-15 11:05:16 -07:00
Kirill A. Shutemov
60428d8bc2 x86/tdx: Fix early #VE handling
tdx_early_handle_ve() does not increment RIP after successfully
handling the exception.  That leads to infinite loop of exceptions.

Move RIP when exceptions are successfully handled.

[ dhansen: make problem statement more clear ]

Fixes: 32e72854fa ("x86/tdx: Port I/O: Add early boot support")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-2-kirill.shutemov@linux.intel.com
2022-06-15 10:52:59 -07:00
Mark Rutland
0d8116ccd8 arm64: ftrace: remove redundant label
Since commit:

  c4a0ebf87c ("arm64/ftrace: Make function graph use ftrace directly")

The 'ftrace_common_return' label has been unused.

Remove it.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614080944.1349146-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-15 16:14:47 +01:00
Mark Rutland
a625357997 arm64: ftrace: consistently handle PLTs.
Sometimes it is necessary to use a PLT entry to call an ftrace
trampoline. This is handled by ftrace_make_call() and ftrace_make_nop(),
with each having *almost* identical logic, but this is not handled by
ftrace_modify_call() since its introduction in commit:

  3b23e4991f ("arm64: implement ftrace with regs")

Due to this, if we ever were to call ftrace_modify_call() for a callsite
which requires a PLT entry for a trampoline, then either:

a) If the old addr requires a trampoline, ftrace_modify_call() will use
   an out-of-range address to generate the 'old' branch instruction.
   This will result in warnings from aarch64_insn_gen_branch_imm() and
   ftrace_modify_code(), and no instructions will be modified. As
   ftrace_modify_call() will return an error, this will result in
   subsequent internal ftrace errors.

b) If the old addr does not require a trampoline, but the new addr does,
   ftrace_modify_call() will use an out-of-range address to generate the
   'new' branch instruction. This will result in warnings from
   aarch64_insn_gen_branch_imm(), and ftrace_modify_code() will replace
   the 'old' branch with a BRK. This will result in a kernel panic when
   this BRK is later executed.

Practically speaking, case (a) is vastly more likely than case (b), and
typically this will result in internal ftrace errors that don't
necessarily affect the rest of the system. This can be demonstrated with
an out-of-tree test module which triggers ftrace_modify_call(), e.g.

| # insmod test_ftrace.ko
| test_ftrace: Function test_function raw=0xffffb3749399201c, callsite=0xffffb37493992024
| branch_imm_common: offset out of range
| branch_imm_common: offset out of range
| ------------[ ftrace bug ]------------
| ftrace failed to modify
| [<ffffb37493992024>] test_function+0x8/0x38 [test_ftrace]
|  actual:   1d:00:00:94
| Updating ftrace call site to call a different ftrace function
| ftrace record flags: e0000002
|  (2) R
|  expected tramp: ffffb374ae42ed54
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 165 at kernel/trace/ftrace.c:2085 ftrace_bug+0x280/0x2b0
| Modules linked in: test_ftrace(+)
| CPU: 0 PID: 165 Comm: insmod Not tainted 5.19.0-rc2-00002-g4d9ead8b45ce #13
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : ftrace_bug+0x280/0x2b0
| lr : ftrace_bug+0x280/0x2b0
| sp : ffff80000839ba00
| x29: ffff80000839ba00 x28: 0000000000000000 x27: ffff80000839bcf0
| x26: ffffb37493994180 x25: ffffb374b0991c28 x24: ffffb374b0d70000
| x23: 00000000ffffffea x22: ffffb374afcc33b0 x21: ffffb374b08f9cc8
| x20: ffff572b8462c000 x19: ffffb374b08f9000 x18: ffffffffffffffff
| x17: 6c6c6163202c6331 x16: ffffb374ae5ad110 x15: ffffb374b0d51ee4
| x14: 0000000000000000 x13: 3435646532346561 x12: 3437336266666666
| x11: 203a706d61727420 x10: 6465746365707865 x9 : ffffb374ae5149e8
| x8 : 336266666666203a x7 : 706d617274206465 x6 : 00000000fffff167
| x5 : ffff572bffbc4a08 x4 : 00000000fffff167 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffff572b84461e00 x0 : 0000000000000022
| Call trace:
|  ftrace_bug+0x280/0x2b0
|  ftrace_replace_code+0x98/0xa0
|  ftrace_modify_all_code+0xe0/0x144
|  arch_ftrace_update_code+0x14/0x20
|  ftrace_startup+0xf8/0x1b0
|  register_ftrace_function+0x38/0x90
|  test_ftrace_init+0xd0/0x1000 [test_ftrace]
|  do_one_initcall+0x50/0x2b0
|  do_init_module+0x50/0x1f0
|  load_module+0x17c8/0x1d64
|  __do_sys_finit_module+0xa8/0x100
|  __arm64_sys_finit_module+0x2c/0x3c
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0xdc/0x100
|  do_el0_svc+0x3c/0xd0
|  el0_svc+0x34/0xb0
|  el0t_64_sync_handler+0xbc/0x140
|  el0t_64_sync+0x18c/0x190
| ---[ end trace 0000000000000000 ]---

We can solve this by consistently determining whether to use a PLT entry
for an address.

Note that since (the earlier) commit:

  f1a54ae9af ("arm64: module/ftrace: intialize PLT at load time")

... we can consistently determine the PLT address that a given callsite
will use, and therefore ftrace_make_nop() does not need to skip
validation when a PLT is in use.

This patch factors the existing logic out of ftrace_make_call() and
ftrace_make_nop() into a common ftrace_find_callable_addr() helper
function, which is used by ftrace_make_call(), ftrace_make_nop(), and
ftrace_modify_call(). In ftrace_make_nop() the patching is consistently
validated by ftrace_modify_code() as we can always determine what the
old instruction should have been.

Fixes: 3b23e4991f ("arm64: implement ftrace with regs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614080944.1349146-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-15 16:14:47 +01:00
Mark Rutland
3eefdf9d1e arm64: ftrace: fix branch range checks
The branch range checks in ftrace_make_call() and ftrace_make_nop() are
incorrect, erroneously permitting a forwards branch of 128M and
erroneously rejecting a backwards branch of 128M.

This is because both functions calculate the offset backwards,
calculating the offset *from* the target *to* the branch, rather than
the other way around as the later comparisons expect.

If an out-of-range branch were erroeously permitted, this would later be
rejected by aarch64_insn_gen_branch_imm() as branch_imm_common() checks
the bounds correctly, resulting in warnings and the placement of a BRK
instruction. Note that this can only happen for a forwards branch of
exactly 128M, and so the caller would need to be exactly 128M bytes
below the relevant ftrace trampoline.

If an in-range branch were erroeously rejected, then:

* For modules when CONFIG_ARM64_MODULE_PLTS=y, this would result in the
  use of a PLT entry, which is benign.

  Note that this is the common case, as this is selected by
  CONFIG_RANDOMIZE_BASE (and therefore RANDOMIZE_MODULE_REGION_FULL),
  which distributions typically seelct. This is also selected by
  CONFIG_ARM64_ERRATUM_843419.

* For modules when CONFIG_ARM64_MODULE_PLTS=n, this would result in
  internal ftrace failures.

* For core kernel text, this would result in internal ftrace failues.

  Note that for this to happen, the kernel text would need to be at
  least 128M bytes in size, and typical configurations are smaller tha
  this.

Fix this by calculating the offset *from* the branch *to* the target in
both functions.

Fixes: f8af0b364e ("arm64: ftrace: don't validate branch via PLT in ftrace_make_nop()")
Fixes: e71a4e1beb ("arm64: ftrace: add support for far branches to dynamic ftrace")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: "Ivan T. Ivanov" <iivanov@suse.de>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220614080944.1349146-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-15 16:14:46 +01:00
Catalin Marinas
27d8fa2078 Revert "arm64: Initialize jump labels before setup_machine_fdt()"
This reverts commit 73e2d827a5.

The reverted patch was needed as a fix after commit f5bda35fba
("random: use static branch for crng_ready()"). However, this was
already fixed by 60e5b2886b ("random: do not use jump labels before
they are initialized") and hence no longer necessary to initialise jump
labels before setup_machine_fdt().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-15 16:14:32 +01:00
Oleksij Rempel
56315b6bf7 ARM: dts: at91: ksz9477_evb: fix port/phy validation
Latest drivers version requires phy-mode to be set. Otherwise we will
use "NA" mode and the switch driver will invalidate this port mode.

Fixes: 65ac79e181 ("net: dsa: microchip: add the phylink get_caps")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220610081621.584393-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14 22:11:02 -07:00
Linus Torvalds
24625f7d91 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f025 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...
2022-06-14 07:57:18 -07:00
Linus Torvalds
8e8afafb0b Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data
2022-06-14 07:43:15 -07:00
Josh Poimboeuf
e32683c6f7 x86/mm: Fix RESERVE_BRK() for older binutils
With binutils 2.26, RESERVE_BRK() causes a build failure:

  /tmp/ccnGOKZ5.s: Assembler messages:
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: missing ')'
  /tmp/ccnGOKZ5.s:98: Error: junk at end of line, first unrecognized
  character is `U'

The problem is this line:

  RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE)

Specifically, the INIT_PGT_BUF_SIZE macro which (via PAGE_SIZE's use
_AC()) has a "1UL", which makes older versions of the assembler unhappy.
Unfortunately the _AC() macro doesn't work for inline asm.

Inline asm was only needed here to convince the toolchain to add the
STT_NOBITS flag.  However, if a C variable is placed in a section whose
name is prefixed with ".bss", GCC and Clang automatically set
STT_NOBITS.  In fact, ".bss..page_aligned" already relies on this trick.

So fix the build failure (and simplify the macro) by allocating the
variable in C.

Also, add NOLOAD to the ".brk" output section clause in the linker
script.  This is a failsafe in case the ".bss" prefix magic trick ever
stops working somehow.  If there's a section type mismatch, the GNU
linker will force the ".brk" output section to be STT_NOBITS.  The LLVM
linker will fail with a "section type mismatch" error.

Note this also changes the name of the variable from .brk.##name to
__brk_##name.  The variable names aren't actually used anywhere, so it's
harmless.

Fixes: a1e2c031ec ("x86/mm: Simplify RESERVE_BRK()")
Reported-by: Joe Damato <jdamato@fastly.com>
Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/22d07a44c80d8e8e1e82b9a806ddc8c6bbb2606e.1654759036.git.jpoimboe@kernel.org
2022-06-13 10:15:04 +02:00
Conor Dooley
5e757deddd riscv: dts: microchip: re-add pdma to mpfs device tree
PolarFire SoC /does/ have a SiFive pdma, despite what I suggested as a
conflict resolution to Zong. Somehow the entry fell through the cracks
between versions of my dt patches, so re-add it with Zong's updated
compatible & dma-channels property.

Fixes: c5094f3710 ("riscv: dts: microchip: refactor icicle kit device tree")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2022-06-12 19:33:52 +01:00
Linus Torvalds
abe71eb32f Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Fixes all over the place, most notably fixes for latent bugs in
  drivers that got exposed by suppressing interrupts before DRIVER_OK,
  which in turn has been done by 8b4ec69d7e ("virtio: harden vring
  IRQ")"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  um: virt-pci: set device ready in probe()
  vdpa: make get_vq_group and set_group_asid optional
  virtio: Fix all occurences of the "the the" typo
  vduse: Fix NULL pointer dereference on sysfs access
  vringh: Fix loop descriptors check in the indirect cases
  vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
  vdpa/mlx5: fix error code for deleting vlan
  virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
  vdpa/mlx5: Fix syntax errors in comments
  virtio-rng: make device ready before making request
2022-06-11 16:32:47 -07:00
Linus Torvalds
0678afa605 Merge tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen.
 "Fix build errors and a stale comment"

* tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Remove MIPS comment about cycle counter
  LoongArch: Fix copy_thread() build errors
  LoongArch: Fix the !CONFIG_SMP build
2022-06-11 12:37:39 -07:00
Vincent Whitchurch
eacea84459 um: virt-pci: set device ready in probe()
Call virtio_device_ready() to make this driver work after commit
b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
virtqueues in the probe function.  (The virtio core sets the device
ready when probe returns.)

Fixes: 8b4ec69d7e ("virtio: harden vring IRQ")
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
2022-06-10 20:38:06 -04:00
Linus Torvalds
36a2366379 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - SME save/restore for EFI fix - incorrect logic for detecting the need
   for saving/restoring the FFR state.

 - SME fix for a CPU ID field value.

 - Sysreg generation awk script fix (comparison operator).

 - Some typos in documentation or comments and silence a sparse warning
   (missing prototype).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add kasan_hw_tags_enable() prototype to silence sparse
  arm64/sme: Fix EFI save/restore
  arm64/fpsimd: Fix typo in comment
  arm64/sysreg: Fix typo in Enum element regex
  arm64/sme: Fix SVE/SME typo in ABI documentation
  arm64/sme: Fix tests for 0b1111 value ID registers
2022-06-10 11:03:51 -07:00
Catalin Marinas
78cdaf3f42 arm64: Add kasan_hw_tags_enable() prototype to silence sparse
This function is only called from assembly, no need for a prototype
declaration in a header file. In addition, add #ifdef around the
function since it is only used when CONFIG_KASAN_HW_TAGS.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
2022-06-10 18:04:05 +01:00
Linus Torvalds
f2ecc964b9 Merge tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:

 - a small cleanup removing "export" of an __init function

 - a small series adding a new infrastructure for platform flags

 - a series adding generic virtio support for Xen guests (frontend side)

* tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: unexport __init-annotated xen_xlate_map_ballooned_pages()
  arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices
  xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
  xen/grant-dma-iommu: Introduce stub IOMMU driver
  dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops
  xen/virtio: Enable restricted memory access using Xen grant mappings
  xen/grant-dma-ops: Add option to restrict memory access under Xen
  xen/grants: support allocating consecutive grants
  arm/xen: Introduce xen_setup_dma_ops()
  virtio: replace arch_has_restricted_virtio_memory_access()
  kernel: add platform_has() infrastructure
2022-06-10 09:57:11 -07:00
Mark Brown
2e990e6322 arm64/sme: Fix EFI save/restore
The EFI save/restore code is confused. When saving the check for saving
FFR is inverted due to confusion with the streaming mode check, and when
restoring we check if we need to restore FFR by checking the percpu
efi_sm_state without the required wrapper rather than based on the
combination of FA64 support and streaming mode.

Fixes: e0838f6373 ("arm64/sme: Save and restore streaming mode over EFI runtime calls")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220602124132.3528951-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10 17:29:23 +01:00
Xiang wangx
bb314511b6 arm64/fpsimd: Fix typo in comment
Delete the redundant word 'in'.

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220610070543.59338-1-wangxiang@cdjrlc.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10 16:24:15 +01:00
Alejandro Tafalla
ce253b8573 arm64/sysreg: Fix typo in Enum element regex
In the awk script, there was a typo with the comparison operator when
checking if the matched pattern is inside an Enum block.
This prevented the generation of the whole sysreg-defs.h header.

Fixes: 66847e0618 ("arm64: Add sysreg header generation scripting")
Signed-off-by: Alejandro Tafalla <atafalla@dnyon.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220609204220.12112-1-atafalla@dnyon.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-10 16:23:48 +01:00
Linus Torvalds
95fc76c81b Merge tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - On 32-bit fix overread/overwrite of thread_struct via ptrace
   PEEK/POKE.

 - Fix softirqs not switching to the softirq stack since we moved
   irq_exit().

 - Force thread size increase when KASAN is enabled to avoid stack
   overflows.

 - On Book3s 64 mark more code as not to be instrumented by KASAN to
   avoid crashes.

 - Exempt __get_wchan() from KASAN checking, as it's inherently racy.

 - Fix a recently introduced crash in the papr_scm driver in some
   configurations.

 - Remove include of <generated/compile.h> which is forbidden.

Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner,
He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras,
Sachin Sant, Vaibhav Jain, and Wanming Hu.

* tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix overread/overwrite of thread_struct via ptrace
  powerpc/book3e: get rid of #include <generated/compile.h>
  powerpc/kasan: Force thread size increase with KASAN
  powerpc/papr_scm: don't requests stats with '0' sized stats buffer
  powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK
  powerpc/kasan: Silence KASAN warnings in __get_wchan()
  powerpc/kasan: Mark more real-mode code as not to be instrumented
2022-06-09 12:17:43 -07:00
Linus Torvalds
825464e79d Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - eth: amt: fix possible null-ptr-deref in amt_rcv()

  Previous releases - regressions:

   - tcp: use alloc_large_system_hash() to allocate table_perturb

   - af_unix: fix a data-race in unix_dgram_peer_wake_me()

   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling

   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF

  Previous releases - always broken:

   - ipv6: fix signed integer overflow in __ip6_append_data

   - netfilter:
       - nat: really support inet nat without l3 address
       - nf_tables: memleak flow rule from commit path

   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs

   - openvswitch: fix misuse of the cached connection on tuple changes

   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred

   - eth: altera: fix refcount leak in altera_tse_mdio_create

  Misc:

   - add Quentin Monnet to bpftool maintainers"

* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: amd-xgbe: fix clang -Wformat warning
  tcp: use alloc_large_system_hash() to allocate table_perturb
  net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
  net: dsa: mv88e6xxx: correctly report serdes link failure
  net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
  net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
  net: altera: Fix refcount leak in altera_tse_mdio_create
  net: openvswitch: fix misuse of the cached connection on tuple changes
  net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
  ip_gre: test csum_start instead of transport header
  au1000_eth: stop using virt_to_bus()
  ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
  ipv6: Fix signed integer overflow in __ip6_append_data
  nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
  nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
  nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
  nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
  net: ipv6: unexport __init-annotated seg6_hmac_init()
  net: xfrm: unexport __init-annotated xfrm4_protocol_init()
  net: mdio: unexport __init-annotated mdio_bus_init()
  ...
2022-06-09 12:06:52 -07:00
Linus Torvalds
f0be87c42c gcc-12: disable '-Warray-bounds' universally for now
In commit 8b202ee218 ("s390: disable -Warray-bounds") the s390 people
disabled the '-Warray-bounds' warning for gcc-12, because the new logic
in gcc would cause warnings for their use of the S390_lowcore macro,
which accesses absolute pointers.

It turns out gcc-12 has many other issues in this area, so this takes
that s390 warning disable logic, and turns it into a kernel build config
entry instead.

Part of the intent is that we can make this all much more targeted, and
use this conflig flag to disable it in only particular configurations
that cause problems, with the s390 case as an example:

        select GCC12_NO_ARRAY_BOUNDS

and we could do that for other configuration cases that cause issues.

Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
targeted way, and disable the warning only for particular uses: again
the s390 case as an example:

  KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)

but this ends up just doing it globally in the top-level Makefile, since
the current issues are spread fairly widely all over:

  KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds

We'll try to limit this later, since the gcc-12 problems are rare enough
that *much* of the kernel can be built with it without disabling this
warning.

Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09 10:11:12 -07:00
Paolo Bonzini
e3cdaab5ff KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
Commit 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0
doesn't intercept PAUSE") introduced passthrough support for nested pause
filtering, (when the host doesn't intercept PAUSE) (either disabled with
kvm module param, or disabled with '-overcommit cpu-pm=on')

Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
the feature was exposed as supported by KVM cpuid unconditionally, thus
if L1 could try to use it even when the L0 KVM can't really support it.

In this case the fallback caused KVM to intercept each PAUSE instruction;
in some cases, such intercept can slow down the nested guest so much
that it can fail to boot.  Instead, before the problematic commit KVM
was already setting both thresholds to 0 in vmcb02, but after the first
userspace VM exit shrink_ple_window was called and would reset the
pause_filter_count to the default value.

To fix this, change the fallback strategy - ignore the guest threshold
values, but use/update the host threshold values unless the guest
specifically requests disabling PAUSE filtering (either simple or
advanced).

Also fix a minor bug: on nested VM exit, when PAUSE filter counter
were copied back to vmcb01, a dirty bit was not set.

Thanks a lot to Suravee Suthikulpanit for debugging this!

Fixes: 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:21 -04:00
Maxim Levitsky
ba8ec27324 KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
Now that these functions are always called with preemption disabled,
remove the preempt_disable()/preempt_enable() pair inside them.

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:20 -04:00
Maxim Levitsky
66c768d30e KVM: x86: disable preemption while updating apicv inhibition
Currently nothing prevents preemption in kvm_vcpu_update_apicv.

On SVM, If the preemption happens after we update the
vcpu->arch.apicv_active, the preemption itself will
'update' the inhibition since the AVIC will be first disabled
on vCPU unload and then enabled, when the current task
is loaded again.

Then we will try to update it again, which will lead to a warning
in __avic_vcpu_load, that the AVIC is already enabled.

Fix this by disabling preemption in this code.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:19 -04:00
Maxim Levitsky
603ccef42c KVM: x86: SVM: fix avic_kick_target_vcpus_fast
There are two issues in avic_kick_target_vcpus_fast

1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
   and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
   which must be treated as a broadcast destination.

   Fix this by explicitly checking for it.
   Also don’t use ‘index’ in this case as it gives no new information.

2. It is legal to issue a logical IPI request to more than one target.
   Index field only provides index in physical id table of first
   such target and therefore can't be used before we are sure
   that only a single target was addressed.

   Instead, parse the ICRL/ICRH, double check that a unicast interrupt
   was requested, and use that info to figure out the physical id
   of the target vCPU.
   At that point there is no need to use the index field as well.

In addition to fixing the above	issues,	also skip the call to
kvm_apic_match_dest.

It is possible to do this now, because now as long as AVIC is not
inhibited, it is guaranteed that none of the vCPUs changed their
apic id from its default value.

This fixes boot of windows guest with AVIC enabled because it uses
IPI with 0xFF destination and no destination shorthand.

Fixes: 7223fd2d53 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
Cc: stable@vger.kernel.org

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:19 -04:00
Maxim Levitsky
f5f9089f76 KVM: x86: SVM: remove avic's broken code that updated APIC ID
AVIC is now inhibited if the guest changes the apic id,
and therefore this code is no longer needed.

There are several ways this code was broken, including:

1. a vCPU was only allowed to change its apic id to an apic id
of an existing vCPU.

2. After such change, the vCPU whose apic id entry was overwritten,
could not correctly change its own apic id, because its own
entry is already overwritten.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:18 -04:00
Maxim Levitsky
3743c2f025 KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
Neither of these settings should be changed by the guest and it is
a burden to support it in the acceleration code, so just inhibit
this code instead.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:18 -04:00
Maxim Levitsky
a9603ae0e4 KVM: x86: document AVIC/APICv inhibit reasons
These days there are too many AVIC/APICv inhibit
reasons, and it doesn't hurt to have some documentation
for them.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:17 -04:00
Yuan Yao
d2263de137 KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
Assign shadow_me_value, not shadow_me_mask, to PAE root entries,
a.k.a. shadow PDPTRs, when host memory encryption is supported.  The
"mask" is the set of all possible memory encryption bits, e.g. MKTME
KeyIDs, whereas "value" holds the actual value that needs to be
stuffed into host page tables.

Using shadow_me_mask results in a failed VM-Entry due to setting
reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to
physical addresses with non-zero MKTME bits sending to_shadow_page()
into the weeds:

set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
BUG: unable to handle page fault for address: ffd43f00063049e8
PGD 86dfd8067 P4D 0
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
 kvm_mmu_free_roots+0xd1/0x200 [kvm]
 __kvm_mmu_unload+0x29/0x70 [kvm]
 kvm_mmu_unload+0x13/0x20 [kvm]
 kvm_arch_destroy_vm+0x8a/0x190 [kvm]
 kvm_put_kvm+0x197/0x2d0 [kvm]
 kvm_vm_release+0x21/0x30 [kvm]
 __fput+0x8e/0x260
 ____fput+0xe/0x10
 task_work_run+0x6f/0xb0
 do_exit+0x327/0xa90
 do_group_exit+0x35/0xa0
 get_signal+0x911/0x930
 arch_do_signal_or_restart+0x37/0x720
 exit_to_user_mode_prepare+0xb2/0x140
 syscall_exit_to_user_mode+0x16/0x30
 do_syscall_64+0x4e/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: e54f1ff244 ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220608012015.19566-1-yuan.yao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:16 -04:00
Paolo Bonzini
76599a4761 Merge tag 'kvmarm-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.19, take #1

- Properly reset the SVE/SME flags on vcpu load

- Fix a vgic-v2 regression regarding accessing the pending
  state of a HW interrupt from userspace (and make the code
  common with vgic-v3)

- Fix access to the idreg range for protected guests

- Ignore 'kvm-arm.mode=protected' when using VHE

- Return an error from kvm_arch_init_vm() on allocation failure

- A bunch of small cleanups (comments, annotations, indentation)
2022-06-09 10:32:17 -04:00