Commit Graph

1154336 Commits

Author SHA1 Message Date
Oliver Upton
52b603628a Merge branch kvm-arm64/parallel-access-faults into kvmarm/next
* kvm-arm64/parallel-access-faults:
  : Parallel stage-2 access fault handling
  :
  : The parallel faults changes that went in to 6.2 covered most stage-2
  : aborts, with the exception of stage-2 access faults. Building on top of
  : the new infrastructure, this series adds support for handling access
  : faults (i.e. updating the access flag) in parallel.
  :
  : This is expected to provide a performance uplift for cores that do not
  : implement FEAT_HAFDBS, such as those from the fruit company.
  KVM: arm64: Condition HW AF updates on config option
  KVM: arm64: Handle access faults behind the read lock
  KVM: arm64: Don't serialize if the access flag isn't set
  KVM: arm64: Return EAGAIN for invalid PTE in attr walker
  KVM: arm64: Ignore EAGAIN for walks outside of a fault
  KVM: arm64: Use KVM's pte type/helpers in handle_access_fault()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13 22:33:10 +00:00
Oliver Upton
e8789ab704 Merge branch kvm-arm64/virtual-cache-geometry into kvmarm/next
* kvm-arm64/virtual-cache-geometry:
  : Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki.
  :
  : KVM/arm64 has always exposed the host cache geometry directly to the
  : guest, even though non-secure software should never perform CMOs by
  : Set/Way. This was slightly wrong, as the cache geometry was derived from
  : the PE on which the vCPU thread was running and not a sanitized value.
  :
  : All together this leads to issues migrating VMs on heterogeneous
  : systems, as the cache geometry saved/restored could be inconsistent.
  :
  : KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache
  : geometry is entirely controlled by userspace, such that migrations from
  : older kernels continue to work.
  KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT
  KVM: arm64: Normalize cache configuration
  KVM: arm64: Mask FEAT_CCIDX
  KVM: arm64: Always set HCR_TID2
  arm64/cache: Move CLIDR macro definitions
  arm64/sysreg: Add CCSIDR2_EL1
  arm64/sysreg: Convert CCSIDR_EL1 to automatic generation
  arm64: Allow the definition of UNKNOWN system register fields

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13 22:32:40 +00:00
Oliver Upton
619cec0085 Merge branch arm64/for-next/sme2 into kvmarm/next
Merge the SME2 branch to fix up a rather annoying conflict due to the
EL2 finalization refactor.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13 22:30:17 +00:00
Oliver Upton
92425e058a Merge branch kvm/kvm-hw-enable-refactor into kvmarm/next
Merge the kvm_init() + hardware enable rework to avoid conflicts
with kvmarm.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-13 22:28:34 +00:00
Oliver Upton
5f623a598d KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT
Generally speaking, any memory allocations that can be associated with a
particular VM should be charged to the cgroup of its process.
Nonetheless, there are a couple spots in KVM/arm64 that aren't currently
accounted:

 - the ccsidr array containing the virtualized cache hierarchy

 - the cpumask of supported cpus, for use of the vPMU on heterogeneous
   systems

Go ahead and set __GFP_ACCOUNT for these allocations.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230206235229.4174711-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-07 13:56:18 +00:00
Marc Zyngier
9442d05bba arm64/sme: Fix __finalise_el2 SMEver check
When checking for ID_AA64SMFR0_EL1.SMEver, __check_override assumes
that the ID_AA64SMFR0_EL1 value is in x1, and the intent of the code
is to reuse value read a few lines above.

However, as the comment says at the beginning of the macro, x1 will
be clobbered, and the checks always fails.

The easiest fix is just to reload the id register before checking it.

Fixes: f122576f35 ("arm64/sme: Enable host kernel to access ZT0")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-02-06 16:34:29 +00:00
Mark Brown
b2ab432bcf kselftest/arm64: Remove redundant _start labels from zt-test
The newly added zt-test program copied the pattern from the other FP
stress test programs of having a redundant _start label which is
rejected by clang, as we did in a parallel series for the other tests
remove the label so we can build with clang.

No functional change.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230130-arm64-fix-sme2-clang-v1-1-3ce81d99ea8f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-31 16:02:13 +00:00
Akihiko Odaki
7af0c2534f KVM: arm64: Normalize cache configuration
Before this change, the cache configuration of the physical CPU was
exposed to vcpus. This is problematic because the cache configuration a
vcpu sees varies when it migrates between vcpus with different cache
configurations.

Fabricate cache configuration from the sanitized value, which holds the
CTR_EL0 value the userspace sees regardless of which physical CPU it
resides on.

CLIDR_EL1 and CCSIDR_EL1 are now writable from the userspace so that
the VMM can restore the values saved with the old kernel.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Link: https://lore.kernel.org/r/20230112023852.42012-8-akihiko.odaki@daynix.com
[ Oliver: Squash Marc's fix for CCSIDR_EL1.LineSize when set from userspace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-21 18:09:23 +00:00
Mark Brown
3eb1b41fba kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
Add the hwcaps defined by SME 2 and 2.1 to the hwcaps test.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-21-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:09 +00:00
Mark Brown
4e1aa1a18f kselftest/arm64: Add coverage of the ZT ptrace regset
Add coverage of the ZT ptrace interface.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-20-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:09 +00:00
Mark Brown
49886aa9ab kselftest/arm64: Add SME2 coverage to syscall-abi
Verify that ZT0 is preserved over syscalls when it is present and
PSTATE.ZA is set.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-19-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:08 +00:00
Mark Brown
18f8729ab3 kselftest/arm64: Add test coverage for ZT register signal frames
We should have a ZT register frame with an expected size when ZA is enabled
and have no ZT frame when ZA is disabled. Since we don't load any data into
ZT we expect the data to all be zeros since the architecture guarantees it
will be set to 0 as ZA is enabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-18-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:08 +00:00
Mark Brown
afe6f18275 kselftest/arm64: Teach the generic signal context validation about ZT
Add ZT to the set of signal contexts that the shared code understands and
validates the form of.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-17-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:08 +00:00
Mark Brown
6382937326 kselftest/arm64: Enumerate SME2 in the signal test utility code
Support test cases for SME2 by adding it to the set of features that we
enumerate so test cases can check for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-16-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:08 +00:00
Mark Brown
f63a9f15b2 kselftest/arm64: Cover ZT in the FP stress test
Hook up the newly added zt-test program in the FPSIMD stress tests, start
a copy per CPU when SME2 is supported.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-15-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:08 +00:00
Mark Brown
1c07425e90 kselftest/arm64: Add a stress test program for ZT0
Following the pattern for the other register sets add a stress test program
for ZT0 which continually loads and verifies patterns in the register in
an effort to discover context switching problems.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-14-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:07 +00:00
Mark Brown
7d5d8601e4 arm64/sme: Add hwcaps for SME 2 and 2.1 features
In order to allow userspace to discover the presence of the new SME features
add hwcaps for them.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-13-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:07 +00:00
Mark Brown
f90b529bcb arm64/sme: Implement ZT0 ptrace support
Implement support for a new note type NT_ARM64_ZT providing access to
ZT0 when implemented. Since ZT0 is a register with constant size this is
much simpler than for other SME state.

As ZT0 is only accessible when PSTATE.ZA is set writes to ZT0 cause
PSTATE.ZA to be set, the main alternative would be to return -EBUSY in
this case but this seemed more constructive. Practical users are also
going to be working with ZA anyway and have some understanding of the
state.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-12-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:07 +00:00
Mark Brown
ee072cf708 arm64/sme: Implement signal handling for ZT
Add a new signal context type for ZT which is present in the signal frame
when ZA is enabled and ZT is supported by the system. In order to account
for the possible addition of further ZT registers in the future we make the
number of registers variable in the ABI, though currently the only possible
number is 1. We could just use a bare list head for the context since the
number of registers can be inferred from the size of the context but for
usability and future extensibility we define a header with the number of
registers and some reserved fields in it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-11-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
95fcec7132 arm64/sme: Implement context switching for ZT0
When the system supports SME2 the ZT0 register must be context switched as
part of the floating point state. This register is stored immediately
after ZA in memory and is only accessible when PSTATE.ZA is set so we
handle it in the same functions we use to save and restore ZA.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-10-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
d6138b4adc arm64/sme: Provide storage for ZT0
When the system supports SME2 there is an additional register ZT0 which
we must store when the task is using SME. Since ZT0 is accessible only
when PSTATE.ZA is set just like ZA we allocate storage for it along with
ZA, increasing the allocation size for the memory region where we store
ZA and storing the data for ZT after that for ZA.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-9-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
d4913eee15 arm64/sme: Add basic enumeration for SME2
Add basic feature detection for SME2, detecting that the feature is present
and disabling traps for ZT0.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-8-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
f122576f35 arm64/sme: Enable host kernel to access ZT0
The new register ZT0 introduced by SME2 comes with a new trap, disable it
for the host kernel so that we can implement support for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-7-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
2cdeecdb95 arm64/sme: Manually encode ZT0 load and store instructions
In order to avoid unrealistic toolchain requirements we manually encode the
instructions for loading and storing ZT0.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-6-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:06 +00:00
Mark Brown
8ef55603b8 arm64/esr: Document ISS for ZT0 being disabled
SME2 defines a new ISS code for use when trapping acesses to ZT0, add a
definition for it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-5-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Mark Brown
4edc11744e arm64/sme: Document SME 2 and SME 2.1 ABI
As well as a number of simple features which only add new instructions and
require corresponding hwcaps SME2 introduces a new register ZT0 for which
we must define ABI. Fortunately this is a fixed size 512 bits and therefore
much more straightforward than the base SME state, the only wrinkle is that
it is only accessible when ZA is accessible.

While there is only a single register the architecture is written with a
view to exensibility, including a number in the name, so follow this in the
ABI.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-4-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Mark Brown
0f3bbe0edf arm64/sysreg: Update system registers for SME 2 and 2.1
FEAT_SME2 and FEAT_SME2P1 introduce several new SME features which can
be enumerated via ID_AA64SMFR0_EL1 and a new register ZT0 access to
which is controlled via SMCR_ELn, add the relevant register description.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-3-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Mark Brown
6dabf1fac6 arm64: Document boot requirements for SME 2
SME 2 introduces the new ZT0 register, we require that access to this
reigster is not trapped when we identify that the feature is supported.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-2-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Mark Brown
ce514000da arm64/sme: Rename za_state to sme_state
In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just like ZA itself we will extend the allocation done for ZA to
cover it, avoiding the need to further expand task_struct for non-SME
tasks.

No functional changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-1-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-20 12:23:05 +00:00
Linus Torvalds
5dc4c995db Linux 6.2-rc4 2023-01-15 09:22:43 -06:00
Linus Torvalds
f0f70ddb8f Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Make sure the poking PGD is pinned for Xen PV as it requires it this
   way

 - Fixes for two resctrl races when moving a task or creating a new
   monitoring group

 - Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
   not return a UC- type mapping type on memremap() and thus cause a
   serious slowdown

 - Fix insn mnemonics in bioscall.S now that binutils is starting to fix
   confusing insn suffixes

* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: fix poking_init() for Xen PV guests
  x86/resctrl: Fix event counts regression in reused RMIDs
  x86/resctrl: Fix task CLOSID/RMID update race
  x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
  x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
2023-01-15 07:17:44 -06:00
Linus Torvalds
8aa9761223 Merge tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:

 - Fix the EDAC device's confusion in the polling setting units

 - Fix a memory leak in highbank's probing function

* tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/highbank: Fix memory leak in highbank_mc_probe()
  EDAC/device: Fix period calculation in edac_device_reset_delay_period()
2023-01-15 07:12:58 -06:00
Linus Torvalds
b1d63f0c77 Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix a build failure with some versions of ld that have an odd version
   string

 - Fix incorrect use of mutex in the IMC PMU driver

Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and
Yang Yingliang.

* tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/hash: Make stress_hpt_timer_fn() static
  powerpc/imc-pmu: Fix use of mutex in IRQs disabled section
  powerpc/boot: Fix incorrect version calculation issue in ld_version
2023-01-15 07:09:41 -06:00
Linus Torvalds
7c69844052 Merge tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:

 - Core: Fix an iommu-group refcount leak

 - Fix overflow issue in IOVA alloc path

 - ARM-SMMU fixes from Will:
    - Fix VFIO regression on NXP SoCs by reporting IOMMU_CAP_CACHE_COHERENCY
    - Fix SMMU shutdown paths to avoid device unregistration race

 - Error handling fix for Mediatek IOMMU driver

* tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
  iommu/iova: Fix alloc iova overflows issue
  iommu: Fix refcount leak in iommu_device_claim_dma_owner
  iommu/arm-smmu-v3: Don't unregister on shutdown
  iommu/arm-smmu: Don't unregister on shutdown
  iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer
2023-01-14 10:48:15 -06:00
Linus Torvalds
4f43ade45d Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
 "memblock: always release pages to the buddy allocator in
  memblock_free_late()

  If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
  only releases pages to the buddy allocator if they are not in the
  deferred range. This is correct for free pages (as defined by
  for_each_free_mem_pfn_range_in_zone()) because free pages in the
  deferred range will be initialized and released as part of the
  deferred init process.

  memblock_free_pages() is called by memblock_free_late(), which is used
  to free reserved ranges after memblock_free_all() has run. All pages
  in reserved ranges have been initialized at that point, and
  accordingly, those pages are not touched by the deferred init process.

  This means that currently, if the pages that memblock_free_late()
  intends to release are in the deferred range, they will never be
  released to the buddy allocator. They will forever be reserved.

  In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
  which is also correct for free pages but is not correct for reserved
  pages. KMSAN metadata for reserved pages is initialized by
  kmsan_init_shadow(), which runs shortly before memblock_free_all().

  For both of these reasons, memblock_free_pages() should only be called
  for free pages, and memblock_free_late() should call
  __free_pages_core() directly instead.

  One case where this issue can occur in the wild is EFI boot on x86_64.
  The x86 EFI code reserves all EFI boot services memory ranges via
  memblock_reserve() and frees them later via memblock_free_late()
  (efi_reserve_boot_services() and efi_free_boot_services(),
  respectively).

  If any of those ranges happens to fall within the deferred init range,
  the pages will not be released and that memory will be unavailable.

  For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:

    v6.2-rc2:
    Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
    Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

    v6.2-rc2 + patch:
    Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
    Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816   # +43,949 pages"

* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm: Always release pages to the buddy allocator in memblock_free_late().
2023-01-14 10:08:08 -06:00
Linus Torvalds
880ca43e5c Merge tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening fixes from Kees Cook:

 - Fix CFI hash randomization with KASAN (Sami Tolvanen)

 - Check size of coreboot table entry and use flex-array

* tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: Fix CFI hash randomization with KASAN
  firmware: coreboot: Check size of table entry and use flex-array
2023-01-14 10:04:00 -06:00
Linus Torvalds
8b7be52f3f Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module fix from Luis Chamberlain:
 "Just one fix for modules by Nick"

* tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  kallsyms: Fix scheduling with interrupts disabled in self-test
2023-01-14 08:17:27 -06:00
Linus Torvalds
b35ad63eec Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:

 - memory leak and double free fix

 - two symlink fixes

 - minor cleanup fix

 - two smb1 fixes

* tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix uninitialized memory read for smb311 posix symlink create
  cifs: fix potential memory leaks in session setup
  cifs: do not query ifaces on smb1 mounts
  cifs: fix double free on failed kerberos auth
  cifs: remove redundant assignment to the variable match
  cifs: fix file info setting in cifs_open_file()
  cifs: fix file info setting in cifs_query_path_info()
2023-01-14 08:08:25 -06:00
Linus Torvalds
8e76813085 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Two minor fixes in the hisi_sas driver which only impact enterprise
  style multi-expander and shared disk situations and no core changes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
  scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
2023-01-14 07:57:25 -06:00
Linus Torvalds
34cbf89afc Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
 "A single fix to prevent building the pata_cs5535 driver with user mode
  linux as it uses msr operations that are not defined with UML"

* tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_cs5535: Don't build on UML
2023-01-14 07:52:11 -06:00
Linus Torvalds
97ec4d559d Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
 "Nothing major in here, just a collection of NVMe fixes and dropping a
  wrong might_sleep() that static checkers tripped over but which isn't
  valid"

* tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux:
  MAINTAINERS: stop nvme matching for nvmem files
  nvme: don't allow unprivileged passthrough on partitions
  nvme: replace the "bool vec" arguments with flags in the ioctl path
  nvme: remove __nvme_ioctl
  nvme-pci: fix error handling in nvme_pci_enable()
  nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
  nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
  block: Drop spurious might_sleep() from blk_put_queue()
2023-01-13 17:41:19 -06:00
Linus Torvalds
2ce7592df9 Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
 "A fix for a regression that happened last week, rest is fixes that
  will be headed to stable as well. In detail:

   - Fix for a regression added with the leak fix from last week (me)

   - In writing a test case for that leak, inadvertently discovered a
     case where we a poll request can race. So fix that up and mark it
     for stable, and also ensure that fdinfo covers both the poll tables
     that we have. The latter was an oversight when the split poll table
     were added (me)

   - Fix for a lockdep reported issue with IOPOLL (Pavel)"

* tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux:
  io_uring: lock overflowing for IOPOLL
  io_uring/poll: attempt request issue after racy poll wakeup
  io_uring/fdinfo: include locked hash table in fdinfo output
  io_uring/poll: add hash if ready poll request can't complete inline
  io_uring/io-wq: only free worker if it was allocated for creation
2023-01-13 17:37:09 -06:00
Linus Torvalds
9e058c2952 Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:

 - Work around apparent firmware issue that made Linux reject MMCONFIG
   space, which broke PCI extended config space (Bjorn Helgaas)

 - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
   PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
   Bulwahn)

* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
  x86/pci: Simplify is_mmconf_reserved() messages
  PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
2023-01-13 17:32:22 -06:00
Sami Tolvanen
42633ed852 kbuild: Fix CFI hash randomization with KASAN
Clang emits a asan.module_ctor constructor to each object file
when KASAN is enabled, and these functions are indirectly called
in do_ctors. With CONFIG_CFI_CLANG, the compiler also emits a CFI
type hash before each address-taken global function so they can
pass indirect call checks.

However, in commit 0c3e806ec0 ("x86/cfi: Add boot time hash
randomization"), x86 implemented boot time hash randomization,
which relies on the .cfi_sites section generated by objtool. As
objtool is run against vmlinux.o instead of individual object
files with X86_KERNEL_IBT (enabled by default), CFI types in
object files that are not part of vmlinux.o end up not being
included in .cfi_sites, and thus won't get randomized and trip
CFI when called.

Only .vmlinux.export.o and init/version-timestamp.o are linked
into vmlinux separately from vmlinux.o. As these files don't
contain any functions, disable KASAN for both of them to avoid
breaking hash randomization.

Link: https://github.com/ClangBuiltLinux/linux/issues/1742
Fixes: 0c3e806ec0 ("x86/cfi: Add boot time hash randomization")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230112224948.1479453-2-samitolvanen@google.com
2023-01-13 15:22:03 -08:00
Kees Cook
3b293487b8 firmware: coreboot: Check size of table entry and use flex-array
The memcpy() of the data following a coreboot_table_entry couldn't
be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it
easier to reason about, add an explicit flexible array member to struct
coreboot_device so the entire entry can be copied at once. Additionally,
validate the sizes before copying. Avoids this run-time false positive
warning:

  memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8)

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/all/03ae2704-8c30-f9f0-215b-7cdf4ad35a9a@molgen.mpg.de/
Cc: Jack Rosenthal <jrosenth@chromium.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Julius Werner <jwerner@chromium.org>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20230107031406.gonna.761-kees@kernel.org
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Jack Rosenthal <jrosenth@chromium.org>
Link: https://lore.kernel.org/r/20230112230312.give.446-kees@kernel.org
2023-01-13 15:22:03 -08:00
Nicholas Piggin
da35048f26 kallsyms: Fix scheduling with interrupts disabled in self-test
kallsyms_on_each* may schedule so must not be called with interrupts
disabled. The iteration function could disable interrupts, but this
also changes lookup_symbol() to match the change to the other timing
code.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com
Fixes: 30f3bb0977 ("kallsyms: Add self-test facility")
Tested-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-13 15:09:08 -08:00
Peter Foley
22eebaa631 ata: pata_cs5535: Don't build on UML
This driver uses MSR functions that aren't implemented under UML.
Avoid building it to prevent tripping up allyesconfig.

e.g.
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr'

Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-01-14 07:38:48 +09:00
Linus Torvalds
92783a90bc Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the PMCR_EL0 reset value after the PMU rework

   - Correctly handle S2 fault triggered by a S1 page table walk by not
     always classifying it as a write, as this breaks on R/O memslots

   - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
     fault from a S1 PTW on a R/O memslot

   - Put the Apple M2 on the naughty list for not being able to
     correctly implement the vgic SEIS feature, just like the M1 before
     it

   - Reviewer updates: Alex is stepping down, replaced by Zenghui

  x86:

   - Fix various rare locking issues in Xen emulation and teach lockdep
     to detect them

   - Documentation improvements

   - Do not return host topology information from KVM_GET_SUPPORTED_CPUID"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
  KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
  Documentation: kvm: fix SRCU locking order docs
  KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  KVM: nSVM: clarify recalc_intercepts() wrt CR8
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: PMU: Fix PMCR_EL0 reset value
2023-01-13 14:41:50 -06:00
Mateusz Guzik
f5fe24ef17 lockref: stop doing cpu_relax in the cmpxchg loop
On the x86-64 architecture even a failing cmpxchg grants exclusive
access to the cacheline, making it preferable to retry the failed op
immediately instead of stalling with the pause instruction.

To illustrate the impact, below are benchmark results obtained by
running various will-it-scale tests on top of the 6.2-rc3 kernel and
Cascade Lake (2 sockets * 24 cores * 2 threads) CPU.

All results in ops/s.  Note there is some variance in re-runs, but the
code is consistently faster when contention is present.

  open3 ("Same file open/close"):
  proc          stock       no-pause
     1         805603         814942       (+%1)
     2        1054980        1054781       (-0%)
     8        1544802        1822858      (+18%)
    24        1191064        2199665      (+84%)
    48         851582        1469860      (+72%)
    96         609481        1427170     (+134%)

  fstat2 ("Same file fstat"):
  proc          stock       no-pause
     1        3013872        3047636       (+1%)
     2        4284687        4400421       (+2%)
     8        3257721        5530156      (+69%)
    24        2239819        5466127     (+144%)
    48        1701072        5256609     (+209%)
    96        1269157        6649326     (+423%)

Additionally, a kernel with a private patch to help access() scalability:
access2 ("Same file access"):

  proc          stock        patched      patched
                                         +nopause
    24        2378041        2005501      5370335  (-15% / +125%)

That is, fixing the problems in access itself *reduces* scalability
after the cacheline ping-pong only happens in lockref with the pause
instruction.

Note that fstat and access benchmarks are not currently integrated into
will-it-scale, but interested parties can find them in pull requests to
said project.

Code at hand has a rather tortured history.  First modification showed
up in commit d472d9d98b ("lockref: Relax in cmpxchg loop"), written
with Itanium in mind.  Later it got patched up to use an arch-dependent
macro to stop doing it on s390 where it caused a significant regression.
Said macro had undergone revisions and was ultimately eliminated later,
going back to cpu_relax.

While I intended to only remove cpu_relax for x86-64, I got the
following comment from Linus:

    I would actually prefer just removing it entirely and see if
    somebody else hollers. You have the numbers to prove it hurts on
    real hardware, and I don't think we have any numbers to the
    contrary.

    So I think it's better to trust the numbers and remove it as a
    failure, than say "let's just remove it on x86-64 and leave
    everybody else with the potentially broken code"

Additionally, Will Deacon (maintainer of the arm64 port, one of the
architectures previously benchmarked):

    So, from the arm64 side of the fence, I'm perfectly happy just
    removing the cpu_relax() calls from lockref.

As such, come back full circle in history and whack it altogether.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/
Acked-by: Tony Luck <tony.luck@intel.com> # ia64
Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc
Acked-by: Will Deacon <will@kernel.org> # arm64
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-13 14:35:38 -06:00
Bjorn Helgaas
fd3a8cff4d x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).

07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.

Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.  After
07eab0901e, that E820 entry is removed, so we reject this ECAM space,
which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.

The lack of extended config space breaks anything that relies on it,
including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.

Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.

Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891
Fixes: 07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Reported-by: Baowen Zheng <baowen.zheng@corigine.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
2023-01-13 11:53:54 -06:00