Changes in 6.1.8
dma-buf: fix dma_buf_export init order v2
btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
wifi: iwlwifi: fw: skip PPAG for JF
pNFS/filelayout: Fix coalescing test for single DS
selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID
net: ethernet: marvell: octeontx2: Fix uninitialized variable warning
tools/virtio: initialize spinlocks in vring_test.c
vdpa/mlx5: Return error on vlan ctrl commands if not supported
vdpa/mlx5: Avoid using reslock in event_handler
vdpa/mlx5: Avoid overwriting CVQ iotlb
virtio_pci: modify ENOENT to EINVAL
vduse: Validate vq_num in vduse_validate_config()
vdpa_sim_net: should not drop the multicast/broadcast packet
net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
r8169: fix dmar pte write access is not set error
bpf: keep a reference to the mm, in case the task is dead.
RDMA/srp: Move large values to a new enum for gcc13
selftests: net: fix cmsg_so_mark.sh test hang
btrfs: always report error in run_one_delayed_ref()
x86/asm: Fix an assembler warning with current binutils
f2fs: let's avoid panic if extent_tree is not created
perf/x86/rapl: Treat Tigerlake like Icelake
cifs: fix race in assemble_neg_contexts()
memblock tests: Fix compilation error.
perf/x86/rapl: Add support for Intel Meteor Lake
perf/x86/rapl: Add support for Intel Emerald Rapids
of: fdt: Honor CONFIG_CMDLINE* even without /chosen node, take 2
fbdev: omapfb: avoid stack overflow warning
Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2
Bluetooth: hci_qca: Fix driver shutdown on closed serdev
wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
wifi: mac80211: fix MLO + AP_VLAN check
wifi: mac80211: reset multiple BSSID options in stop_ap()
wifi: mac80211: sdata can be NULL during AMPDU start
wifi: mac80211: fix initialization of rx->link and rx->link_sta
nommu: fix memory leak in do_mmap() error path
nommu: fix do_munmap() error path
nommu: fix split_vma() map_count error
proc: fix PIE proc-empty-vm, proc-pid-vm tests
Add exception protection processing for vd in axi_chan_handle_err function
LoongArch: Add HWCAP_LOONGARCH_CPUCFG to elf_hwcap
zonefs: Detect append writes at invalid locations
nilfs2: fix general protection fault in nilfs_btree_insert()
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
hugetlb: unshare some PMDs when splitting VMAs
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
Revert "serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler"
xhci-pci: set the dma max_seg_size
usb: xhci: Check endpoint is valid before dereferencing it
xhci: Fix null pointer dereference when host dies
xhci: Add update_hub_device override for PCI xHCI hosts
xhci: Add a flag to disable USB3 lpm on a xhci root port level.
usb: acpi: add helper to check port lpm capability using acpi _DSM
xhci: Detect lpm incapable xHC USB3 roothub ports from ACPI tables
prlimit: do_prlimit needs to have a speculation check
USB: serial: option: add Quectel EM05-G (GR) modem
USB: serial: option: add Quectel EM05-G (CS) modem
USB: serial: option: add Quectel EM05-G (RS) modem
USB: serial: option: add Quectel EC200U modem
USB: serial: option: add Quectel EM05CN (SG) modem
USB: serial: option: add Quectel EM05CN modem
staging: vchiq_arm: fix enum vchiq_status return types
USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
usb: misc: onboard_hub: Invert driver registration order
usb: misc: onboard_hub: Move 'attach' work to the driver
misc: fastrpc: Fix use-after-free and race in fastrpc_map_find
misc: fastrpc: Don't remove map on creater_process and device_release
misc: fastrpc: Fix use-after-free race condition for maps
usb: core: hub: disable autosuspend for TI TUSB8041
comedi: adv_pci1760: Fix PWM instruction handling
ACPI: PRM: Check whether EFI runtime is available
mmc: sunxi-mmc: Fix clock refcount imbalance during unbind
mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting
mm/hugetlb: fix PTE marker handling in hugetlb_change_protection()
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
btrfs: add extra error messages to cover non-ENOMEM errors from device_add_list()
btrfs: fix missing error handling when logging directory items
btrfs: fix directory logging due to race with concurrent index key deletion
btrfs: add missing setup of log for full commit at add_conflicting_inode()
btrfs: do not abort transaction on failure to write log tree when syncing log
btrfs: do not abort transaction on failure to update log root
btrfs: qgroup: do not warn on record without old_roots populated
btrfs: fix invalid leaf access due to inline extent during lseek
btrfs: fix race between quota rescan and disable leading to NULL pointer deref
cifs: do not include page data when checking signature
thunderbolt: Disable XDomain lane 1 only in software connection manager
thunderbolt: Use correct function to calculate maximum USB3 link rate
thunderbolt: Do not report errors if on-board retimers are found
thunderbolt: Do not call PM runtime functions in tb_retimer_scan()
riscv: dts: sifive: fu740: fix size of pcie 32bit memory
bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD
tty: serial: qcom-geni-serial: fix slab-out-of-bounds on RX FIFO buffer
tty: fix possible null-ptr-defer in spk_ttyio_release
pktcdvd: check for NULL returna fter calling bio_split_to_limits()
io_uring/poll: don't reissue in case of poll race on multishot request
mptcp: explicitly specify sock family at subflow creation time
mptcp: netlink: respect v4/v6-only sockets
selftests: mptcp: userspace: validate v4-v6 subflows mix
USB: gadgetfs: Fix race between mounting and unmounting
USB: serial: cp210x: add SCALANCE LPE-9000 device id
usb: cdns3: remove fetched trb from cache before dequeuing
usb: host: ehci-fsl: Fix module alias
usb: musb: fix error return code in omap2430_probe()
usb: typec: tcpm: Fix altmode re-registration causes sysfs create fail
usb: typec: altmodes/displayport: Add pin assignment helper
usb: typec: altmodes/displayport: Fix pin assignment calculation
usb: gadget: g_webcam: Send color matching descriptor per frame
USB: gadget: Add ID numbers to configfs-gadget driver names
usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
usb-storage: apply IGNORE_UAS only for HIKSEMI MD202 on RTL9210
arm64: dts: imx8mp: correct usb clocks
dt-bindings: phy: g12a-usb2-phy: fix compatible string documentation
dt-bindings: phy: g12a-usb3-pcie-phy: fix compatible string documentation
serial: pch_uart: Pass correct sg to dma_unmap_sg()
dmaengine: lgm: Move DT parsing after initialization
dmaengine: tegra210-adma: fix global intr clear
dmaengine: idxd: Let probe fail when workqueue cannot be enabled
dmaengine: idxd: Prevent use after free on completion memory
dmaengine: idxd: Do not call DMX TX callbacks during workqueue disable
serial: amba-pl011: fix high priority character transmission in rs486 mode
serial: atmel: fix incorrect baudrate setup
serial: exar: Add support for Sealevel 7xxxC serial cards
gsmi: fix null-deref in gsmi_get_variable
mei: bus: fix unlink on bus in error path
mei: me: add meteor lake point M DID
VMCI: Use threaded irqs instead of tasklets
ARM: dts: qcom: apq8084-ifc6540: fix overriding SDHCI
ARM: omap1: fix !ARCH_OMAP1_ANY link failures
drm/amdgpu: fix amdgpu_job_free_resources v2
drm/amdgpu: allow multipipe policy on ASICs with one MEC
drm/amdgpu: Correct the power calcultion for Renior/Cezanne.
drm/i915: re-disable RC6p on Sandy Bridge
drm/i915/display: Check source height is > 0
drm/i915: Allow switching away via vga-switcheroo if uninitialized
drm/i915: Remove unused variable
drm/amd/display: Fix set scaling doesn's work
drm/amd/display: Calculate output_color_space after pixel encoding adjustment
drm/amd/display: Fix COLOR_SPACE_YCBCR2020_TYPE matrix
drm/amd/display: disable S/G display on DCN 3.1.5
drm/amd/display: disable S/G display on DCN 3.1.4
cifs: reduce roundtrips on create/qinfo requests
fs/ntfs3: Fix attr_punch_hole() null pointer derenference
arm64: efi: Execute runtime services from a dedicated stack
efi: rt-wrapper: Add missing include
panic: Separate sysctl logic from CONFIG_SMP
exit: Put an upper limit on how often we can oops
exit: Expose "oops_count" to sysfs
exit: Allow oops_limit to be disabled
panic: Consolidate open-coded panic_on_warn checks
panic: Introduce warn_limit
panic: Expose "warn_count" to sysfs
docs: Fix path paste-o for /sys/kernel/warn_count
exit: Use READ_ONCE() for all oops/warn limit reads
x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
drm/amdgpu/discovery: enable soc21 common for GC 11.0.4
drm/amdgpu/discovery: enable gmc v11 for GC 11.0.4
drm/amdgpu/discovery: enable gfx v11 for GC 11.0.4
drm/amdgpu/discovery: enable mes support for GC v11.0.4
drm/amdgpu: set GC 11.0.4 family
drm/amdgpu/discovery: set the APU flag for GC 11.0.4
drm/amdgpu: add gfx support for GC 11.0.4
drm/amdgpu: add gmc v11 support for GC 11.0.4
drm/amdgpu/discovery: add PSP IP v13.0.11 support
drm/amdgpu/pm: enable swsmu for SMU IP v13.0.11
drm/amdgpu: add smu 13 support for smu 13.0.11
drm/amdgpu/pm: add GFXOFF control IP version check for SMU IP v13.0.11
drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.11
drm/amdgpu/pm: use the specific mailbox registers only for SMU IP v13.0.4
drm/amdgpu/discovery: enable nbio support for NBIO v7.7.1
drm/amdgpu: enable PSP IP v13.0.11 support
drm/amdgpu: enable GFX IP v11.0.4 CG support
drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4
drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4
drm/amdgpu: add tmz support for GC 11.0.1
drm/amdgpu: add tmz support for GC IP v11.0.4
drm/amdgpu: correct MEC number for gfx11 APUs
octeontx2-pf: Avoid use of GFP_KERNEL in atomic context
net/ulp: use consistent error code when blocking ULP
octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
block: mq-deadline: Rename deadline_is_seq_writes()
Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
soc: qcom: apr: Make qcom,protection-domain optional again
Linux 6.1.8
Change-Id: I35d5b5a1ed4822eddb2fc8b29b323b36f7d11926
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Currently, PKVM_PAGE_RESTRICTED_PROT == __PKVM_PAGE_RESERVED + 2, i.e.
BIT(55) | BIT(56) | BIT(1). IOW, It is not possible to distinguish
RESTRICTED from ownership.
Make PKVM_PAGE_RESTRICTED_PROT the second bit of pkvm_page_state so it
can be combined with the ownership status.
Bug: 244543039
Bug: 244373730
Change-Id: Iee9b84d4f07fca323b35e2d7da54f3657ae2cff9
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reason for revert:
MMIO guard is initialised too late for earlycon, so the fixup is
a waste of time.
Signed-off-by: Keir Fraser <keirf@google.com>
Bug: 259645922
Change-Id: Iace214f7140baf073cfa5cef1ee65fca9e4a76d0
This introduces a description of the MEM_RELINQUISH hypercall. MMIO
guard hypercalls are already described in another file, which we now
link to.
Signed-off-by: Keir Fraser <keirf@google.com>
Bug: 265943840
Change-Id: Iaffde3419f6432d76598e48c9bab53f672430b7a
Firstly, the hypercall IDs have been renumbered but that was not
reflected in the documentation. Secondly, the argument registers r1-r3
are checked to be zero when calling
ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO.
Signed-off-by: Keir Fraser <keirf@google.com>
Bug: 265943840
Change-Id: I9684d1e71af7d8627d079cfd89d437cfc28be09f
Change a few uses of ARM_SMCCC_VENDOR_HYP_KVM_FOO_FUNC_ID to
ARM_SMCCC_KVM_FUNC_FOO.
Signed-off-by: Keir Fraser <keirf@google.com>
Bug: 265943840
Change-Id: Id1642243ca4105a23e808cf28d13bfb81d9a5ac2
Add kasan.page_alloc.sample=10 to CONFIG_CMDLINE in gki_defconfig to make
Hardware Tag-Based (MTE) KASAN tag only one out of every 10 page_alloc
allocations with the order equal or larger than 3, which the omitted
default value for the kasan.page_alloc.sample.order parameter.
As Hardware Tag-Based KASAN is intended to be used in production, its
performance impact is crucial. As page_alloc allocations tend to be big,
tagging and checking all such allocations can introduce a significant
slowdown.
When running a local loopback test on a testing MTE-enabled device in sync
mode, enabling Hardware Tag-Based KASAN introduces a ~50% slowdown.
Setting kasan.page_alloc.sampling to a value higher than 1 allows to lower
the slowdown. The performance improvement saturates around the sampling
interval value of 10 with the default sampling page order of 3, see
b/238286329. This lowers the slowdown to ~20%. The slowdown in real
scenarios involving the network will likely be better.
Enabling page_alloc sampling has a downside: KASAN misses bad accesses to
a page_alloc allocation that has not been tagged. This lowers the value
of KASAN as a security mitigation.
However, based on measuring the number of page_alloc allocations of
different orders during boot in a test build, sampling with the default
kasan.page_alloc.sample.order value affects only ~7% of allocations. The
rest ~93% of allocations are still checked deterministically.
Bug: 238286329
Bug: 264310057
Change-Id: Id361822b8bbf929378cabbe0350b658d6120e840
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
[The patch is in the mm-unstable tree.]
The implementation of page_alloc poisoning sampling assumed that
tag_clear_highpage resets page tags for __GFP_ZEROTAGS allocations.
However, this is no longer the case since commit 70c248aca9
("mm: kasan: Skip unpoisoning of user pages").
This leads to kernel crashes when MTE-enabled userspace mappings are
used with Hardware Tag-Based KASAN enabled.
Reset page tags for __GFP_ZEROTAGS allocations in post_alloc_hook().
Also clarify and fix related comments.
Fixes: 44383cef54 ("kasan: allow sampling page_alloc allocations for HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Peter Collingbourne <pcc@google.com>
Tested-by: Peter Collingbourne <pcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 238286329
Bug: 264310057
Link: https://lore.kernel.org/all/5dbd866714b4839069e2d8469ac45b60953db290.1674592780.git.andreyknvl@google.com/
Change-Id: Iea4234bcf7e35337c8063827b07039583bca9c66
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
[The patch is in mm-stable tree.]
As Hardware Tag-Based KASAN is intended to be used in production, its
performance impact is crucial. As page_alloc allocations tend to be big,
tagging and checking all such allocations can introduce a significant
slowdown.
Add two new boot parameters that allow to alleviate that slowdown:
- kasan.page_alloc.sample, which makes Hardware Tag-Based KASAN tag only
every Nth page_alloc allocation with the order configured by the second
added parameter (default: tag every such allocation).
- kasan.page_alloc.sample.order, which makes sampling enabled by the first
parameter only affect page_alloc allocations with the order equal or
greater than the specified value (default: 3, see below).
The exact performance improvement caused by using the new parameters
depends on their values and the applied workload.
The chosen default value for kasan.page_alloc.sample.order is 3, which
matches both PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER. This is
done for two reasons:
1. PAGE_ALLOC_COSTLY_ORDER is "the order at which allocations are deemed
costly to service", which corresponds to the idea that only large and
thus costly allocations are supposed to sampled.
2. One of the workloads targeted by this patch is a benchmark that sends
a large amount of data over a local loopback connection. Most multi-page
data allocations in the networking subsystem have the order of
SKB_FRAG_PAGE_ORDER (or PAGE_ALLOC_COSTLY_ORDER).
When running a local loopback test on a testing MTE-enabled device in sync
mode, enabling Hardware Tag-Based KASAN introduces a ~50% slowdown.
Applying this patch and setting kasan.page_alloc.sampling to a value
higher than 1 allows to lower the slowdown. The performance improvement
saturates around the sampling interval value of 10 with the default
sampling page order of 3. This lowers the slowdown to ~20%. The slowdown
in real scenarios involving the network will likely be better.
Enabling page_alloc sampling has a downside: KASAN misses bad accesses to
a page_alloc allocation that has not been tagged. This lowers the value
of KASAN as a security mitigation.
However, based on measuring the number of page_alloc allocations of
different orders during boot in a test build, sampling with the default
kasan.page_alloc.sample.order value affects only ~7% of allocations. The
rest ~93% of allocations are still checked deterministically.
Link: https://lkml.kernel.org/r/129da0614123bb85ed4dd61ae30842b2dd7c903f.1671471846.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Brand <markbrand@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 238286329
Bug: 264310057
(cherry picked from commit 44383cef54 git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-stable)
Change-Id: I85f9eb4e93eeddff8f8d06238f433226affca177
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
commit 599d41fb8e upstream.
APR should not fail if the service device tree node does not have
the qcom,protection-domain property, since this functionality does
not exist on older platforms such as MSM8916 and MSM8996.
Ignore -EINVAL (returned when the property does not exist) to fix
a regression on 6.2-rc1 that prevents audio from working:
qcom,apr remoteproc0:smd-edge.apr_audio_svc.-1.-1:
Failed to read second value of qcom,protection-domain
qcom,apr remoteproc0:smd-edge.apr_audio_svc.-1.-1:
Failed to add apr 3 svc
Fixes: 6d7860f575 ("soc: qcom: apr: Add check for idr_alloc and of_property_read_string_index")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221229151648.19839-3-stephan@gerhold.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55ba18dc62 upstream.
The commit 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura
free") uses the get/put_cpu() to protect the usage of percpu pointer
in ->aura_freeptr() callback, but it also unnecessarily disable the
preemption for the blockable memory allocation. The commit 87b93b678e
("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to
fix these sleep inside atomic warnings. But it only fix the one for
the non-rt kernel. For the rt kernel, we still get the similar warnings
like below.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by swapper/0/1:
#0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4
#2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac
Preemption disabled at:
[<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284
CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
dump_backtrace.part.0+0xe8/0xf4
show_stack+0x20/0x30
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x224
rt_spin_lock+0x64/0x110
alloc_iova_fast+0x1ac/0x2ac
iommu_dma_alloc_iova+0xd4/0x110
__iommu_dma_map+0x80/0x144
iommu_dma_map_page+0xe8/0x260
dma_map_page_attrs+0xb4/0xc0
__otx2_alloc_rbuf+0x90/0x150
otx2_rq_aura_pool_init+0x1c8/0x284
otx2_init_hw_resources+0xe4/0x3a4
otx2_open+0xf0/0x610
__dev_open+0x104/0x224
__dev_change_flags+0x1e4/0x274
dev_change_flags+0x2c/0x7c
ic_open_devs+0x124/0x2f8
ip_auto_config+0x180/0x42c
do_one_initcall+0x90/0x4dc
do_basic_setup+0x10c/0x14c
kernel_init_freeable+0x10c/0x13c
kernel_init+0x2c/0x140
ret_from_fork+0x10/0x20
Of course, we can shuffle the get/put_cpu() to only wrap the invocation
of ->aura_freeptr() as what commit 87b93b678e does. But there are only
two ->aura_freeptr() callbacks, otx2_aura_freeptr() and
cn10k_aura_freeptr(). There is no usage of perpcu variable in the
otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it.
We can move the get/put_cpu() into the corresponding callback which
really has the percpu variable usage and avoid the sprinkling of
get/put_cpu() in several places.
Fixes: 4af1b64f80 ("octeontx2-pf: Fix lmtst ID used in aura free")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://lore.kernel.org/r/20230118071300.3271125-1-haokexin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ccc99362b upstream.
The referenced commit changed the error code returned by the kernel
when preventing a non-established socket from attaching the ktls
ULP. Before to such a commit, the user-space got ENOTCONN instead
of EINVAL.
The existing self-tests depend on such error code, and the change
caused a failure:
RUN global.non_established ...
tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
non_established: Test failed at step #3
FAIL global.non_established
In the unlikely event existing applications do the same, address
the issue by restoring the prior error code in the above scenario.
Note that the only other ULP performing similar checks at init
time - smc_ulp_ops - also fails with ENOTCONN when trying to attach
the ULP to a non-established socket.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Fixes: 2c02d41d71 ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55228db269 upstream.
WG14 N2350 specifies that it is an undefined behavior to have type
definitions within offsetof", see
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm
This specification is also part of C23.
Therefore, replace the TYPE_ALIGN macro with the _Alignof builtin to
avoid undefined behavior. (_Alignof itself is C11 and the kernel is
built with -gnu11).
ISO C11 _Alignof is subtly different from the GNU C extension
__alignof__. Latter is the preferred alignment and _Alignof the
minimal alignment. For long long on x86 these are 8 and 4
respectively.
The macro TYPE_ALIGN's behavior matches _Alignof rather than
__alignof__.
[ bp: Massage commit message. ]
Signed-off-by: YingChi Long <me@inclyc.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220925153151.2467884-1-me@inclyc.cn
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4ccd54d28 upstream.
Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.
The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)
So let's panic the system if the kernel is constantly oopsing.
The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)
It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.
12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>