Commit Graph

985159 Commits

Author SHA1 Message Date
Jeson Gao
64999249d5 ANDROID: thermal: Add hook to enable/disable thermal power throttle
By default, thermal power throttle is always enable, but sometimes it
need to be disabled for a period of time, so add it to meet platform
thermal requirement.

Bug: 209386157

Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Change-Id: If9c53a9669eec8e2821d837cfa3c660a9cfbf934
2021-12-07 20:44:29 +00:00
Donnie Pollitz
d1b2876104 ANDROID: GKI: Add symbols abi for USB IP kernel modules.
Bug: 207100354
Test: Manually, Emulator boots up.

Signed-off-by: Donnie Pollitz <donpollitz@google.com>
Change-Id: If60fdd76bf70f956d8e9c534924c5c796e75d0b4
2021-12-07 17:36:20 +00:00
David Brazdil
8a8500235e Revert "ANDROID: KVM: arm64: Unmap S2MPU MMIO regions in MPT"
Unmapping S2MPU MMIO regions from MPTs is causing issues on oriole/raven
because it forces a 4K mapping in the first physical GB region instead
of 1G. During suspend AOC/APM can issue traffic through the S2MPUs and
the page lookup transaction fails because the MIF is powered off.

Revert the patch until the problem is fixed by AOC/APM device drivers.

This reverts commit 533c59945d.

Test: unplug device, wait until it suspends, observe no crash
Bug: 190463801
Bug: 209399107
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I9a84ccf4ace459dc35918fc31d86933bc5b923f7
2021-12-07 10:32:54 +00:00
Alex Hong
f398bdcb07 ANDROID: Update the ABI symbol list
Update the generic symbol list.

Bug: 199698959
Signed-off-by: Alex Hong <rurumihong@google.com>
Change-Id: I68b786d210bb5303bf462d3798510d4789212be8
2021-12-07 08:37:24 +08:00
Jaegeuk Kim
35cfa55917 FROMGIT: f2fs: show number of pending discard commands
This information can be used to check how much time we need to give to issue
all the discard commands.

Bug: 206863097
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit fc4ae5492ca4afd7a8a9d261f4908b09f221d314
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: Ibd2f1d6c171f584ec9ca3817d9ea561db98f4693
2021-12-06 18:58:38 +00:00
David Brazdil
395d045123 ANDROID: KVM: arm64: Initialize pkvm_pgtable.mm_ops earlier
The `init` callback of an IOMMU driver is called just before
`finalize_host_mappings` so that EL2 mappings created by drivers are
subsequently unmapped from host stage-2. However, at this point hyp has
already switched to the buddy allocator, having reserved pages allocated
by the early allocator, but `pkvm_pgtable.mm_ops` have not been switched
to buddy allocator callbacks. As a result, pages allocated for EL2
mappings of the IOMMU driver are allocated by the obsoleted early
allocator and remain treated as free by the buddy allocator. This likely
leads to a corruption in the free page lists and a later hyp panic.

Move the initialization of `pkvm_pgtable.mm_ops` before
`finalize_host_mappings` and the call to IOMMU's `init`.

Test: run a VM
Test: adb shell cmd jobscheduler run -f android 5132250
Bug: 190463801
Bug: 209004831
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I42ad28e6b4cfb014d50dbbf722e4c02d5f684178
2021-12-06 11:33:50 +00:00
Konstantin Vyshetsky
ee507cfc11 ANDROID: ABI: update generic symbol list
add blk_crypto_init_key

Bug: 202417931
Signed-off-by: Konstantin Vyshetsky <vkon@google.com>
Change-Id: I6976ec1519ec9f5fcc4d557cccfce8a26335ce17
2021-12-03 11:00:27 -08:00
Quentin Perret
52ec143f0b ANDROID: sched: Make uclamp changes depend on CAP_SYS_NICE
There is currently nothing preventing tasks from changing their per-task
clamp values in anyway that they like. The rationale is probably that
system administrators are still able to limit those clamps thanks to the
cgroup interface. However, this causes pain in a system where both
per-task and per-cgroup clamp values are expected to be under the
control of core system components (as is the case for Android).

To fix this, let's require CAP_SYS_NICE to change per-task clamp values.
There are ongoing discussions upstream about more flexible approaches
than this using the RLIMIT API -- see [1]. But the upstream discussion
has not converged yet, and this is way too late for UAPI changes in
android12-5.10 anyway, so let's apply this change which provides the
behaviour we want without actually impacting UAPIs.

[1] https://lore.kernel.org/lkml/20210623123441.592348-4-qperret@google.com/

Bug: 187186685
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I749312a77306460318ac5374cf243d00b78120dd
2021-12-03 18:10:01 +00:00
Martin Liu
4266de9b4b ANDROID: sched: move blocked reason trace point to cover all class
Now, we only export CFS taks' blocked reasons but it's
important and useful to know other class' blocked
reasons such as RT tasks.

Move the blocked reason trace point to where the scheduler
core layer and before the task's state moves to the waking
state. Thus, we could cover all the sched classes.

Bug: 203080186
Test: check traces
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ic61865642d852d0127cdcf474adf8c06e4c2d570
(cherry picked from commit 44447dec6e)
Signed-off-by: Quentin Perret <qperret@google.com>
2021-12-03 18:09:21 +00:00
vincent.wang
cf33d6fae0 ANDROID: vendor_hooks: add a hook to control the delay time of frequency up and down.
balance the power and performance via the different delay time.

Bug: 208722787

Signed-off-by: vincent.wang <vincent.wang@unisoc.com>
Change-Id: I104a090bc697ab9c68007081e6533820364f1351
2021-12-03 05:42:33 +00:00
David Brazdil
160ebf93a9 UPSTREAM: of: restricted dma: Fix condition for rmem init
of_dma_set_restricted_buffer fails to handle negative return values from
of_property_count_elems_of_size, e.g. when the property does not exist.
This results in an attempt to assign a non-existent reserved memory
region to the device and a warning being printed. Fix the condition to
take negative values into account.

Fixes: f3cfd136ae ("of: restricted dma: Don't fail device probe on rmem init failure")
Cc: Will Deacon <will@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210917131423.2760155-1-dbrazdil@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit 31c8025fac)
Bug: 190591509
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I2a5ea9ba6f78d2998cb5db040a2e13e85ec0194f
2021-12-02 09:43:37 +00:00
Jeson Gao
fc827b344f ANDROID: thermal: Add vendor hooks for thermal
Need to get the request frequency and target frequency
use it to do frequency check and modify it according to
the platform thermal requirements.

Bug: 208166320

Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Change-Id: I776b43c8f559b8a072abd8d3abcb3528348b2c5d
2021-12-01 18:46:06 +00:00
Quentin Perret
479f72db7d UPSTREAM: sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that
the call must not touch scheduling parameters (nice or priority). This
is particularly handy for uclamp when used in conjunction with
SCHED_FLAG_KEEP_POLICY as that allows to issue a syscall that only
impacts uclamp values.

However, sched_setattr always checks whether the priorities and nice
values passed in sched_attr are valid first, even if those never get
used down the line. This is useless at best since userspace can
trivially bypass this check to set the uclamp values by specifying low
priorities. However, it is cumbersome to do so as there is no single
expression of this that skips both RT and CFS checks at once. As such,
userspace needs to query the task policy first with e.g. sched_getattr
and then set sched_attr.sched_priority accordingly. This is racy and
slower than a single call.

As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS
is specified, simply inherit them in this case to match the policy
inheritance of SCHED_FLAG_KEEP_POLICY.

Bug: 208552362
Reported-by: Wei Wang <wvw@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210805102154.590709-3-qperret@google.com
(cherry picked from commit f4dddf90d5)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I18913667e69e558d96cbe3d991c1920e8342bf8b
2021-12-01 18:04:02 +00:00
Quentin Perret
c7ba583d00 UPSTREAM: sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
SCHED_FLAG_SUGOV is supposed to be a kernel-only flag that userspace
cannot interact with. However, sched_getattr() currently reports it
in sched_flags if called on a sugov worker even though it is not
actually defined in a UAPI header. To avoid this, make sure to
clean-up the sched_flags field in sched_getattr() before returning to
userspace.

Bug: 208552362
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210727101103.2729607-3-qperret@google.com
(cherry picked from commit 7ad721bf10)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I6ffe847ce0cc1135519b7061882d118cc8d683e1
2021-12-01 18:03:56 +00:00
Huang Jianan
13b2af2668 UPSTREAM: erofs: fix deadlock when shrink erofs slab
We observed the following deadlock in the stress test under low
memory scenario:

Thread A                               Thread B
- erofs_shrink_scan
 - erofs_try_to_release_workgroup
  - erofs_workgroup_try_to_freeze -- A
                                       - z_erofs_do_read_page
                                        - z_erofs_collection_begin
                                         - z_erofs_register_collection
                                          - erofs_insert_workgroup
                                           - xa_lock(&sbi->managed_pslots) -- B
                                           - erofs_workgroup_get
                                            - erofs_wait_on_workgroup_freezed -- A
  - xa_erase
   - xa_lock(&sbi->managed_pslots) -- B

To fix this, it needs to hold xa_lock before freezing the workgroup
since xarray will be touched then. So let's hold the lock before
accessing each workgroup, just like what we did with the radix tree
before.

[ Gao Xiang: Jianhua Hao also reports this issue at
  https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ]

Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com
Fixes: 64094a0441 ("erofs: convert workstn to XArray")
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Reported-by: Jianhua Hao <haojianhua1@xiaomi.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: haojianhua1 <haojianhua1@xiaomi.com>
Change-Id: Icdfb520c073de8773d44cff64fd5f7a8499cdc85
Signed-off-by: haojianhua1 <haojianhua1@xiaomi.com>

Bug:208318427
(cherry picked from commit 57bbeacdbe)
Signed-off-by: Gao Xiang <xiang@kernel.org>

Change-Id: Ie282b4362898fa8e0c829b8c2454887e953a2610
Signed-off-by: haojianhua1 <haojianhua1@xiaomi.com>
2021-12-01 17:48:15 +00:00
Catalin Marinas
02e966de95 UPSTREAM: KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
Having a signed (1 << 31) constant for TCR_EL2_RES1 and CPTR_EL2_TCPAC
causes the upper 32-bit to be set to 1 when assigning them to a 64-bit
variable. Bit 32 in TCR_EL2 is no longer RES0 in ARMv8.7: with FEAT_LPA2
it changes the meaning of bits 49:48 and 9:8 in the stage 1 EL2 page
table entries. As a result of the sign-extension, a non-VHE kernel can
no longer boot on a model with ARMv8.7 enabled.

CPTR_EL2 still has the top 32 bits RES0 but we should preempt any future
problems

Make these top bit constants unsigned as per commit df655b75c4
("arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1").

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Chris January <Chris.January@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211125152014.2806582-1-catalin.marinas@arm.com
(cherry picked from commit 1f80d15020)
Bug: 204960018
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I078b87529df898086e0f883f28bd2e9ff7f0e74a
2021-12-01 15:09:15 +00:00
Marc Zyngier
cb3cc13c72 UPSTREAM: KVM: arm64: Move pkvm's special 32bit handling into a generic infrastructure
Protected KVM is trying to turn AArch32 exceptions into an illegal
exception entry. Unfortunately, it does that in a way that is a bit
abrupt, and too early for PSTATE to be available.

Instead, move it to the fixup code, which is a more reasonable place
for it. This will also be useful for the NV code.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 7183b2b5ae)
Bug: 204960018
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I9e8926e8c9cd9399073216abb8885a3e2613836f
2021-12-01 15:09:15 +00:00
Marc Zyngier
e740c119d2 UPSTREAM: KVM: arm64: Save PSTATE early on exit
In order to be able to use primitives such as vcpu_mode_is_32bit(),
we need to synchronize the guest PSTATE. However, this is currently
done deep into the bowels of the world-switch code, and we do have
helpers evaluating this much earlier (__vgic_v3_perform_cpuif_access
and handle_aarch32_guest, for example).

Move the saving of the guest pstate into the early fixups, which
cures the first issue. The second one will be addressed separately.

Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
(cherry picked from commit 83bb2c1a01)
Bug: 204960018
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iddbe02a20d34c470651da59390fb7f76ba89ba4b
2021-12-01 15:09:15 +00:00
Vitaly Kuznetsov
df21a6213e UPSTREAM: KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
Generally, it doesn't make sense to return the recommended maximum number
of vCPUs which exceeds the maximum possible number of vCPUs.

Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs
depending on whether it is a system-wide ioctl or a per-VM one. Previously,
KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to
keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211116163443.88707-2-vkuznets@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f60a00d729)
Bug: 204960018
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I8f4470d8a79d111b8f4a2730f97ec1bde4d76935
2021-12-01 15:09:15 +00:00
Greg Kroah-Hartman
6a79abcd18 Merge 5.10.83 into android13-5.10
Changes in 5.10.83
	bpf: Fix toctou on read-only map's constant scalar tracking
	ACPI: Get acpi_device's parent from the parent field
	USB: serial: option: add Telit LE910S1 0x9200 composition
	USB: serial: option: add Fibocom FM101-GL variants
	usb: dwc2: gadget: Fix ISOC flow for elapsed frames
	usb: dwc2: hcd_queue: Fix use of floating point literal
	usb: dwc3: gadget: Ignore NoStream after End Transfer
	usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
	usb: dwc3: gadget: Fix null pointer exception
	net: nexthop: fix null pointer dereference when IPv6 is not enabled
	usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
	usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
	usb: hub: Fix usb enumeration issue due to address0 race
	usb: hub: Fix locking issues with address0_mutex
	binder: fix test regression due to sender_euid change
	ALSA: ctxfi: Fix out-of-range access
	ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100
	ALSA: hda/realtek: Fix LED on HP ProBook 435 G7
	media: cec: copy sequence field for the reply
	Revert "parisc: Fix backtrace to always include init funtion names"
	HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
	staging/fbtft: Fix backlight
	staging: greybus: Add missing rwsem around snd_ctl_remove() calls
	staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
	fuse: release pipe buf after last use
	xen: don't continue xenstore initialization in case of errors
	xen: detect uninitialized xenbus in xenbus_init
	KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
	tracing/uprobe: Fix uprobe_perf_open probes iteration
	tracing: Fix pid filtering when triggers are attached
	mmc: sdhci-esdhc-imx: disable CMDQ support
	mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
	mdio: aspeed: Fix "Link is Down" issue
	powerpc/32: Fix hardlockup on vmap stack overflow
	PCI: aardvark: Deduplicate code in advk_pcie_rd_conf()
	PCI: aardvark: Update comment about disabling link training
	PCI: aardvark: Implement re-issuing config requests on CRS response
	PCI: aardvark: Simplify initialization of rootcap on virtual bridge
	PCI: aardvark: Fix link training
	proc/vmcore: fix clearing user buffer by properly using clear_user()
	netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY
	netfilter: ctnetlink: do not erase error code with EINVAL
	netfilter: ipvs: Fix reuse connection if RS weight is 0
	netfilter: flowtable: fix IPv6 tunnel addr match
	ARM: dts: BCM5301X: Fix I2C controller interrupt
	ARM: dts: BCM5301X: Add interrupt properties to GPIO node
	ARM: dts: bcm2711: Fix PCIe interrupts
	ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer
	ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling
	ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
	ASoC: codecs: wcd934x: return error code correctly from hw_params
	net: ieee802154: handle iftypes as u32
	firmware: arm_scmi: pm: Propagate return value to caller
	NFSv42: Don't fail clone() unless the OP_CLONE operation failed
	ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
	drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks
	scsi: mpt3sas: Fix kernel panic during drive powercycle test
	drm/vc4: fix error code in vc4_create_object()
	net: marvell: prestera: fix double free issue on err path
	iavf: Prevent changing static ITR values if adaptive moderation is on
	ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec
	mptcp: fix delack timer
	firmware: smccc: Fix check for ARCH_SOC_ID not implemented
	ipv6: fix typos in __ip6_finish_output()
	nfp: checking parameter process for rx-usecs/tx-usecs is invalid
	net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume
	net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
	net: ipv6: add fib6_nh_release_dsts stub
	net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
	ice: fix vsi->txq_map sizing
	ice: avoid bpf_prog refcount underflow
	scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
	scsi: scsi_debug: Zero clear zones at reset write pointer
	erofs: fix deadlock when shrink erofs slab
	net/smc: Ensure the active closing peer first closes clcsock
	mlxsw: Verify the accessed index doesn't exceed the array length
	mlxsw: spectrum: Protect driver from buggy firmware
	net: marvell: mvpp2: increase MTU limit when XDP enabled
	nvmet-tcp: fix incomplete data digest send
	net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
	PM: hibernate: use correct mode for swsusp_close()
	drm/amd/display: Set plane update flags for all planes in reset
	tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
	lan743x: fix deadlock in lan743x_phy_link_status_change()
	net: phylink: Force link down and retrigger resolve on interface change
	net: phylink: Force retrigger in case of latched link-fail indicator
	net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
	net/smc: Fix loop in smc_listen
	nvmet: use IOCB_NOWAIT only if the filesystem supports it
	igb: fix netpoll exit with traffic
	MIPS: loongson64: fix FTLB configuration
	MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
	tls: splice_read: fix record type check
	tls: fix replacing proto_ops
	net/sched: sch_ets: don't peek at classes beyond 'nbands'
	net: vlan: fix underflow for the real_dev refcnt
	net/smc: Don't call clcsock shutdown twice when smc shutdown
	net: hns3: fix VF RSS failed problem after PF enable multi-TCs
	net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
	net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
	tcp: correctly handle increased zerocopy args struct size
	sched/scs: Reset task stack state in bringup_cpu()
	f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
	ceph: properly handle statfs on multifs setups
	smb3: do not error on fsync when readonly
	iommu/amd: Clarify AMD IOMMUv2 initialization messages
	vhost/vsock: fix incorrect used length reported to the guest
	tracing: Check pid filtering when creating events
	xen: sync include/xen/interface/io/ring.h with Xen's newest version
	xen/blkfront: read response from backend only once
	xen/blkfront: don't take local copy of a request from the ring page
	xen/blkfront: don't trust the backend response data blindly
	xen/netfront: read response from backend only once
	xen/netfront: don't read data from request on the ring page
	xen/netfront: disentangle tx_skb_freelist
	xen/netfront: don't trust the backend response data blindly
	tty: hvc: replace BUG_ON() with negative return value
	s390/mm: validate VMA in PGSTE manipulation functions
	shm: extend forced shm destroy to support objects from several IPC nses
	net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP
	drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
	Linux 5.10.83

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I934dc727030cfb60b31525252df104436ff00ae0
2021-12-01 09:37:11 +01:00
Greg Kroah-Hartman
a324ad7945 Linux 5.10.83
Link: https://lore.kernel.org/r/20211129181711.642046348@linuxfoundation.org
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Hulk Robot <hulkrobot@huawei.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
Alex Deucher
45b42cd053 drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
commit 53af98c091 upstream.

Renoir and newer gfx9 APUs have new TSC register that is
not part of the gfxoff tile, so it can be read without
needing to disable gfx off.

Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
Joakim Zhang
98b02755d5 net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP
commit 2a48d96fd5 upstream.

Use __maybe_unused for noirq_suspend()/noirq_resume() hooks to avoid
build warning with !CONFIG_PM_SLEEP:

>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:796:12: error: 'stmmac_pltfr_noirq_resume' defined but not used [-Werror=unused-function]
     796 | static int stmmac_pltfr_noirq_resume(struct device *dev)
         |            ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:775:12: error: 'stmmac_pltfr_noirq_suspend' defined but not used [-Werror=unused-function]
     775 | static int stmmac_pltfr_noirq_suspend(struct device *dev)
         |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

Fixes: 276aae3772 ("net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
Alexander Mikhalitsyn
a15261d2a1 shm: extend forced shm destroy to support objects from several IPC nses
commit 85b6d24646 upstream.

Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.

This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).

This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.

To achieve that we do several things:

1. add a namespace (non-refcounted) pointer to the struct shmid_kernel

2. during new shm object creation (newseg()/shmget syscall) we
   initialize this pointer by current task IPC ns

3. exit_shm() fully reworked such that it traverses over all shp's in
   task->sysvshm.shm_clist and gets IPC namespace not from current task
   as it was before but from shp's object itself, then call
   shm_destroy(shp, ns).

Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed.  To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".

Q/A

Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
   lifetime, so, if we get shp object from the task->sysvshm.shm_clist
   while holding task_lock(task) nobody can steal our namespace.

Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
   namespace without getting task->sysvshm.shm_clist list cleaned up.

Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f7991 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
David Hildenbrand
aa20e966d8 s390/mm: validate VMA in PGSTE manipulation functions
commit fe3d100240 upstream.

We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f260 ("mm: mmap: zap pages
with read mmap_sem in munmap"). gfn_to_hva() will only translate using
KVM memory regions, but won't validate the VMA.

Further, we should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things will
happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.

Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().

Fixes: 2d42f94773 ("s390/kvm: Add PGSTE manipulation functions")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
Juergen Gross
a94e4a7b77 tty: hvc: replace BUG_ON() with negative return value
commit e679004dec upstream.

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210707091045.460-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:10 +01:00
Juergen Gross
1c5f722a8f xen/netfront: don't trust the backend response data blindly
commit a884daa61a upstream.

Today netfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Note that only the tx queue needs special id handling, as for the rx
queue the id is equal to the index in the ring page.

Introduce a new indicator for the device whether it is broken and let
the device stop working when it is set. Set this indicator in case the
backend sets any weird data.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
334b0f2787 xen/netfront: disentangle tx_skb_freelist
commit 21631d2d74 upstream.

The tx_skb_freelist elements are in a single linked list with the
request id used as link reference. The per element link field is in a
union with the skb pointer of an in use request.

Move the link reference out of the union in order to enable a later
reuse of it for requests which need a populated skb pointer.

Rename add_id_to_freelist() and get_id_from_freelist() to
add_id_to_list() and get_id_from_list() in order to prepare using
those for other lists as well. Define ~0 as value to indicate the end
of a list and place that value into the link for a request not being
on the list.

When freeing a skb zero the skb pointer in the request. Use a NULL
value of the skb pointer instead of skb_entry_is_link() for deciding
whether a request has a skb linked to it.

Remove skb_entry_set_link() and open code it instead as it is really
trivial now.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
e17ee047ee xen/netfront: don't read data from request on the ring page
commit 162081ec33 upstream.

In order to avoid a malicious backend being able to influence the local
processing of a request build the request locally first and then copy
it to the ring page. Any reading from the request influencing the
processing in the frontend needs to be done on the local instance.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
f5e4937098 xen/netfront: read response from backend only once
commit 8446066bf8 upstream.

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
1ffb20f052 xen/blkfront: don't trust the backend response data blindly
commit b94e4b147f upstream.

Today blkfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Introduce a new state of the ring BLKIF_STATE_ERROR which will be
switched to in case an inconsistency is being detected. Recovering from
this state is possible only via removing and adding the virtual device
again (e.g. via a suspend/resume cycle).

Make all warning messages issued due to valid error responses rate
limited in order to avoid message floods being triggered by a malicious
backend.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
8e147855fc xen/blkfront: don't take local copy of a request from the ring page
commit 8f5a695d99 upstream.

In order to avoid a malicious backend being able to influence the local
copy of a request build the request locally first and then copy it to
the ring page instead of doing it the other way round as today.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
273f04d5d1 xen/blkfront: read response from backend only once
commit 71b66243f9 upstream.

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Juergen Gross
b98284aa3f xen: sync include/xen/interface/io/ring.h with Xen's newest version
commit 629a5d87e2 upstream.

Sync include/xen/interface/io/ring.h with Xen's newest version in
order to get the RING_COPY_RESPONSE() and RING_RESPONSE_PROD_OVERFLOW()
macros.

Note that this will correct the wrong license info by adding the
missing original copyright notice.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Steven Rostedt (VMware)
406f2d5fe3 tracing: Check pid filtering when creating events
commit 6cb206508b upstream.

When pid filtering is activated in an instance, all of the events trace
files for that instance has the PID_FILTER flag set. This determines
whether or not pid filtering needs to be done on the event, otherwise the
event is executed as normal.

If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Stefano Garzarella
4fd0ad08ee vhost/vsock: fix incorrect used length reported to the guest
commit 49d8c5ffad upstream.

The "used length" reported by calling vhost_add_used() must be the
number of bytes written by the device (using "in" buffers).

In vhost_vsock_handle_tx_kick() the device only reads the guest
buffers (they are all "out" buffers), without writing anything,
so we must pass 0 as "used length" to comply virtio spec.

Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Joerg Roedel
fbc0514e1a iommu/amd: Clarify AMD IOMMUv2 initialization messages
commit 717e88aad3 upstream.

The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to lower
the confusion in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:19:09 +01:00
Steve French
5655b8bccb smb3: do not error on fsync when readonly
[ Upstream commit 71e6864eac ]

Linux allows doing a flush/fsync on a file open for read-only,
but the protocol does not allow that.  If the file passed in
on the flush is read-only try to find a writeable handle for
the same inode, if that is not possible skip sending the
fsync call to the server to avoid breaking the apps.

Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Suggested-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Jeff Layton
c380062d08 ceph: properly handle statfs on multifs setups
[ Upstream commit 8cfc0c7ed3 ]

ceph_statfs currently stuffs the cluster fsid into the f_fsid field.
This was fine when we only had a single filesystem per cluster, but now
that we have multiples we need to use something that will vary between
them.

Change ceph_statfs to xor each 32-bit chunk of the fsid (aka cluster id)
into the lower bits of the statfs->f_fsid. Change the lower bits to hold
the fscid (filesystem ID within the cluster).

That should give us a value that is guaranteed to be unique between
filesystems within a cluster, and should minimize the chance of
collisions between mounts of different clusters.

URL: https://tracker.ceph.com/issues/52812
Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Weichao Guo
22423c966e f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
[ Upstream commit 6663b138de ]

Inconsistent node block will cause a file fail to open or read,
which could make the user process crashes or stucks. Let's mark
SBI_NEED_FSCK flag to trigger a fix at next fsck time. After
unlinking the corrupted file, the user process could regenerate
a new one and work correctly.

Signed-off-by: Weichao Guo <guoweichao@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Mark Rutland
e6ee7abd6b sched/scs: Reset task stack state in bringup_cpu()
[ Upstream commit dce1ca0525 ]

To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.

When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.

We previously fixed the KASAN issue in commit:

  e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")

... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.

Subsequently in commit:

  f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")

... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.

We fixed SCS (but not KASAN) in commit:

  63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")

... but as this runs in the context of the idle task being offlined it's
potentially fragile.

To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.

Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.

I've tested this on arm64 with:

* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK

... offlining and onlining CPUS with:

| while true; do
|   for C in /sys/devices/system/cpu/cpu*/online; do
|     echo 0 > $C;
|     echo 1 > $C;
|   done
| done

Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Arjun Roy
71e38a0c7c tcp: correctly handle increased zerocopy args struct size
[ Upstream commit e0fecb289a ]

A prior patch increased the size of struct tcp_zerocopy_receive
but did not update do_tcp_getsockopt() handling to properly account
for this.

This patch simply reintroduces content erroneously cut from the
referenced prior patch that handles the new struct size.

Fixes: 18fb76ed53 ("net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Vladimir Oltean
72f2117e45 net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
[ Upstream commit c49a35eedf ]

The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Vladimir Oltean
73115a2b38 net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
[ Upstream commit 8a075464d1 ]

The ocelot driver, when asked to timestamp all receiving packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".

According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Guangbin Huang
62343dadbb net: hns3: fix VF RSS failed problem after PF enable multi-TCs
[ Upstream commit 8d2ad993aa ]

When PF is set to multi-TCs and configured mapping relationship between
priorities and TCs, the hardware will active these settings for this PF
and its VFs.

In this case when VF just uses one TC and its rx packets contain priority,
and if the priority is not mapped to TC0, as other TCs of VF is not valid,
hardware always put this kind of packets to the queue 0. It cause this kind
of packets of VF can not be used RSS function.

To fix this problem, set tc mode of all unused TCs of VF to the setting of
TC0, then rx packet with priority which map to unused TC will be direct to
TC0.

Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Tony Lu
215167df45 net/smc: Don't call clcsock shutdown twice when smc shutdown
[ Upstream commit bacb6c1e47 ]

When applications call shutdown() with SHUT_RDWR in userspace,
smc_close_active() calls kernel_sock_shutdown(), and it is called
twice in smc_shutdown().

This fixes this by checking sk_state before do clcsock shutdown, and
avoids missing the application's call of smc_shutdown().

Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
Fixes: 606a63c978 ("net/smc: Ensure the active closing peer first closes clcsock")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Ziyang Xuan
6e800ee432 net: vlan: fix underflow for the real_dev refcnt
[ Upstream commit 01d9cc2dea ]

Inject error before dev_hold(real_dev) in register_vlan_dev(),
and execute the following testcase:

ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100
ip link del dev dummy1

When the dummy netdevice is removed, we will get a WARNING as following:

=======================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0

and an endless loop of:

=======================================================================
unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824

That is because dev_put(real_dev) in vlan_dev_free() be called without
dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev
underflow.

Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
symmetrical.

Fixes: 563bcbae3b ("net: vlan: fix a UAF in vlan_dev_real_dev()")
Reported-by: Petr Machata <petrm@nvidia.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:08 +01:00
Davide Caratti
ae2659d2c6 net/sched: sch_ets: don't peek at classes beyond 'nbands'
[ Upstream commit de6d25924c ]

when the number of DRR classes decreases, the round-robin active list can
contain elements that have already been freed in ets_qdisc_change(). As a
consequence, it's possible to see a NULL dereference crash, caused by the
attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:

 BUG: kernel NULL pointer dereference, address: 0000000000000018
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475
 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
 RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets]
 Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d
 RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287
 RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000
 RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0
 R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100
 FS:  00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0
 Call Trace:
  <TASK>
  qdisc_peek_dequeued+0x29/0x70 [sch_ets]
  tbf_dequeue+0x22/0x260 [sch_tbf]
  __qdisc_run+0x7f/0x630
  net_tx_action+0x290/0x4c0
  __do_softirq+0xee/0x4f8
  irq_exit_rcu+0xf4/0x130
  sysvec_apic_timer_interrupt+0x52/0xc0
  asm_sysvec_apic_timer_interrupt+0x12/0x20
 RIP: 0033:0x7f2aa7fc9ad4
 Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00
 RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202
 RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720
 RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720
 RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460
  </TASK>
 Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000018

Ensuring that 'alist' was never zeroed [1] was not sufficient, we need to
remove from the active list those elements that are no more SP nor DRR.

[1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/

v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting
    DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock
    acquired, thanks to Cong Wang.

v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb
    from the next list item.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: dcc68b4d80 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:07 +01:00
Jakub Kicinski
e3509feb46 tls: fix replacing proto_ops
[ Upstream commit f3911f73f5 ]

We replace proto_ops whenever TLS is configured for RX. But our
replacement also overrides sendpage_locked, which will crash
unless TX is also configured. Similarly we plug both of those
in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely
different implementation for TX.

Last but not least we always plug in something based on inet_stream_ops
even though a few of the callbacks differ for IPv6 (getname, release,
bind).

Use a callback building method similar to what we do for struct proto.

Fixes: c46234ebb4 ("tls: RX path for ktls")
Fixes: d4ffb02dee ("net/tls: enable sk_msg redirect to tls socket egress")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:07 +01:00
Jakub Kicinski
22156242b1 tls: splice_read: fix record type check
[ Upstream commit 520493f66f ]

We don't support splicing control records. TLS 1.3 changes moved
the record type check into the decrypt if(). The skb may already
be decrypted and still be an alert.

Note that decrypt_skb_update() is idempotent and updates ctx->decrypted
so the if() is pointless.

Reorder the check for decryption errors with the content type check
while touching them. This part is not really a bug, because if
decryption failed in TLS 1.3 content type will be DATA, and for
TLS 1.2 it will be correct. Nevertheless its strange to touch output
before checking if the function has failed.

Fixes: fedf201e12 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:19:07 +01:00