linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-06 10:58:48 +09:00

Author	SHA1	Message	Date
Greg Kroah-Hartman	6a79abcd18	Merge 5.10.83 into android13-5.10 Changes in 5.10.83 bpf: Fix toctou on read-only map's constant scalar tracking ACPI: Get acpi_device's parent from the parent field USB: serial: option: add Telit LE910S1 0x9200 composition USB: serial: option: add Fibocom FM101-GL variants usb: dwc2: gadget: Fix ISOC flow for elapsed frames usb: dwc2: hcd_queue: Fix use of floating point literal usb: dwc3: gadget: Ignore NoStream after End Transfer usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer usb: dwc3: gadget: Fix null pointer exception net: nexthop: fix null pointer dereference when IPv6 is not enabled usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts usb: hub: Fix usb enumeration issue due to address0 race usb: hub: Fix locking issues with address0_mutex binder: fix test regression due to sender_euid change ALSA: ctxfi: Fix out-of-range access ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100 ALSA: hda/realtek: Fix LED on HP ProBook 435 G7 media: cec: copy sequence field for the reply Revert "parisc: Fix backtrace to always include init funtion names" HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts staging/fbtft: Fix backlight staging: greybus: Add missing rwsem around snd_ctl_remove() calls staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() fuse: release pipe buf after last use xen: don't continue xenstore initialization in case of errors xen: detect uninitialized xenbus in xenbus_init KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB tracing/uprobe: Fix uprobe_perf_open probes iteration tracing: Fix pid filtering when triggers are attached mmc: sdhci-esdhc-imx: disable CMDQ support mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB mdio: aspeed: Fix "Link is Down" issue powerpc/32: Fix hardlockup on vmap stack overflow PCI: aardvark: Deduplicate code in advk_pcie_rd_conf() PCI: aardvark: Update comment about disabling link training PCI: aardvark: Implement re-issuing config requests on CRS response PCI: aardvark: Simplify initialization of rootcap on virtual bridge PCI: aardvark: Fix link training proc/vmcore: fix clearing user buffer by properly using clear_user() netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY netfilter: ctnetlink: do not erase error code with EINVAL netfilter: ipvs: Fix reuse connection if RS weight is 0 netfilter: flowtable: fix IPv6 tunnel addr match ARM: dts: BCM5301X: Fix I2C controller interrupt ARM: dts: BCM5301X: Add interrupt properties to GPIO node ARM: dts: bcm2711: Fix PCIe interrupts ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling ASoC: topology: Add missing rwsem around snd_ctl_remove() calls ASoC: codecs: wcd934x: return error code correctly from hw_params net: ieee802154: handle iftypes as u32 firmware: arm_scmi: pm: Propagate return value to caller NFSv42: Don't fail clone() unless the OP_CLONE operation failed ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks scsi: mpt3sas: Fix kernel panic during drive powercycle test drm/vc4: fix error code in vc4_create_object() net: marvell: prestera: fix double free issue on err path iavf: Prevent changing static ITR values if adaptive moderation is on ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec mptcp: fix delack timer firmware: smccc: Fix check for ARCH_SOC_ID not implemented ipv6: fix typos in __ip6_finish_output() nfp: checking parameter process for rx-usecs/tx-usecs is invalid net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls net: ipv6: add fib6_nh_release_dsts stub net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group ice: fix vsi->txq_map sizing ice: avoid bpf_prog refcount underflow scsi: core: sysfs: Fix setting device state to SDEV_RUNNING scsi: scsi_debug: Zero clear zones at reset write pointer erofs: fix deadlock when shrink erofs slab net/smc: Ensure the active closing peer first closes clcsock mlxsw: Verify the accessed index doesn't exceed the array length mlxsw: spectrum: Protect driver from buggy firmware net: marvell: mvpp2: increase MTU limit when XDP enabled nvmet-tcp: fix incomplete data digest send net/ncsi : Add payload to be 32-bit aligned to fix dropped packets PM: hibernate: use correct mode for swsusp_close() drm/amd/display: Set plane update flags for all planes in reset tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows lan743x: fix deadlock in lan743x_phy_link_status_change() net: phylink: Force link down and retrigger resolve on interface change net: phylink: Force retrigger in case of latched link-fail indicator net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk() net/smc: Fix loop in smc_listen nvmet: use IOCB_NOWAIT only if the filesystem supports it igb: fix netpoll exit with traffic MIPS: loongson64: fix FTLB configuration MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48 tls: splice_read: fix record type check tls: fix replacing proto_ops net/sched: sch_ets: don't peek at classes beyond 'nbands' net: vlan: fix underflow for the real_dev refcnt net/smc: Don't call clcsock shutdown twice when smc shutdown net: hns3: fix VF RSS failed problem after PF enable multi-TCs net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP net: mscc: ocelot: correctly report the timestamping RX filters in ethtool tcp: correctly handle increased zerocopy args struct size sched/scs: Reset task stack state in bringup_cpu() f2fs: set SBI_NEED_FSCK flag when inconsistent node block found ceph: properly handle statfs on multifs setups smb3: do not error on fsync when readonly iommu/amd: Clarify AMD IOMMUv2 initialization messages vhost/vsock: fix incorrect used length reported to the guest tracing: Check pid filtering when creating events xen: sync include/xen/interface/io/ring.h with Xen's newest version xen/blkfront: read response from backend only once xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: don't trust the backend response data blindly xen/netfront: read response from backend only once xen/netfront: don't read data from request on the ring page xen/netfront: disentangle tx_skb_freelist xen/netfront: don't trust the backend response data blindly tty: hvc: replace BUG_ON() with negative return value s390/mm: validate VMA in PGSTE manipulation functions shm: extend forced shm destroy to support objects from several IPC nses net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP drm/amdgpu/gfx9: switch to golden tsc registers for renoir+ Linux 5.10.83 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I934dc727030cfb60b31525252df104436ff00ae0	2021-12-01 09:37:11 +01:00
Greg Kroah-Hartman	a324ad7945	Linux 5.10.83 Link: https://lore.kernel.org/r/20211129181711.642046348@linuxfoundation.org Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Hulk Robot <hulkrobot@huawei.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Fox Chen <foxhlchen@gmail.com> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
Alex Deucher	45b42cd053	drm/amdgpu/gfx9: switch to golden tsc registers for renoir+ commit `53af98c091` upstream. Renoir and newer gfx9 APUs have new TSC register that is not part of the gfxoff tile, so it can be read without needing to disable gfx off. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
Joakim Zhang	98b02755d5	net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP commit `2a48d96fd5` upstream. Use __maybe_unused for noirq_suspend()/noirq_resume() hooks to avoid build warning with !CONFIG_PM_SLEEP: >> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:796:12: error: 'stmmac_pltfr_noirq_resume' defined but not used [-Werror=unused-function] 796 \| static int stmmac_pltfr_noirq_resume(struct device dev) \| ^~~~~~~~~~~~~~~~~~~~~~~~~ >> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:775:12: error: 'stmmac_pltfr_noirq_suspend' defined but not used [-Werror=unused-function] 775 \| static int stmmac_pltfr_noirq_suspend(struct device dev) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Fixes: `276aae3772` ("net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
Alexander Mikhalitsyn	a15261d2a1	shm: extend forced shm destroy to support objects from several IPC nses commit `85b6d24646` upstream. Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces. This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists). This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es. To achieve that we do several things: 1. add a namespace (non-refcounted) pointer to the struct shmid_kernel 2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns 3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns). Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction". Q/A Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace. Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up. Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: `ab602f7991` ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
David Hildenbrand	aa20e966d8	s390/mm: validate VMA in PGSTE manipulation functions commit `fe3d100240` upstream. We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit `dd2283f260` ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA. Further, we should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them. Similarly, we have to check if we suddenly find a hugetlbfs VMA, before calling get_locked_pte(). Fixes: `2d42f94773` ("s390/kvm: Add PGSTE manipulation functions") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
Juergen Gross	a94e4a7b77	tty: hvc: replace BUG_ON() with negative return value commit `e679004dec` upstream. Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210707091045.460-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:10 +01:00
Juergen Gross	1c5f722a8f	xen/netfront: don't trust the backend response data blindly commit `a884daa61a` upstream. Today netfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Note that only the tx queue needs special id handling, as for the rx queue the id is equal to the index in the ring page. Introduce a new indicator for the device whether it is broken and let the device stop working when it is set. Set this indicator in case the backend sets any weird data. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	334b0f2787	xen/netfront: disentangle tx_skb_freelist commit `21631d2d74` upstream. The tx_skb_freelist elements are in a single linked list with the request id used as link reference. The per element link field is in a union with the skb pointer of an in use request. Move the link reference out of the union in order to enable a later reuse of it for requests which need a populated skb pointer. Rename add_id_to_freelist() and get_id_from_freelist() to add_id_to_list() and get_id_from_list() in order to prepare using those for other lists as well. Define ~0 as value to indicate the end of a list and place that value into the link for a request not being on the list. When freeing a skb zero the skb pointer in the request. Use a NULL value of the skb pointer instead of skb_entry_is_link() for deciding whether a request has a skb linked to it. Remove skb_entry_set_link() and open code it instead as it is really trivial now. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	e17ee047ee	xen/netfront: don't read data from request on the ring page commit `162081ec33` upstream. In order to avoid a malicious backend being able to influence the local processing of a request build the request locally first and then copy it to the ring page. Any reading from the request influencing the processing in the frontend needs to be done on the local instance. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	f5e4937098	xen/netfront: read response from backend only once commit `8446066bf8` upstream. In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	1ffb20f052	xen/blkfront: don't trust the backend response data blindly commit `b94e4b147f` upstream. Today blkfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Introduce a new state of the ring BLKIF_STATE_ERROR which will be switched to in case an inconsistency is being detected. Recovering from this state is possible only via removing and adding the virtual device again (e.g. via a suspend/resume cycle). Make all warning messages issued due to valid error responses rate limited in order to avoid message floods being triggered by a malicious backend. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	8e147855fc	xen/blkfront: don't take local copy of a request from the ring page commit `8f5a695d99` upstream. In order to avoid a malicious backend being able to influence the local copy of a request build the request locally first and then copy it to the ring page instead of doing it the other way round as today. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	273f04d5d1	xen/blkfront: read response from backend only once commit `71b66243f9` upstream. In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Juergen Gross	b98284aa3f	xen: sync include/xen/interface/io/ring.h with Xen's newest version commit `629a5d87e2` upstream. Sync include/xen/interface/io/ring.h with Xen's newest version in order to get the RING_COPY_RESPONSE() and RING_RESPONSE_PROD_OVERFLOW() macros. Note that this will correct the wrong license info by adding the missing original copyright notice. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Steven Rostedt (VMware)	406f2d5fe3	tracing: Check pid filtering when creating events commit `6cb206508b` upstream. When pid filtering is activated in an instance, all of the events trace files for that instance has the PID_FILTER flag set. This determines whether or not pid filtering needs to be done on the event, otherwise the event is executed as normal. If pid filtering is enabled when an event is created (via a dynamic event or modules), its flag is not updated to reflect the current state, and the events are not filtered properly. Cc: stable@vger.kernel.org Fixes: `3fdaf80f4a` ("tracing: Implement event pid filtering") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Stefano Garzarella	4fd0ad08ee	vhost/vsock: fix incorrect used length reported to the guest commit `49d8c5ffad` upstream. The "used length" reported by calling vhost_add_used() must be the number of bytes written by the device (using "in" buffers). In vhost_vsock_handle_tx_kick() the device only reads the guest buffers (they are all "out" buffers), without writing anything, so we must pass 0 as "used length" to comply virtio spec. Fixes: `433fc58e6b` ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Joerg Roedel	fbc0514e1a	iommu/amd: Clarify AMD IOMMUv2 initialization messages commit `717e88aad3` upstream. The messages printed on the initialization of the AMD IOMMUv2 driver have caused some confusion in the past. Clarify the messages to lower the confusion in the future. Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-01 09:19:09 +01:00
Steve French	5655b8bccb	smb3: do not error on fsync when readonly [ Upstream commit `71e6864eac` ] Linux allows doing a flush/fsync on a file open for read-only, but the protocol does not allow that. If the file passed in on the flush is read-only try to find a writeable handle for the same inode, if that is not possible skip sending the fsync call to the server to avoid breaking the apps. Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Suggested-by: Jeremy Allison <jra@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Jeff Layton	c380062d08	ceph: properly handle statfs on multifs setups [ Upstream commit `8cfc0c7ed3` ] ceph_statfs currently stuffs the cluster fsid into the f_fsid field. This was fine when we only had a single filesystem per cluster, but now that we have multiples we need to use something that will vary between them. Change ceph_statfs to xor each 32-bit chunk of the fsid (aka cluster id) into the lower bits of the statfs->f_fsid. Change the lower bits to hold the fscid (filesystem ID within the cluster). That should give us a value that is guaranteed to be unique between filesystems within a cluster, and should minimize the chance of collisions between mounts of different clusters. URL: https://tracker.ceph.com/issues/52812 Reported-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Weichao Guo	22423c966e	f2fs: set SBI_NEED_FSCK flag when inconsistent node block found [ Upstream commit `6663b138de` ] Inconsistent node block will cause a file fail to open or read, which could make the user process crashes or stucks. Let's mark SBI_NEED_FSCK flag to trigger a fix at next fsck time. After unlinking the corrupted file, the user process could regenerate a new one and work correctly. Signed-off-by: Weichao Guo <guoweichao@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Mark Rutland	e6ee7abd6b	sched/scs: Reset task stack state in bringup_cpu() [ Upstream commit `dce1ca0525` ] To hot unplug a CPU, the idle task on that CPU calls a few layers of C code before finally leaving the kernel. When KASAN is in use, poisoned shadow is left around for each of the active stack frames, and when shadow call stacks are in use. When shadow call stacks (SCS) are in use the task's saved SCS SP is left pointing at an arbitrary point within the task's shadow call stack. When a CPU is offlined than onlined back into the kernel, this stale state can adversely affect execution. Stale KASAN shadow can alias new stackframes and result in bogus KASAN warnings. A stale SCS SP is effectively a memory leak, and prevents a portion of the shadow call stack being used. Across a number of hotplug cycles the idle task's entire shadow call stack can become unusable. We previously fixed the KASAN issue in commit: `e1b77c9298` ("sched/kasan: remove stale KASAN poison after hotplug") ... by removing any stale KASAN stack poison immediately prior to onlining a CPU. Subsequently in commit: `f1a0a376ca` ("sched/core: Initialize the idle task with preemption disabled") ... the refactoring left the KASAN and SCS cleanup in one-time idle thread initialization code rather than something invoked prior to each CPU being onlined, breaking both as above. We fixed SCS (but not KASAN) in commit: `63acd42c0d` ("sched/scs: Reset the shadow stack when idle_task_exit") ... but as this runs in the context of the idle task being offlined it's potentially fragile. To fix these consistently and more robustly, reset the SCS SP and KASAN shadow of a CPU's idle task immediately before we online that CPU in bringup_cpu(). This ensures the idle task always has a consistent state when it is running, and removes the need to so so when exiting an idle task. Whenever any thread is created, dup_task_struct() will give the task a stack which is free of KASAN shadow, and initialize the task's SCS SP, so there's no need to specially initialize either for idle thread within init_idle(), as this was only necessary to handle hotplug cycles. I've tested this on arm64 with: * gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK * clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK ... offlining and onlining CPUS with: \| while true; do \| for C in /sys/devices/system/cpu/cpu*/online; do \| echo 0 > $C; \| echo 1 > $C; \| done \| done Fixes: `f1a0a376ca` ("sched/core: Initialize the idle task with preemption disabled") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Qian Cai <quic_qiancai@quicinc.com> Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Arjun Roy	71e38a0c7c	tcp: correctly handle increased zerocopy args struct size [ Upstream commit `e0fecb289a` ] A prior patch increased the size of struct tcp_zerocopy_receive but did not update do_tcp_getsockopt() handling to properly account for this. This patch simply reintroduces content erroneously cut from the referenced prior patch that handles the new struct size. Fixes: `18fb76ed53` ("net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.") Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Vladimir Oltean	72f2117e45	net: mscc: ocelot: correctly report the timestamping RX filters in ethtool [ Upstream commit `c49a35eedf` ] The driver doesn't support RX timestamping for non-PTP packets, but it declares that it does. Restrict the reported RX filters to PTP v2 over L2 and over L4. Fixes: `4e3b0468e6` ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Vladimir Oltean	73115a2b38	net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP [ Upstream commit `8a075464d1` ] The ocelot driver, when asked to timestamp all receiving packets, 1588 v1 or NTP, says "nah, here's 1588 v2 for you". According to this discussion: https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647 drivers that downgrade from a wider request to a narrower response (or even a response where the intersection with the request is empty) are buggy, and should return -ERANGE instead. This patch fixes that. Fixes: `4e3b0468e6` ("net: mscc: PTP Hardware Clock (PHC) support") Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Guangbin Huang	62343dadbb	net: hns3: fix VF RSS failed problem after PF enable multi-TCs [ Upstream commit `8d2ad993aa` ] When PF is set to multi-TCs and configured mapping relationship between priorities and TCs, the hardware will active these settings for this PF and its VFs. In this case when VF just uses one TC and its rx packets contain priority, and if the priority is not mapped to TC0, as other TCs of VF is not valid, hardware always put this kind of packets to the queue 0. It cause this kind of packets of VF can not be used RSS function. To fix this problem, set tc mode of all unused TCs of VF to the setting of TC0, then rx packet with priority which map to unused TC will be direct to TC0. Fixes: `e2cb1dec97` ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Tony Lu	215167df45	net/smc: Don't call clcsock shutdown twice when smc shutdown [ Upstream commit `bacb6c1e47` ] When applications call shutdown() with SHUT_RDWR in userspace, smc_close_active() calls kernel_sock_shutdown(), and it is called twice in smc_shutdown(). This fixes this by checking sk_state before do clcsock shutdown, and avoids missing the application's call of smc_shutdown(). Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/ Fixes: `606a63c978` ("net/smc: Ensure the active closing peer first closes clcsock") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Ziyang Xuan	6e800ee432	net: vlan: fix underflow for the real_dev refcnt [ Upstream commit `01d9cc2dea` ] Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute the following testcase: ip link add dev dummy1 type dummy ip link add name dummy1.100 link dummy1 type vlan id 100 ip link del dev dummy1 When the dummy netdevice is removed, we will get a WARNING as following: ======================================================================= refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 and an endless loop of: ======================================================================= unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824 That is because dev_put(real_dev) in vlan_dev_free() be called without dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev underflow. Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev symmetrical. Fixes: `563bcbae3b` ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Petr Machata <petrm@nvidia.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:08 +01:00
Davide Caratti	ae2659d2c6	net/sched: sch_ets: don't peek at classes beyond 'nbands' [ Upstream commit `de6d25924c` ] when the number of DRR classes decreases, the round-robin active list can contain elements that have already been freed in ets_qdisc_change(). As a consequence, it's possible to see a NULL dereference crash, caused by the attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets] Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287 RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000 RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0 R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100 FS: 00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0 Call Trace: <TASK> qdisc_peek_dequeued+0x29/0x70 [sch_ets] tbf_dequeue+0x22/0x260 [sch_tbf] __qdisc_run+0x7f/0x630 net_tx_action+0x290/0x4c0 __do_softirq+0xee/0x4f8 irq_exit_rcu+0xf4/0x130 sysvec_apic_timer_interrupt+0x52/0xc0 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0033:0x7f2aa7fc9ad4 Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00 RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202 RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720 RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720 RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380 R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460 </TASK> Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000018 Ensuring that 'alist' was never zeroed [1] was not sufficient, we need to remove from the active list those elements that are no more SP nor DRR. [1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/ v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock acquired, thanks to Cong Wang. v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb from the next list item. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: `dcc68b4d80` ("net: sch_ets: Add a new Qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Jakub Kicinski	e3509feb46	tls: fix replacing proto_ops [ Upstream commit `f3911f73f5` ] We replace proto_ops whenever TLS is configured for RX. But our replacement also overrides sendpage_locked, which will crash unless TX is also configured. Similarly we plug both of those in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely different implementation for TX. Last but not least we always plug in something based on inet_stream_ops even though a few of the callbacks differ for IPv6 (getname, release, bind). Use a callback building method similar to what we do for struct proto. Fixes: `c46234ebb4` ("tls: RX path for ktls") Fixes: `d4ffb02dee` ("net/tls: enable sk_msg redirect to tls socket egress") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Jakub Kicinski	22156242b1	tls: splice_read: fix record type check [ Upstream commit `520493f66f` ] We don't support splicing control records. TLS 1.3 changes moved the record type check into the decrypt if(). The skb may already be decrypted and still be an alert. Note that decrypt_skb_update() is idempotent and updates ctx->decrypted so the if() is pointless. Reorder the check for decryption errors with the content type check while touching them. This part is not really a bug, because if decryption failed in TLS 1.3 content type will be DATA, and for TLS 1.2 it will be correct. Nevertheless its strange to touch output before checking if the function has failed. Fixes: `fedf201e12` ("net: tls: Refactor control message handling on recv") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Huang Pei	3b6c71c097	MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48 [ Upstream commit `41ce097f71` ] It hangup when booting Loongson 3A1000 with BOTH CONFIG_PAGE_SIZE_64KB and CONFIG_MIPS_VA_BITS_48, that it turn out to use 2-level pgtable instead of 3-level. 64KB page size with 2-level pgtable only cover 42 bits VA, use 3-level pgtable to cover all 48 bits VA(55 bits) Fixes: `1e321fa917` ("MIPS64: Support of at least 48 bits of SEGBITS) Signed-off-by: Huang Pei <huangpei@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Huang Pei	a6a5d853f1	MIPS: loongson64: fix FTLB configuration [ Upstream commit `7db5e9e9e5` ] It turns out that 'decode_configs' -> 'set_ftlb_enable' is called under c->cputype unset, which leaves FTLB disabled on BOTH 3A2000 and 3A3000 Fix it by calling "decode_configs" after c->cputype is initialized Fixes: `da1bd29742` ("MIPS: Loongson64: Probe CPU features via CPUCFG") Signed-off-by: Huang Pei <huangpei@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Jesse Brandeburg	5e823dbee2	igb: fix netpoll exit with traffic [ Upstream commit `eaeace6077` ] Oleksandr brought a bug report where netpoll causes trace messages in the log on igb. Danielle brought this back up as still occurring, so we'll try again. [22038.710800] ------------[ cut here ]------------ [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0 As Alex suggested, change the driver to return work_done at the exit of napi_poll, which should be safe to do in this driver because it is not polling multiple queues in this single napi context (multiple queues attached to one MSI-X vector). Several other drivers contain the same simple sequence, so I hope this will not create new problems. Fixes: `16eb8815c2` ("igb: Refactor clean_rx_irq to reduce overhead and improve performance") Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Danielle Ratson <danieller@nvidia.com> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/20211123204000.1597971-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Maurizio Lombardi	f2a58ff3e3	nvmet: use IOCB_NOWAIT only if the filesystem supports it [ Upstream commit `c024b226a4` ] Submit I/O requests with the IOCB_NOWAIT flag set only if the underlying filesystem supports it. Fixes: `50a909db36` ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Guo DaXing	12ceb52f2c	net/smc: Fix loop in smc_listen [ Upstream commit `9ebb0c4b27` ] The kernel_listen function in smc_listen will fail when all the available ports are occupied. At this point smc->clcsock->sk->sk_data_ready has been changed to smc_clcsock_data_ready. When we call smc_listen again, now both smc->clcsock->sk->sk_data_ready and smc->clcsk_data_ready point to the smc_clcsock_data_ready function. The smc_clcsock_data_ready() function calls lsmc->clcsk_data_ready which now points to itself resulting in an infinite loop. This patch restores smc->clcsock->sk->sk_data_ready with the old value. Fixes: `a60a2b1e0a` ("net/smc: reduce active tcp_listen workers") Signed-off-by: Guo DaXing <guodaxing@huawei.com> Acked-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Karsten Graul	c94cbd262b	net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk() [ Upstream commit `587acad41f` ] Coverity reports a possible NULL dereferencing problem: in smc_vlan_by_tcpsk(): 6. returned_null: netdev_lower_get_next returns NULL (checked 29 out of 30 times). 7. var_assigned: Assigning: ndev = NULL return value from netdev_lower_get_next. 1623 ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower); CID 1468509 (#1 of 1): Dereference null return value (NULL_RETURNS) 8. dereference: Dereferencing a pointer that might be NULL ndev when calling is_vlan_dev. 1624 if (is_vlan_dev(ndev)) { Remove the manual implementation and use netdev_walk_all_lower_dev() to iterate over the lower devices. While on it remove an obsolete function parameter comment. Fixes: `cb9d43f677` ("net/smc: determine vlan_id of stacked net_device") Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Russell King (Oracle)	3d4937c6a3	net: phylink: Force retrigger in case of latched link-fail indicator [ Upstream commit `dbae3388ea` ] On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following description: This register bit indicates when link was lost since the last read. For the current link status, read this register back-to-back. Thus to get current link state, we need to read the register twice. But doing that in the link change interrupt handler would lead to potentially ignoring link down events, which we really want to avoid. Thus this needs to be solved in phylink's resolve, by retriggering another resolve in the event when PCS reports link down and previous link was up, and by re-reading PCS state if the previous link was down. The wrong value is read when phylink requests change from sgmii to 2500base-x mode, and link won't come up. This fixes the bug. Fixes: `9525ae8395` ("phylink: add phylink infrastructure") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:07 +01:00
Russell King (Oracle)	50162ff3c8	net: phylink: Force link down and retrigger resolve on interface change [ Upstream commit `80662f4fd4` ] On PHY state change the phylink_resolve() function can read stale information from the MAC and report incorrect link speed and duplex to the kernel message log. Example with a Marvell 88X3310 PHY connected to a SerDes port on Marvell 88E6393X switch: - PHY driver triggers state change due to PHY interface mode being changed from 10gbase-r to 2500base-x due to copper change in speed from 10Gbps to 2.5Gbps, but the PHY itself either hasn't yet changed its interface to the host, or the interrupt about loss of SerDes link hadn't arrived yet (there can be a delay of several milliseconds for this), so we still think that the 10gbase-r mode is up - phylink_resolve() - phylink_mac_pcs_get_state() - this fills in speed=10g link=up - interface mode is updated to 2500base-x but speed is left at 10Gbps - phylink_major_config() - interface is changed to 2500base-x - phylink_link_up() - mv88e6xxx_mac_link_up() - .port_set_speed_duplex() - speed is set to 10Gbps - reports "Link is Up - 10Gbps/Full" to dmesg Afterwards when the interrupt finally arrives for mv88e6xxx, another resolve is forced in which we get the correct speed from phylink_mac_pcs_get_state(), but since the interface is not being changed anymore, we don't call phylink_major_config() but only phylink_mac_config(), which does not set speed/duplex anymore. To fix this, we need to force the link down and trigger another resolve on PHY interface change event. Fixes: `9525ae8395` ("phylink: add phylink infrastructure") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Heiner Kallweit	95ba8f0d57	lan743x: fix deadlock in lan743x_phy_link_status_change() [ Upstream commit `ddb826c2c9` ] Usage of phy_ethtool_get_link_ksettings() in the link status change handler isn't needed, and in combination with the referenced change it results in a deadlock. Simply remove the call and replace it with direct access to phydev->speed. The duplex argument of lan743x_phy_update_flowcontrol() isn't used and can be removed. Fixes: `c10a485c3d` ("phy: phy_ethtool_ksettings_get: Lock the phy for consistency") Reported-by: Alessandro B Maurici <abmaurici@gmail.com> Tested-by: Alessandro B Maurici <abmaurici@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/40e27f76-0ba3-dcef-ee32-a78b9df38b0f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Eric Dumazet	c5e4316d9c	tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows [ Upstream commit `4e1fddc98d` ] While testing BIG TCP patch series, I was expecting that TCP_RR workloads with 80KB requests/answers would send one 80KB TSO packet, then being received as a single GRO packet. It turns out this was not happening, and the root cause was that cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC. Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC needed a budget of ~20 segments. Ideally these TCP_RR flows should not exit slow start. Cubic Hystart should reset itself at each round, instead of assuming every TCP flow is a bulk one. Note that even after this patch, Hystart can still trigger, depending on scheduling artifacts, but at a higher CWND/SSTHRESH threshold, keeping optimal TSO packet sizes. Tested: ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072 nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat\|egrep "Ip6InReceives\|Hystart\|Ip6OutRequests" Before: 8605 Ip6InReceives 87541 0.0 Ip6OutRequests 129496 0.0 TcpExtTCPHystartTrainDetect 1 0.0 TcpExtTCPHystartTrainCwnd 30 0.0 After: 8760 Ip6InReceives 88514 0.0 Ip6OutRequests 87975 0.0 Fixes: `ae27e98a51` ("[TCP] CUBIC v2.3") Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Nicholas Kazlauskas	3187623096	drm/amd/display: Set plane update flags for all planes in reset [ Upstream commit `21431f70f6` ] [Why] We're only setting the flags on stream[0]'s planes so this logic fails if we have more than one stream in the state. This can cause a page flip timeout with multiple displays in the configuration. [How] Index into the stream_status array using the stream index - it's a 1:1 mapping. Fixes: `cdaae8371a` ("drm/amd/display: Handle GPU reset for DC block") Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Thomas Zeitlhofer	f634c755a0	PM: hibernate: use correct mode for swsusp_close() [ Upstream commit `cefcf24b4d` ] Commit `39fbef4b0f` ("PM: hibernate: Get block device exclusively in swsusp_check()") changed the opening mode of the block device to (FMODE_READ \| FMODE_EXCL). In the corresponding calls to swsusp_close(), the mode is still just FMODE_READ which triggers the warning in blkdev_flush_mapping() on resume from hibernate. So, use the mode (FMODE_READ \| FMODE_EXCL) also when closing the device. Fixes: `39fbef4b0f` ("PM: hibernate: Get block device exclusively in swsusp_check()") Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Kumar Thangavel	440bd9faad	net/ncsi : Add payload to be 32-bit aligned to fix dropped packets [ Upstream commit `ac13285214` ] Update NC-SI command handler (both standard and OEM) to take into account of payload paddings in allocating skb (in case of payload size is not 32-bit aligned). The checksum field follows payload field, without taking payload padding into account can cause checksum being truncated, leading to dropped packets. Fixes: `fb4ee67529` ("net/ncsi: Add NCSI OEM command support") Signed-off-by: Kumar Thangavel <thangavel.k@hcl.com> Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Varun Prakash	ac88cb3c44	nvmet-tcp: fix incomplete data digest send [ Upstream commit `102110efdf` ] Current nvmet_try_send_ddgst() code does not check whether all data digest bytes are transmitted, fix this by returning -EAGAIN if all data digest bytes are not transmitted. Fixes: `872d26a391` ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Marek Behún	8889ff80fd	net: marvell: mvpp2: increase MTU limit when XDP enabled [ Upstream commit `7b1b62bc1e` ] Currently mvpp2_xdp_setup won't allow attaching XDP program if mtu > ETH_DATA_LEN (1500). The mvpp2_change_mtu on the other hand checks whether MVPP2_RX_PKT_SIZE(mtu) > MVPP2_BM_LONG_PKT_SIZE. These two checks are semantically different. Moreover this limit can be increased to MVPP2_MAX_RX_BUF_SIZE, since in mvpp2_rx we have xdp.data = data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM; xdp.frame_sz = PAGE_SIZE; Change the checks to check whether mtu > MVPP2_MAX_RX_BUF_SIZE Fixes: `07dd0a7aae` ("mvpp2: add basic XDP support") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Amit Cohen	90d0736876	mlxsw: spectrum: Protect driver from buggy firmware [ Upstream commit `63b08b1f68` ] When processing port up/down events generated by the device's firmware, the driver protects itself from events reported for non-existent local ports, but not the CPU port (local port 0), which exists, but lacks a netdev. This can result in a NULL pointer dereference when calling netif_carrier_{on,off}(). Fix this by bailing early when processing an event reported for the CPU port. Problem was only observed when running on top of a buggy emulator. Fixes: `28b1987ef5` ("mlxsw: spectrum: Register CPU port with devlink") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:06 +01:00
Danielle Ratson	33d89128a9	mlxsw: Verify the accessed index doesn't exceed the array length [ Upstream commit `837ec05cfe` ] There are few cases in which an array index queried from a fw register, is accessed without any validation that it doesn't exceed the array length. Add a proper length validation, so accessing memory past the end of an array will be forbidden. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:05 +01:00
Tony Lu	29e1b57347	net/smc: Ensure the active closing peer first closes clcsock [ Upstream commit `606a63c978` ] The side that actively closed socket, it's clcsock doesn't enter TIME_WAIT state, but the passive side does it. It should show the same behavior as TCP sockets. Consider this, when client actively closes the socket, the clcsock in server enters TIME_WAIT state, which means the address is occupied and won't be reused before TIME_WAIT dismissing. If we restarted server, the service would be unavailable for a long time. To solve this issue, shutdown the clcsock in [A], perform the TCP active close progress first, before the passive closed side closing it. So that the actively closed side enters TIME_WAIT, not the passive one. Client \| Server close() // client actively close \| smc_release() \| smc_close_active() // PEERCLOSEWAIT1 \| smc_close_final() // abort or closed = 1\| smc_cdc_get_slot_and_msg_send() \| [A] \| \|smc_cdc_msg_recv_action() // ACTIVE \| queue_work(smc_close_wq, &conn->close_work) \| smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1 \| smc_close_passive_abort_received() // only in abort \| \|close() // server recv zero, close \| smc_release() // PROCESSABORT or APPCLOSEWAIT1 \| smc_close_active() \| smc_close_abort() or smc_close_final() // CLOSED \| smc_cdc_get_slot_and_msg_send() // abort or closed = 1 smc_cdc_msg_recv_action() \| smc_clcsock_release() queue_work(smc_close_wq, &conn->close_work) \| sock_release(tcp) // actively close clc, enter TIME_WAIT smc_close_passive_work() // PEERCLOSEWAIT1 \| smc_conn_free() smc_close_passive_abort_received() // CLOSED\| smc_conn_free() \| smc_clcsock_release() \| sock_release(tcp) // passive close clc \| Link: https://www.spinics.net/lists/netdev/msg780407.html Fixes: `b38d732477` ("smc: socket closing and linkgroup cleanup") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:05 +01:00
Huang Jianan	77d9c2efa8	erofs: fix deadlock when shrink erofs slab [ Upstream commit `57bbeacdbe` ] We observed the following deadlock in the stress test under low memory scenario: Thread A Thread B - erofs_shrink_scan - erofs_try_to_release_workgroup - erofs_workgroup_try_to_freeze -- A - z_erofs_do_read_page - z_erofs_collection_begin - z_erofs_register_collection - erofs_insert_workgroup - xa_lock(&sbi->managed_pslots) -- B - erofs_workgroup_get - erofs_wait_on_workgroup_freezed -- A - xa_erase - xa_lock(&sbi->managed_pslots) -- B To fix this, it needs to hold xa_lock before freezing the workgroup since xarray will be touched then. So let's hold the lock before accessing each workgroup, just like what we did with the radix tree before. [ Gao Xiang: Jianhua Hao also reports this issue at https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ] Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com Fixes: `64094a0441` ("erofs: convert workstn to XArray") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Reported-by: Jianhua Hao <haojianhua1@xiaomi.com> Signed-off-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-12-01 09:19:05 +01:00

1 2 3 4 5 ...

985140 Commits