linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 20:32:04 +09:00

Author	SHA1	Message	Date
Zheng Wang	9d882229d3	xirc2ps_cs: Fix use after free bug in xirc2ps_detach [ Upstream commit `e8d20c3ded` ] In xirc2ps_probe, the local->tx_timeout_task was bounded with xirc2ps_tx_timeout_task. When timeout occurs, it will call xirc_tx_timeout->schedule_work to start the work. When we call xirc2ps_detach to remove the driver, there may be a sequence as follows: Stop responding to timeout tasks and complete scheduled tasks before cleanup in xirc2ps_detach, which will fix the problem. CPU0 CPU1 \|xirc2ps_tx_timeout_task xirc2ps_detach \| free_netdev \| kfree(dev); \| \| \| do_reset \| //use dev Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Daniil Tatianin	97ea704f39	qed/qed_sriov: guard against NULL derefs from qed_iov_get_vf_info [ Upstream commit `25143b6a01` ] We have to make sure that the info returned by the helper is valid before using it. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Fixes: `f990c82c38` ("qed: Add support for ndo_set_vf_trust") Fixes: `733def6a04` ("qed: IOV link control") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Ard Biesheuvel	58c11bc7ad	efi/libstub: smbios: Use length member instead of record struct size [ Upstream commit `34343eb06a` ] The type 1 SMBIOS record happens to always be the same size, but there are other record types which have been augmented over time, and so we should really use the length field in the header to decide where the string table starts. Fixes: `550b33cfd4` ("arm64: efi: Force the use of ...") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Szymon Heidrich	e041bef1ad	net: usb: smsc95xx: Limit packet length to skb->len [ Upstream commit `ff821092cf` ] Packet length retrieved from descriptor may be larger than the actual socket buffer length. In such case the cloned skb passed up the network stack will leak kernel memory contents. Fixes: `2f7ca802bd` ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230316101954.75836-1-szymon.heidrich@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Álvaro Fernández Rojas	53a915a00b	net: dsa: b53: mmap: fix device tree support [ Upstream commit `30796d0dcb` ] CPU port should also be enabled in order to get a working switch. Fixes: `a5538a777b` ("net: dsa: b53: mmap: Add device tree support") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230316172807.460146-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Jeff Layton	51ddb84baf	nfsd: don't replace page in rq_pages if it's a continuation of last page [ Upstream commit `27c934dd88` ] The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page. nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves. If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit `91e23b1c39` ("NFSD: Clean up nfsd_splice_actor()") removed the check for it. Fixes: `91e23b1c39` ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dario Lesca <d.lesca@solinos.it> Tested-by: David Critch <dcritch@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:04 +02:00
Yu Kuai	1c55982beb	scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate() [ Upstream commit `a13faca032` ] If alua_rtpg_queue() failed from alua_activate(), then 'qdata' is not freed, which will cause following memleak: unreferenced object 0xffff88810b2c6980 (size 32): comm "kworker/u16:2", pid 635322, jiffies 4355801099 (age 1216426.076s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 40 39 24 c1 ff ff ff ff 00 f8 ea 0a 81 88 ff ff @9$............. backtrace: [<0000000098f3a26d>] alua_activate+0xb0/0x320 [<000000003b529641>] scsi_dh_activate+0xb2/0x140 [<000000007b296db3>] activate_path_work+0xc6/0xe0 [dm_multipath] [<000000007adc9ace>] process_one_work+0x3c5/0x730 [<00000000c457a985>] worker_thread+0x93/0x650 [<00000000cb80e628>] kthread+0x1ba/0x210 [<00000000a1e61077>] ret_from_fork+0x22/0x30 Fix the problem by freeing 'qdata' in error path. Fixes: `625fe857e4` ("scsi: scsi_dh_alua: Check scsi_device_get() return value") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230315062154.668812-1-yukuai1@huaweicloud.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Yicong Yang	a0de3f29d8	i2c: hisi: Only use the completion interrupt to finish the transfer [ Upstream commit `d982635126` ] The controller will always generate a completion interrupt when the transfer is finished normally or not. Currently we use either error or completion interrupt to finish, this may result the completion interrupt unhandled and corrupt the next transfer, especially at low speed mode. Since on error case, the error interrupt will come first then is the completion interrupt. So only use the completion interrupt to finish the whole transfer process. Fixes: `d62fbdb99a` ("i2c: add support for HiSilicon I2C controller") Reported-by: Sheng Feng <fengsheng5@huawei.com> Signed-off-by: Sheng Feng <fengsheng5@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Matthias Schiffer	d6ea83a476	i2c: mxs: ensure that DMA buffers are safe for DMA [ Upstream commit `5190417bdf` ] We found that after commit `9c46929e79` ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems"), the PCF85063 RTC driver stopped working on i.MX28 due to regmap_bulk_read() reading bogus data into a stack buffer. This is caused by the i2c-mxs driver using DMA transfers even for messages without the I2C_M_DMA_SAFE flag, and the aforementioned commit enabling vmapped stacks. As the MXS I2C controller requires DMA for reads of >4 bytes, DMA can't be disabled, so the issue is fixed by using i2c_get_dma_safe_msg_buf() to create a bounce buffer when needed. Fixes: `9c46929e79` ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Alexander Stein	6d1c6e982b	i2c: imx-lpi2c: check only for enabled interrupt flags [ Upstream commit `1c78850045` ] When reading from I2C, the Tx watermark is set to 0. Unfortunately the TDF (transmit data flag) is enabled when Tx FIFO entries is equal or less than watermark. So it is set in every case, hence the reset default of 1. This results in the MSR_RDF _and_ MSR_TDF flags to be set thus trying to send Tx data on a read message. Mask the IRQ status to filter for wanted flags only. Fixes: `a55fa9d0e4` ("i2c: imx-lpi2c: add low power i2c bus driver") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Tested-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
AKASHI Takahiro	ec897f7524	igc: fix the validation logic for taprio's gate list [ Upstream commit `2b4cc3d3f4` ] The check introduced in the commit `a5fd39464a` ("igc: Lift TAPRIO schedule restriction") can detect a false positive error in some corner case. For instance, tc qdisc replace ... taprio num_tc 4 ... sched-entry S 0x01 100000 # slot#1 sched-entry S 0x03 100000 # slot#2 sched-entry S 0x04 100000 # slot#3 sched-entry S 0x08 200000 # slot#4 flags 0x02 # hardware offload Here the queue#0 (the first queue) is on at the slot#1 and #2, and off at the slot#3 and #4. Under the current logic, when the slot#4 is examined, validate_schedule() returns false since the enablement count for the queue#0 is two and it is already off at the previous slot (i.e. #3). But this definition is truely correct. Let's fix the logic to enforce a strict validation for consecutively-opened slots. Fixes: `a5fd39464a` ("igc: Lift TAPRIO schedule restriction") Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Akihiko Odaki	910e2013d0	igbvf: Regard vf reset nack as success [ Upstream commit `02c83791ef` ] vf reset nack actually represents the reset operation itself is performed but no address is assigned. Therefore, e1000_reset_hw_vf should fill the "perm_addr" with the zero address and return success on such an occasion. This prevents its callers in netdev.c from saying PF still resetting, and instead allows them to correctly report that no address is assigned. Fixes: `6ddbc4cf1f` ("igb: Indicate failure on vf reset for empty mac address") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Gaosheng Cui	460e4073b7	intel/igbvf: free irq on the error path in igbvf_request_msix() [ Upstream commit `85eb39bb39` ] In igbvf_request_msix(), irqs have not been freed on the err path, we need to free it. Fix it. Fixes: `d4e0fe01a3` ("igbvf: add new driver to support 82576 virtual functions") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:03 +02:00
Ahmed Zaki	3faa2b8f8f	iavf: do not track VLAN 0 filters [ Upstream commit `964290ff32` ] When an interface with the maximum number of VLAN filters is brought up, a spurious error is logged: [257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3 [257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported. The VF driver complains that it cannot add the VLAN 0 filter. On the other hand, the PF driver always adds VLAN 0 filter on VF initialization. The VF does not need to ask the PF for that filter at all. Fix the error by not tracking VLAN 0 filters altogether. With that, the check added by commit `0e710a3ffd` ("iavf: Fix VF driver counting VLAN 0 filters") in iavf_virtchnl.c is useless and might be confusing if left as it suggests that we track VLAN 0. Fixes: `0e710a3ffd` ("iavf: Fix VF driver counting VLAN 0 filters") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Alexander Lobakin	c9c56af093	iavf: fix non-tunneled IPv6 UDP packet type and hashing [ Upstream commit `de58647b43` ] Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP. Originally, i40e, ice and iavf were affected. Commit `73df8c9e3e` ("i40e: Correct UDP packet header for non_tunnel-ipv6") fixed that in i40e, then commit `638a0c8c88` ("ice: fix incorrect payload indicator on PTYPE") fixed that for ice. IPv6 UDP is L4 obviously. Fix it and make iavf report correct L4 hash type for such packets, so that the stack won't calculate it on CPU when needs it. Fixes: `206812b5fc` ("i40e/i40evf: i40e implementation for skb_set_hash") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Alexander Lobakin	0bfadea827	iavf: fix inverted Rx hash condition leading to disabled hash [ Upstream commit `32d57f667f` ] Condition, which checks whether the netdev has hashing enabled is inverted. Basically, the tagged commit effectively disabled passing flow hash from descriptor to skb, unless user disables it via Ethtool. Commit `a876c3ba59` ("i40e/i40evf: properly report Rx packet hash") fixed this problem, but only for i40e. Invert the condition now in iavf and unblock passing hash to skbs again. Fixes: `857942fd1a` ("i40e: Fix Rx hash reported to the stack by our driver") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Kal Conley	a069909acc	xsk: Add missing overflow check in xdp_umem_reg [ Upstream commit `c7df4813b1` ] The number of chunks can overflow u32. Make sure to return -EINVAL on overflow. Also remove a redundant u32 cast assigning umem->npgs. Fixes: `bbff2f321a` ("xsk: new descriptor addressing scheme") Signed-off-by: Kal Conley <kal.conley@dectris.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230308174013.1114745-1-kal.conley@dectris.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Dave Wysochanski	4797ad1f56	NFS: Fix /proc/PID/io read_bytes for buffered reads [ Upstream commit `9c88ea00fe` ] Prior to commit `8786fde842` ("Convert NFS from readpages to readahead"), nfs_readpages() used the old mm interface read_cache_pages() which called task_io_account_read() for each NFS page read. After this commit, nfs_readpages() is converted to nfs_readahead(), which now uses the new mm interface readahead_page(). The new interface requires callers to call task_io_account_read() themselves. In addition, to nfs_readahead() task_io_account_read() should also be called from nfs_read_folio(). Fixes: `8786fde842` ("Convert NFS from readpages to readahead") Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/ Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Alexander Stein	26aef3be6e	arm64: dts: imx93: add missing #address-cells and #size-cells to i2c nodes [ Upstream commit `b3cdf73048` ] Add them to the SoC .dtsi, so that not every board has to specify them. Fixes: `1225396fef` ("arm64: dts: imx93: add lpi2c nodes") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Marek Vasut	9f66c5dbaf	arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes [ Upstream commit `62fb54148c` ] Add #sound-dai-cells properties to SAI nodes. Reviewed-by: Adam Ford <aford173@gmail.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Fixes: `9e98600697` ("arm64: dts: imx8mn: Add SAI nodes") Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:02 +02:00
Peng Fan	d75af98786	ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl [ Upstream commit `1cd489e1ad` ] usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names' Signed-off-by: Peng Fan <peng.fan@nxp.com> Fixes: `9c7016f1ca` ("ARM: dts: imx: add devicetree for Tolino Shine 2 HD") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Peng Fan	0828dda42e	ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl [ Upstream commit `957c04e978` ] usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names' Signed-off-by: Peng Fan <peng.fan@nxp.com> Fixes: `c100ea86e6` ("ARM: dts: add Netronix E60K02 board common file") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Peng Fan	8505ead526	ARM: dts: imx6sll: e70k02: fix usbotg1 pinctrl [ Upstream commit `3d37f7685d` ] usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names' Signed-off-by: Peng Fan <peng.fan@nxp.com> Fixes: `3bb3fd8565` ("ARM: dts: add Netronix E70K02 board common file") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Andrew Halaney	08589e3ca8	arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio [ Upstream commit `feafeb5314` ] The deprecated property is named snps,reset-gpio, but this devicetree used snps,reset-gpios instead which results in the reset not being used and the following make dtbs_check error: ./arch/arm64/boot/dts/freescale/imx8dxl-evk.dtb: ethernet@5b050000: 'snps,reset-gpio' is a dependency of 'snps,reset-delays-us' From schema: ./Documentation/devicetree/bindings/net/snps,dwmac.yaml Use the preferred method of defining the reset gpio in the phy node itself. Note that this drops the 10 us pre-delay, but prior this wasn't used at all and a pre-delay doesn't make much sense in this context so it should be fine. Fixes: `8dd495d123` ("arm64: dts: freescale: add support for i.MX8DXL EVK board") Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Wei Fang	bcbc0df731	arm64: dts: imx8dxl-evk: Disable hibernation mode of AR8031 for EQOS [ Upstream commit `0deefb5bd1` ] The hibernation mode of AR8031 PHY defaults to be enabled after hardware reset. When the cable is unplugged, the PHY will enter hibernation mode after about 10 senconds and the PHY clocks will be stopped to save power. However, due to the design of EQOS, the mac needs the RX_CLK of PHY for software reset to complete. Otherwise the software reset of EQOS will be failed and do not work correctly. The only way is to disable hibernation mode of AR8031 PHY for EQOS, the "qca,disable-hibernation-mode" property is used for this purpose and has already been submitted to the upstream, for more details please refer to the below link: https://lore.kernel.org/netdev/20220818030054.1010660-2-wei.fang@nxp.com/ This issue is easy to reproduce, just unplug the cable and "ifconfig eth0 down", after about 10 senconds, then "ifconfig eth0 up", you will see failure log on the serial port. The log is shown as following: root@imx8dxlevk:~# [34.941970] imx-dwmac 5b050000.ethernet eth0: Link is Down root@imx8dxlevk:~# ifconfig eth0 down [35.437814] imx-dwmac 5b050000.ethernet eth0: FPE workqueue stop [35.507913] imx-dwmac 5b050000.ethernet eth0: PHY [stmmac-1:00] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL) [35.518613] imx-dwmac 5b050000.ethernet eth0: configuring for phy/rgmii-id link mode root@imx8dxlevk:~# ifconfig eth0 up [71.143044] imx-dwmac 5b050000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 [71.215855] imx-dwmac 5b050000.ethernet eth0: PHY [stmmac-1:00] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL) [72.230417] imx-dwmac 5b050000.ethernet: Failed to reset the dma [72.236512] imx-dwmac 5b050000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed [72.245258] imx-dwmac 5b050000.ethernet eth0: __stmmac_open: Hw setup failed SIOCSIFFLAGS: Connection timed out After applying this patch, the software reset of EQOS will be successful. And the log is shown as below. root@imx8dxlevk:~# ifconfig eth0 up [96.114344] imx-dwmac 5b050000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 [96.171466] imx-dwmac 5b050000.ethernet eth0: PHY [stmmac-1:00] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL) [96.188883] imx-dwmac 5b050000.ethernet eth0: No Safety Features support found [96.196221] imx-dwmac 5b050000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported [96.204846] imx-dwmac 5b050000.ethernet eth0: registered PTP clock [96.225558] imx-dwmac 5b050000.ethernet eth0: FPE workqueue start [96.236858] imx-dwmac 5b050000.ethernet eth0: configuring for phy/rgmii-id link mode [96.249358] 8021q: adding VLAN 0 to HW filter on device eth0 Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Stable-dep-of: `feafeb5314` ("arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Zheng Wang	47b2e1a67e	power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition [ Upstream commit `06615d11cc` ] In da9150_charger_probe, &charger->otg_work is bound with da9150_charger_otg_work. da9150_charger_otg_ncb may be called to start the work. If we remove the module which will call da9150_charger_remove to make cleanup, there may be a unfinished work. The possible sequence is as follows: Fix it by canceling the work before cleanup in the da9150_charger_remove CPU0 CPUc1 \|da9150_charger_otg_work da9150_charger_remove \| power_supply_unregister \| device_unregister \| power_supply_dev_release\| kfree(psy) \| \| \| power_supply_changed(charger->usb); \| //use Fixes: `c1a281e34d` ("power: Add support for DA9150 Charger") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Zheng Wang	84bdb3b76b	power: supply: bq24190: Fix use after free bug in bq24190_remove due to race condition [ Upstream commit `47c29d6921` ] In bq24190_probe, &bdi->input_current_limit_work is bound with bq24190_input_current_limit_work. When external power changed, it will call bq24190_charger_external_power_changed to start the work. If we remove the module which will call bq24190_remove to make cleanup, there may be a unfinished work. The possible sequence is as follows: CPU0 CPUc1 \|bq24190_input_current_limit_work bq24190_remove \| power_supply_unregister \| device_unregister \| power_supply_dev_release\| kfree(psy) \| \| \| power_supply_get_property_from_supplier \| //use Fix it by finishing the work before cleanup in the bq24190_remove Fixes: `9777467257` ("power_supply: Initialize changed_work before calling device_add") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:01 +02:00
Manivannan Sadhasivam	1b483a983a	arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent [ Upstream commit `8ba961d433` ] The UFS controller on SM8450 supports cache coherency, hence add the "dma-coherent" property to mark it as such. Fixes: `07fa917a33` ("arm64: dts: qcom: sm8450: add ufs nodes") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230307153201.180626-2-manivannan.sadhasivam@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Cruise Hung	ee9caccc5e	drm/amd/display: Fix DP MST sinks removal issue [ Upstream commit `cbd6c1b17d` ] [Why] In USB4 DP tunneling, it's possible to have this scenario that the path becomes unavailable and CM tears down the path a little bit late. So, in this case, the HPD is high but fails to read any DPCD register. That causes the link connection type to be set to sst. And not all sinks are removed behind the MST branch. [How] Restore the link connection type if it fails to read DPCD register. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `cbd6c1b17d`) Modified for stable backport as a lot of the code in this file was moved in 6.3 to drivers/gpu/drm/amd/display/dc/link/link_detection.c. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Paolo Abeni	5564be74a2	mptcp: fix UaF in listener shutdown [ Upstream commit `0a3f4f1f9c` ] Backports notes: one simple conflict in net/mptcp/protocol.c with: commit `f8c9dfbd87` ("mptcp: add pm listener events") Where one commit removes code in __mptcp_close_ssk() while the other one adds one line at the same place. We can simply remove the whole condition because this extra instruction is not present in v6.1. As reported by Christoph after having refactored the passive socket initialization, the mptcp listener shutdown path is prone to an UaF issue. BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x73/0xe0 Write of size 4 at addr ffff88810cb23098 by task syz-executor731/1266 CPU: 1 PID: 1266 Comm: syz-executor731 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6e/0x91 print_report+0x16a/0x46f kasan_report+0xad/0x130 kasan_check_range+0x14a/0x1a0 _raw_spin_lock_bh+0x73/0xe0 subflow_error_report+0x6d/0x110 sk_error_report+0x3b/0x190 tcp_disconnect+0x138c/0x1aa0 inet_child_forget+0x6f/0x2e0 inet_csk_listen_stop+0x209/0x1060 __mptcp_close_ssk+0x52d/0x610 mptcp_destroy_common+0x165/0x640 mptcp_destroy+0x13/0x80 __mptcp_destroy_sock+0xe7/0x270 __mptcp_close+0x70e/0x9b0 mptcp_close+0x2b/0x150 inet_release+0xe9/0x1f0 __sock_release+0xd2/0x280 sock_close+0x15/0x20 __fput+0x252/0xa20 task_work_run+0x169/0x250 exit_to_user_mode_prepare+0x113/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The msk grace period can legitly expire in between the last reference count dropped in mptcp_subflow_queue_clean() and the later eventual access in inet_csk_listen_stop() After the previous patch we don't need anymore special-casing msk listener socket cleanup: the mptcp worker will process each of the unaccepted msk sockets. Just drop the now unnecessary code. Please note this commit depends on the two parent ones: mptcp: refactor passive socket initialization mptcp: use the workqueue to destroy unaccepted sockets Fixes: `6aeed90450` ("mptcp: fix race on unaccepted mptcp sockets") Cc: stable@vger.kernel.org Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/346 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Paolo Abeni	2827f099b3	mptcp: use the workqueue to destroy unaccepted sockets [ Upstream commit `b6985b9b82` ] Backports notes: one simple conflict in net/mptcp/protocol.c with: commit `a5ef058dc4` ("net: introduce and use custom sockopt socket flag") Where the two commits add a new line for different actions in the same context in mptcp_stream_accept(). Christoph reported a UaF at token lookup time after having refactored the passive socket initialization part: BUG: KASAN: use-after-free in __token_bucket_busy+0x253/0x260 Read of size 4 at addr ffff88810698d5b0 by task syz-executor653/3198 CPU: 1 PID: 3198 Comm: syz-executor653 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6e/0x91 print_report+0x16a/0x46f kasan_report+0xad/0x130 __token_bucket_busy+0x253/0x260 mptcp_token_new_connect+0x13d/0x490 mptcp_connect+0x4ed/0x860 __inet_stream_connect+0x80e/0xd90 tcp_sendmsg_fastopen+0x3ce/0x710 mptcp_sendmsg+0xff1/0x1a20 inet_sendmsg+0x11d/0x140 __sys_sendto+0x405/0x490 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc We need to properly clean-up all the paired MPTCP-level resources and be sure to release the msk last, even when the unaccepted subflow is destroyed by the TCP internals via inet_child_forget(). We can re-use the existing MPTCP_WORK_CLOSE_SUBFLOW infra, explicitly checking that for the critical scenario: the closed subflow is the MPC one, the msk is not accepted and eventually going through full cleanup. With such change, __mptcp_destroy_sock() is always called on msk sockets, even on accepted ones. We don't need anymore to transiently drop one sk reference at msk clone time. Please note this commit depends on the parent one: mptcp: refactor passive socket initialization Fixes: `58b0991962` ("mptcp: create msk early") Cc: stable@vger.kernel.org Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/347 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Paolo Abeni	1516ddbc34	mptcp: refactor passive socket initialization [ Upstream commit `3a236aef28` ] After commit `30e51b923e` ("mptcp: fix unreleased socket in accept queue") unaccepted msk sockets go throu complete shutdown, we don't need anymore to delay inserting the first subflow into the subflow lists. The reference counting deserve some extra care, as __mptcp_close() is unaware of the request socket linkage to the first subflow. Please note that this is more a refactoring than a fix but because this modification is needed to include other corrections, see the following commits. Then a Fixes tag has been added here to help the stable team. Fixes: `30e51b923e` ("mptcp: fix unreleased socket in accept queue") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Saaem Rizvi	75eb69023b	drm/amd/display: Remove OTG DIV register write for Virtual signals. [ Upstream commit `709671ffb1` ] [WHY] Hot plugging and then hot unplugging leads to k1 and k2 values to change, as signal is detected as a virtual signal on hot unplug. Writing these values to OTG_PIXEL_RATE_DIV register might cause primary display to blank (known hw bug). [HOW] No longer write k1 and k2 values to register if signal is virtual, we have safe guards in place in the case that k1 and k2 is unassigned so that an unknown value is not written to the register either. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Aurabindo Pillai	0ac86f7dda	drm/amd/display: fix k1 k2 divider programming for phantom streams [ Upstream commit `3b214bb718` ] [Why & How] When k1 and k2 divider programming logic is executed for a phantom stream, the corresponding master stream should be used for the calculation. Fix the if condition to use the master stream for checking signal type instead of the phantom stream. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: `709671ffb1` ("drm/amd/display: Remove OTG DIV register write for Virtual signals.") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:49:00 +02:00
Eric Bernstein	4a36da161b	drm/amd/display: Include virtual signal to set k1 and k2 values [ Upstream commit `368307cef6` ] Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Eric Bernstein <eric.bernstein@amd.com> Tested-by: Mark Broadworth <mark.broadworth@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: `709671ffb1` ("drm/amd/display: Remove OTG DIV register write for Virtual signals.") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Costa Shulyupin	a1f4880655	tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr [ Upstream commit `71c7a30442` ] There is a problem with the behavior of hwlat in a container, resulting in incorrect output. A warning message is generated: "cpumask changed while in round-robin mode, switching to mode none", and the tracing_cpumask is ignored. This issue arises because the kernel thread, hwlatd, is not a part of the container, and the function sched_setaffinity is unable to locate it using its PID. Additionally, the task_struct of hwlatd is already known. Ultimately, the function set_cpus_allowed_ptr achieves the same outcome as sched_setaffinity, but employs task_struct instead of PID. Test case: # cd /sys/kernel/tracing # echo 0 > tracing_on # echo round-robin > hwlat_detector/mode # echo hwlat > current_tracer # unshare --fork --pid bash -c 'echo 1 > tracing_on' # dmesg -c Actual behavior: [573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: `0330f7aa8e` ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Song Liu	d496185c25	perf: fix perf_event_context->time [ Upstream commit `baf1b12a67` ] Time readers rely on perf_event_context->[time\|timestamp\|timeoffset] to get accurate time_enabled and time_running for an event. The difference between ctx->timestamp and ctx->time is the among of time when the context is not enabled. __update_context_time(ctx, false) is used to increase timestamp, but not time. Therefore, it should only be called in ctx_sched_in() when EVENT_TIME was not enabled. Fixes: `09f5e7dc7a` ("perf: Fix perf_event_read_local() time") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20230313171608.298734-1-song@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Yang Jihong	ff8137727a	perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output [ Upstream commit `eb81a2ed4f` ] syzkaller reportes a KASAN issue with stack-out-of-bounds. The call trace is as follows: dump_stack+0x9c/0xd3 print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 __perf_event_header__init_id+0x34/0x290 perf_event_header__init_id+0x48/0x60 perf_output_begin+0x4a4/0x560 perf_event_bpf_output+0x161/0x1e0 perf_iterate_sb_cpu+0x29e/0x340 perf_iterate_sb+0x4c/0xc0 perf_event_bpf_event+0x194/0x2c0 __bpf_prog_put.constprop.0+0x55/0xf0 __cls_bpf_delete_prog+0xea/0x120 [cls_bpf] cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf] process_one_work+0x3c2/0x730 worker_thread+0x93/0x650 kthread+0x1b8/0x210 ret_from_fork+0x1f/0x30 commit `267fb27352` ("perf: Reduce stack usage of perf_output_begin()") use on-stack struct perf_sample_data of the caller function. However, perf_event_bpf_output uses incorrect parameter to convert small-sized data (struct perf_bpf_event) into large-sized data (struct perf_sample_data), which causes memory overwriting occurs in __perf_event_header__init_id. Fixes: `267fb27352` ("perf: Reduce stack usage of perf_output_begin()") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Konrad Dybcio	e13d1b6979	interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT [ Upstream commit `633a12fda6` ] Due to what seems to be a copy-paste error, the _NRT master was identical to the _RT master, which should not be the case.. Fix it using the values available from the downstream kernel [1]. [1] https://android.googlesource.com/kernel/msm-extra/devicetree/+/refs/heads/android-msm-bramble-4.19-android11-qpr1/qcom/scuba-bus.dtsi#127 Fixes: `1a14b1ac39` ("interconnect: qcom: Add QCM2290 driver support") Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Acked-by: Shawn Guo <shawn.guo@linaro.org> Link: https://lore.kernel.org/r/20230103142120.15605-1-konrad.dybcio@linaro.org Signed-off-by: Georgi Djakov <djakov@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Dmitry Baryshkov	e598cf599a	interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function [ Upstream commit `87e8fab191` ] Change sm8450 interconnect driver to use generic qcom_icc_rpmh_* functions rather than embedding a copy of thema. This also fixes an overallocation of memory for icc_onecell_data structure. Fixes: `fafc114a46` ("interconnect: qcom: Add SM8450 interconnect provider driver") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230105002221.1416479-3-dmitry.baryshkov@linaro.org Signed-off-by: Georgi Djakov <djakov@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Dmitry Baryshkov	d4c3aaee17	interconnect: qcom: osm-l3: fix icc_onecell_data allocation [ Upstream commit `f77ebdda0e` ] This is a struct with a trailing zero-length array of icc_node pointers but it's allocated as if it were a single array of icc_nodes instead. Fortunately this overallocates memory rather then allocating less memory than required. Fix by replacing devm_kcalloc() with devm_kzalloc() and struct_size() macro. Fixes: `5bc9900add` ("interconnect: qcom: Add OSM L3 interconnect provider support") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230105002221.1416479-2-dmitry.baryshkov@linaro.org Signed-off-by: Georgi Djakov <djakov@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-30 12:48:59 +02:00
Greg Kroah-Hartman	345103eb06	Revert "Revert "wait: Return number of exclusive waiters awaken"" This reverts commit `2b47e2bee0`. It was perserving the ABI, but that is not needed anymore at this point in time. Change-Id: I7efd4dd7abde9f5baa37cb3731aa40b7ff94d2bb Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-30 10:37:08 +00:00
Greg Kroah-Hartman	73bffa9caf	Revert "Revert "kobject: modify kobject_get_path() to take a const *"" This reverts commit `e7db10fe57`. It was perserving the ABI, but that is not needed anymore at this point in time. Change-Id: Ifc79e504dbf17466a88ac76162ed77dcb5c13d19 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-30 10:37:08 +00:00
Yu Zhao	3792ff78b1	UPSTREAM: mm: multi-gen LRU: avoid futile retries Recall that the per-node memcg LRU has two generations and they alternate when the last memcg (of a given node) is moved from one to the other. Each generation is also sharded into multiple bins to improve scalability. A reclaimer starts with a random bin (in the old generation) and, if it fails, it will retry, i.e., to try the rest of the bins. If a reclaimer fails with the last memcg, it should move this memcg to the young generation first, which causes the generations to alternate, and then retry. Otherwise, the retries will be futile because all other bins are empty. Link: https://lkml.kernel.org/r/20230213075322.1416966-1-yuzhao@google.com Fixes: `e4dde56cd2` ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `9f550d78b4`) Change-Id: Ie92535676b005ec9e7987632b742fdde8d54436f Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	599cea335f	UPSTREAM: mm: multi-gen LRU: simplify arch_has_hw_pte_young() check Scanning page tables when hardware does not set the accessed bit has no real use cases. Link: https://lkml.kernel.org/r/20221222041905.2431096-9-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `f386e93140`) Change-Id: I84d97ab665b4e3bb862a9bc7d72f50dea7191a6b Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	6573287e54	BACKPORT: mm: multi-gen LRU: clarify scan_control flags Among the flags in scan_control: 1. sc->may_swap, which indicates swap constraint due to memsw.max, is supported as usual. 2. sc->proactive, which indicates reclaim by memory.reclaim, may not opportunistically skip the aging path, since it is considered less latency sensitive. 3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint, lowers swappiness to prioritize file LRU, since clean file folios are more likely to exist. 4. sc->may_writepage and sc->may_unmap, which indicates opportunistic reclaim, are rejected, since unmapped clean folios are already prioritized. Scanning for more of them is likely futile and can cause high reclaim latency when there is a large number of memcgs. The rest are handled by the existing code. Link: https://lkml.kernel.org/r/20221222041905.2431096-8-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `e9d4e1ee78`) [TJ: Resolved conflict with older function signature for min_cgroup_below_min, and over `cdded86118` ("ANDROID: MGLRU: Don't skip anon reclaim if swap low")] Change-Id: Ic2e779eaf4e91a3921831b4e2fa10c740dc59d50 Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	014f97baad	BACKPORT: mm: multi-gen LRU: per-node lru_gen_folio lists For each node, memcgs are divided into two generations: the old and the young. For each generation, memcgs are randomly sharded into multiple bins to improve scalability. For each bin, an RCU hlist_nulls is virtually divided into three segments: the head, the tail and the default. An onlining memcg is added to the tail of a random bin in the old generation. The eviction starts at the head of a random bin in the old generation. The per-node memcg generation counter, whose reminder (mod 2) indexes the old generation, is incremented when all its bins become empty. There are four operations: 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its current generation (old or young) and updates its "seg" to "head"; 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its current generation (old or young) and updates its "seg" to "tail"; 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old generation, updates its "gen" to "old" and resets its "seg" to "default"; 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the young generation, updates its "gen" to "young" and resets its "seg" to "default". The events that trigger the above operations are: 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; 2. The first attempt to reclaim an memcg below low, which triggers MEMCG_LRU_TAIL; 3. The first attempt to reclaim an memcg below reclaimable size threshold, which triggers MEMCG_LRU_TAIL; 4. The second attempt to reclaim an memcg below reclaimable size threshold, which triggers MEMCG_LRU_YOUNG; 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG; 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; 7. Offlining an memcg, which triggers MEMCG_LRU_OLD. Note that memcg LRU only applies to global reclaim, and the round-robin incrementing of their max_seq counters ensures the eventual fairness to all eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). Link: https://lkml.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `e4dde56cd2`) [TJ: Resolved conflicts with older function signatures for min_cgroup_below_min / min_cgroup_below_low and includes] Change-Id: Idc8a0f635e035d72dd911f807d1224cb47cbd655 Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	8d0d562dd3	UPSTREAM: mm: multi-gen LRU: shuffle should_run_aging() Move should_run_aging() next to its only caller left. Link: https://lkml.kernel.org/r/20221222041905.2431096-6-yuzhao@google.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `77d4459a4a`) Signed-off-by: T.J. Mercier <tjmercier@google.com> Change-Id: I3b0383fe16b93a783b4d8c0b3a0b325160392576 Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	5e9fed8a03	BACKPORT: mm: multi-gen LRU: remove aging fairness safeguard Recall that the aging produces the youngest generation: first it scans for accessed folios and updates their gen counters; then it increments lrugen->max_seq. The current aging fairness safeguard for kswapd uses two passes to ensure the fairness to multiple eligible memcgs. On the first pass, which is shared with the eviction, it checks whether all eligible memcgs are low on cold folios. If so, it requires a second pass, on which it ages all those memcgs at the same time. With memcg LRU, the aging, while ensuring eventual fairness, will run when necessary. Therefore the current aging fairness safeguard for kswapd will not be needed. Note that memcg LRU only applies to global reclaim. For memcg reclaim, the aging can be unfair to different memcgs, i.e., their lrugen->max_seq can be incremented at different paces. Link: https://lkml.kernel.org/r/20221222041905.2431096-5-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `7348cc9182`) [TJ: Resolved conflicts with older function signatures for min_cgroup_below_min / min_cgroup_below_low] Change-Id: I6e36ecfbaaefbc0a56d9a9d5d7cbe404ed7f57a5 Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00
Yu Zhao	ce1cc5d887	UPSTREAM: mm: multi-gen LRU: remove eviction fairness safeguard Recall that the eviction consumes the oldest generation: first it bucket-sorts folios whose gen counters were updated by the aging and reclaims the rest; then it increments lrugen->min_seq. The current eviction fairness safeguard for global reclaim has a dilemma: when there are multiple eligible memcgs, should it continue or stop upon meeting the reclaim goal? If it continues, it overshoots and increases direct reclaim latency; if it stops, it loses fairness between memcgs it has taken memory away from and those it has yet to. With memcg LRU, the eviction, while ensuring eventual fairness, will stop upon meeting its goal. Therefore the current eviction fairness safeguard for global reclaim will not be needed. Note that memcg LRU only applies to global reclaim. For memcg reclaim, the eviction will continue, even if it is overshooting. This becomes unconditional due to code simplification. Link: https://lkml.kernel.org/r/20221222041905.2431096-4-yuzhao@google.com Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Larabel <Michael@MichaelLarabel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Bug: 274865848 (cherry picked from commit `a579086c99`) Change-Id: I08ac1b3c90e29cafd0566785aaa4bcdb5db7d22c Signed-off-by: T.J. Mercier <tjmercier@google.com>	2023-03-30 10:23:50 +00:00

... 23 24 25 26 27 ...

1149366 Commits