linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-07 11:26:02 +09:00

Author	SHA1	Message	Date
Greg Kroah-Hartman	3ccfc59f82	Merge 5.10.24 into android12-5.10-lts Changes in 5.10.24 uapi: nfnetlink_cthelper.h: fix userspace compilation error powerpc/perf: Fix handling of privilege level checks in perf interrupt context powerpc/pseries: Don't enforce MSI affinity with kdump ethernet: alx: fix order of calls on resume crypto: mips/poly1305 - enable for all MIPS processors ath9k: fix transmitting to stations in dynamic SMPS mode net: Fix gro aggregation for udp encaps with zero csum net: check if protocol extracted by virtio_net_hdr_set_proto is correct net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 net: l2tp: reduce log level of messages in receive path, add counter instead can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership can: flexcan: assert FRZ bit in flexcan_chip_freeze() can: flexcan: enable RX FIFO after FRZ/HALT valid can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE) tcp: add sanity tests to TCP_QUEUE_SEQ netfilter: nf_nat: undo erroneous tcp edemux lookup netfilter: x_tables: gpf inside xt_find_revision() net: always use icmp{,v6}_ndo_send from ndo_start_xmit net: phy: fix save wrong speed and duplex problem if autoneg is on selftests/bpf: Use the last page in test_snprintf_btf on s390 selftests/bpf: No need to drop the packet when there is no geneve opt selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier samples, bpf: Add missing munmap in xdpsock libbpf: Clear map_info before each bpf_obj_get_info_by_fd ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning. ibmvnic: always store valid MAC address mt76: dma: do not report truncated frames to mac80211 powerpc/603: Fix protection of user pages mapped with PROT_NONE mount: fix mounting of detached mounts onto targets that reside on shared mounts cifs: return proper error code in statfs(2) Revert "mm, slub: consider rest of partial list if acquire_slab() fails" docs: networking: drop special stable handling net: dsa: tag_rtl4_a: fix egress tags sh_eth: fix TRSCER mask for SH771x net: enetc: don't overwrite the RSS indirection table when initializing net: enetc: take the MDIO lock only once per NAPI poll cycle net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets net: enetc: don't disable VLAN filtering in IFF_PROMISC mode net: enetc: force the RGMII speed and duplex instead of operating in inband mode net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr net: enetc: keep RX ring consumer index in sync with hardware net: ethernet: mtk-star-emac: fix wrong unmap in RX handling net/mlx4_en: update moderation when config reset net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10 nexthop: Do not flush blackhole nexthops when loopback goes down net: sched: avoid duplicates in classes dump net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 net: usb: qmi_wwan: allow qmimux add/del with master up netdevsim: init u64 stats for 32bit hardware cipso,calipso: resolve a number of problems with the DOI refcounts net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII stmmac: intel: Fixes clock registration error seen for multiple interfaces net: lapbether: Remove netif_start_queue / netif_stop_queue net: davicom: Fix regulator not turned off on failed probe net: davicom: Fix regulator not turned off on driver removal net: enetc: allow hardware timestamping on TX queues with tc-etf enabled net: qrtr: fix error return code of qrtr_sendmsg() s390/qeth: fix memory leak after failed TX Buffer allocation r8169: fix r8168fp_adjust_ocp_cmd function ixgbe: fail to create xfrm offload of IPsec tunnel mode SA tools/resolve_btfids: Fix build error with older host toolchains perf build: Fix ccache usage in $(CC) when generating arch errno table net: stmmac: stop each tx channel independently net: stmmac: fix watchdog timeout during suspend/resume stress test net: stmmac: fix wrongly set buffer2 valid when sph unsupport ethtool: fix the check logic of at least one channel for RX/TX net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused selftests: forwarding: Fix race condition in mirror installation mlxsw: spectrum_ethtool: Add an external speed to PTYS register perf traceevent: Ensure read cmdlines are null terminated. perf report: Fix -F for branch & mem modes net: hns3: fix query vlan mask value error for flow director net: hns3: fix bug when calculating the TCAM table info s390/cio: return -EFAULT if copy_to_user() fails again bnxt_en: reliably allocate IRQ table on reset to avoid crash gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk gpiolib: acpi: Allow to find GpioInt() resource by name and index gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2 gpio: fix gpio-device list corruption drm/compat: Clear bounce structures drm/amd/display: Add a backlight module option drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp() drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth() drm/amd/pm: bug fix for pcie dpm drm/amdgpu/display: simplify backlight setting drm/amdgpu/display: don't assert in set backlight function drm/amdgpu/display: handle aux backlight in backlight_get_brightness drm/shmem-helper: Check for purged buffers in fault handler drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff drm: Use USB controller's DMA mask when importing dmabufs drm: meson_drv add shutdown function drm/shmem-helpers: vunmap: Don't put pages for dma-buf drm/i915: Wedge the GPU if command parser setup fails s390/cio: return -EFAULT if copy_to_user() fails s390/crypto: return -EFAULT if copy_to_user() fails qxl: Fix uninitialised struct field head.surface_id sh_eth: fix TRSCER mask for R7S9210 media: usbtv: Fix deadlock on suspend media: rkisp1: params: fix wrong bits settings media: v4l: vsp1: Fix uif null pointer access media: v4l: vsp1: Fix bru null pointer access media: rc: compile rc-cec.c into rc-core cifs: fix credit accounting for extra channel net: hns3: fix error mask definition of flow director s390/qeth: don't replace a fully completed async TX buffer s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state s390/qeth: improve completion of pending TX buffers s390/qeth: fix notification for pending buffers during teardown net: dsa: implement a central TX reallocation procedure net: dsa: tag_ksz: don't allocate additional memory for padding/tagging net: dsa: trailer: don't allocate additional memory for padding/tagging net: dsa: tag_qca: let DSA core deal with TX reallocation net: dsa: tag_ocelot: let DSA core deal with TX reallocation net: dsa: tag_mtk: let DSA core deal with TX reallocation net: dsa: tag_lan9303: let DSA core deal with TX reallocation net: dsa: tag_edsa: let DSA core deal with TX reallocation net: dsa: tag_brcm: let DSA core deal with TX reallocation net: dsa: tag_dsa: let DSA core deal with TX reallocation net: dsa: tag_gswip: let DSA core deal with TX reallocation net: dsa: tag_ar9331: let DSA core deal with TX reallocation net: dsa: tag_mtk: fix 802.1ad VLAN egress enetc: Fix unused var build warning for CONFIG_OF net: enetc: initialize RFS/RSS memories for unused ports too ath11k: peer delete synchronization with firmware ath11k: start vdev if a bss peer is already created ath11k: fix AP mode for QCA6390 i2c: rcar: faster irq code to minimize HW race condition i2c: rcar: optimize cacheline to minimize HW race condition scsi: ufs: WB is only available on LUN #0 to #7 udf: fix silent AED tagLocation corruption iommu/vt-d: Clear PRQ overflow only when PRQ is empty mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()' mmc: mediatek: fix race condition between msdc_request_timeout and irq mmc: sdhci-iproc: Add ACPI bindings for the RPi Platform: OLPC: Fix probe error handling powerpc/pci: Add ppc_md.discover_phbs() spi: stm32: make spurious and overrun interrupts visible powerpc: improve handling of unrecoverable system reset powerpc/perf: Record counter overflow always if SAMPLE_IP is unset HID: logitech-dj: add support for the new lightspeed connection iteration powerpc/64: Fix stack trace not displaying final frame iommu/amd: Fix performance counter initialization clk: qcom: gdsc: Implement NO_RET_PERIPH flag sparc32: Limit memblock allocation to low memory sparc64: Use arch_validate_flags() to validate ADI flag Input: applespi - don't wait for responses to commands indefinitely. PCI: xgene-msi: Fix race in installing chained irq handler PCI: mediatek: Add missing of_node_put() to fix reference leak drivers/base: build kunit tests without structleak plugin PCI/LINK: Remove bandwidth notification ext4: don't try to processed freed blocks until mballoc is initialized kbuild: clamp SUBLEVEL to 255 PCI: Fix pci_register_io_range() memory leak i40e: Fix memory leak in i40e_probe kasan: fix memory corruption in kasan_bitops_tags test s390/smp: __smp_rescan_cpus() - move cpumask away from stack drivers/base/memory: don't store phys_device in memory blocks sysctl.c: fix underflow value setting risk in vm_table scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling scsi: target: core: Add cmd length set before cmd complete scsi: target: core: Prevent underflow for service actions clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc mmc: sdhci: Update firmware interface API ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler ARM: assembler: introduce adr_l, ldr_l and str_l macros ARM: efistub: replace adrl pseudo-op with adr_l macro invocation ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk ALSA: hda/hdmi: Cancel pending works before suspend ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5 ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support ALSA: hda: Drop the BATCH workaround for AMD controllers ALSA: hda: Flush pending unsolicited events before suspend ALSA: hda: Avoid spurious unsol event handling during S3/S4 ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar ALSA: usb-audio: Apply the control quirk to Plantronics headsets ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend() ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe ALSA: usb-audio: fix use after free in usb_audio_disconnect Revert `95ebabde38` ("capabilities: Don't allow writing ambiguous v3 file capabilities") block: Discard page cache of zone reset target range block: Try to handle busy underlying device on discard arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL arm64: mte: Map hotplugged memory as Normal Tagged arm64: perf: Fix 64-bit event counter read truncation s390/dasd: fix hanging DASD driver unbind s390/dasd: fix hanging IO request during DASD driver unbind software node: Fix node registration xen/events: reset affinity of 2-level event when tearing it down mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants mmc: core: Fix partition switch time for eMMC mmc: cqhci: Fix random crash when remove mmc module/card cifs: do not send close in compound create+close requests Goodix Fingerprint device is not a modem USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe() USB: gadget: u_ether: Fix a configfs return code usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot usb: gadget: f_uac1: stop playback on function disable usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement usb: dwc3: qcom: add URS Host support for sdm845 ACPI boot usb: dwc3: qcom: add ACPI device id for sc8180x usb: dwc3: qcom: Honor wakeup enabled/disabled state USB: usblp: fix a hang in poll() if disconnected usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM usb: xhci: do not perform Soft Retry for some xHCI hosts xhci: Improve detection of device initiated wake signal. usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state USB: serial: io_edgeport: fix memory leak in edge_startup USB: serial: ch341: add new Product ID USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter USB: serial: cp210x: add some more GE USB IDs usbip: fix stub_dev to check for stream socket usbip: fix vhci_hcd to check for stream socket usbip: fix vudc to check for stream socket usbip: fix stub_dev usbip_sockfd_store() races leading to gpf usbip: fix vhci_hcd attach_store() races leading to gpf usbip: fix vudc usbip_sockfd_store races leading to gpf Revert "serial: max310x: rework RX interrupt handling" misc/pvpanic: Export module FDT device table misc: fastrpc: restrict user apps from sending kernel RPC messages staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan() staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan() staging: rtl8712: unterminated string leads to read overflow staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data() staging: ks7010: prevent buffer overflow in ks_wlan_set_scan() staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan staging: comedi: addi_apci_1032: Fix endian problem for COS sample staging: comedi: addi_apci_1500: Fix endian problem for command sample staging: comedi: adv_pci1710: Fix endian problem for AI command data staging: comedi: das6402: Fix endian problem for AI command data staging: comedi: das800: Fix endian problem for AI command data staging: comedi: dmm32at: Fix endian problem for AI command data staging: comedi: me4000: Fix endian problem for AI command data staging: comedi: pcl711: Fix endian problem for AI command data staging: comedi: pcl818: Fix endian problem for AI command data sh_eth: fix TRSCER mask for R7S72100 cpufreq: qcom-hw: fix dereferencing freed memory 'data' cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init() arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory SUNRPC: Set memalloc_nofs_save() for sync tasks NFS: Don't revalidate the directory permissions on a lookup failure NFS: Don't gratuitously clear the inode cache when lookup failed NFSv4.2: fix return value of _nfs4_get_security_label() block: rsxx: fix error return code of rsxx_pci_probe() nvme-fc: fix racing controller reset and create association configfs: fix a use-after-free in __configfs_open_file arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds perf/core: Flush PMU internal buffers for per-CPU events perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event() powerpc/64s/exception: Clean up a missed SRR specifier seqlock,lockdep: Fix seqcount_latch_init() stop_machine: mark helpers __always_inline include/linux/sched/mm.h: use rcu_dereference in in_vfork() zram: fix return value on writeback_store linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* sched/membarrier: fix missing local execution of ipi_sync_rq_state() efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table powerpc/64s: Fix instruction encoding for lis in ppc_function_entry() powerpc: Fix inverted SET_FULL_REGS bitop powerpc: Fix missing declaration of [en/dis]able_kernel_vsx() binfmt_misc: fix possible deadlock in bm_register_write x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2 x86/sev-es: Introduce ip_within_syscall_gap() helper x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack x86/entry: Move nmi entry/exit into common code x86/sev-es: Correctly track IRQ states in runtime #VC handler x86/sev-es: Use __copy_from_user_inatomic() x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls KVM: x86: Ensure deadline timer has truly expired before posting its IRQ KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged KVM: arm64: Fix range alignment when walking page tables KVM: arm64: Avoid corrupting vCPU context register in guest exit KVM: arm64: nvhe: Save the SPE context early KVM: arm64: Reject VM creation when the default IPA size is unsupported KVM: arm64: Fix exclusive limit for IPA size mm/userfaultfd: fix memory corruption due to writeprotect mm/madvise: replace ptrace attach requirement for process_madvise KVM: arm64: Ensure I-cache isolation between vcpus of a same VM mm/page_alloc.c: refactor initialization of struct page for holes in memory layout xen/events: don't unmask an event channel when an eoi is pending xen/events: avoid handling the same event on two cpus at the same time KVM: arm64: Fix nVHE hyp panic host context restore RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size Linux 5.10.24 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie53a3c1963066a18d41357b6be41cff00690bd40	2021-03-19 09:42:56 +01:00
Greg Kroah-Hartman	05d125f752	Linux 5.10.24 Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Jason Self <jason@bluehome.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Hulk Robot <hulkrobot@huawei.com> Tested-by: Ross Schmidt <ross.schm.dev@gmail.com> Link: https://lore.kernel.org/r/20210315135541.921894249@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Christoph Hellwig	1c0899636d	RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size commit `b116c70279` upstream. RDMA ULPs must not call DMA mapping APIs directly but instead use the ib_dma_* wrappers. Fixes: `0c16d9635e` ("RDMA/umem: Move to allocate SG table from pages") Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Andrew Scull	1dbce9ba2a	KVM: arm64: Fix nVHE hyp panic host context restore Commit `c4b000c392` upstream. When panicking from the nVHE hyp and restoring the host context, x29 is expected to hold a pointer to the host context. This wasn't being done so fix it to make sure there's a valid pointer the host context being used. Rather than passing a boolean indicating whether or not the host context should be restored, instead pass the pointer to the host context. NULL is passed to indicate that no context should be restored. Fixes: `a2e102e20f` ("KVM: arm64: nVHE: Handle hyp panics") Cc: stable@vger.kernel.org # 5.10.y only Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210219122406.1337626-1-ascull@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Juergen Gross	f67e5243d0	xen/events: avoid handling the same event on two cpus at the same time commit `b6622798bc` upstream. When changing the cpu affinity of an event it can happen today that (with some unlucky timing) the same event will be handled on the old and the new cpu at the same time. Avoid that by adding an "event active" flag to the per-event data and call the handler only if this flag isn't set. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Link: https://lore.kernel.org/r/20210306161833.4552-4-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Juergen Gross	30cdb862e8	xen/events: don't unmask an event channel when an eoi is pending commit `25da4618af` upstream. An event channel should be kept masked when an eoi is pending for it. When being migrated to another cpu it might be unmasked, though. In order to avoid this keep three different flags for each event channel to be able to distinguish "normal" masking/unmasking from eoi related masking/unmasking and temporary masking. The event channel should only be able to generate an interrupt if all flags are cleared. Cc: stable@vger.kernel.org Fixes: `54c9de8989` ("xen/events: add a new "late EOI" evtchn framework") Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com> Link: https://lore.kernel.org/r/20210306161833.4552-3-jgross@suse.com [boris -- corrected Fixed tag format] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Mike Rapoport	4c84191cbc	mm/page_alloc.c: refactor initialization of struct page for holes in memory layout commit `0740a50b9b` upstream. There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields of this page are set to default values and it is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. Before commit `73a6e474cb` ("mm: memmap_init: iterate over memblock regions rather that check each PFN") the holes inside a zone were re-initialized during memmap_init() and got their zone/node links right. However, after that commit nothing updates the struct pages representing such holes. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); in set_pfnblock_flags_mask() when called with a struct page from a range other than E820_TYPE_RAM because there are pages in the range of ZONE_DMA32 but the unset zone link in struct page makes them appear as a part of ZONE_DMA. Interleave initialization of the unavailable pages with the normal initialization of memory map, so that zone and node information will be properly set on struct pages that are not backed by the actual memory. With this change the pages for holes inside a zone will get proper zone/node links and the pages that are not spanned by any node will get links to the adjacent zone/node. The holes between nodes will be prepended to the zone/node above the hole and the trailing pages in the last section that will be appended to the zone/node below. [akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64] Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org Fixes: `73a6e474cb` ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Qian Cai <cai@lca.pw> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Łukasz Majczak <lma@semihalf.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Sarvela, Tomi P" <tomi.p.sarvela@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Marc Zyngier	e7afadd0db	KVM: arm64: Ensure I-cache isolation between vcpus of a same VM Commit `01dc9262ff` upstream. It recently became apparent that the ARMv8 architecture has interesting rules regarding attributes being used when fetching instructions if the MMU is off at Stage-1. In this situation, the CPU is allowed to fetch from the PoC and allocate into the I-cache (unless the memory is mapped with the XN attribute at Stage-2). If we transpose this to vcpus sharing a single physical CPU, it is possible for a vcpu running with its MMU off to influence another vcpu running with its MMU on, as the latter is expected to fetch from the PoU (and self-patching code doesn't flush below that level). In order to solve this, reuse the vcpu-private TLB invalidation code to apply the same policy to the I-cache, nuking it every time the vcpu runs on a physical CPU that ran another vcpu of the same VM in the past. This involve renaming __kvm_tlb_flush_local_vmid() to __kvm_flush_cpu_context(), and inserting a local i-cache invalidation there. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Suren Baghdasaryan	518f98e390	mm/madvise: replace ptrace attach requirement for process_madvise commit `96cfe2c0fd` upstream. process_madvise currently requires ptrace attach capability. PTRACE_MODE_ATTACH gives one process complete control over another process. It effectively removes the security boundary between the two processes (in one direction). Granting ptrace attach capability even to a system process is considered dangerous since it creates an attack surface. This severely limits the usage of this API. The operations process_madvise can perform do not affect the correctness of the operation of the target process; they only affect where the data is physically located (and therefore, how fast it can be accessed). What we want is the ability for one process to influence another process in order to optimize performance across the entire system while leaving the security boundary intact. Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata and CAP_SYS_NICE for influencing process performance. Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Nadav Amit	2aaa79f694	mm/userfaultfd: fix memory corruption due to writeprotect commit `6ce64428d6` upstream. Userfaultfd self-test fails occasionally, indicating a memory corruption. Analyzing this problem indicates that there is a real bug since mmap_lock is only taken for read in mwriteprotect_range() and defers flushes, and since there is insufficient consideration of concurrent deferred TLB flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in wp_page_copy(), this flush takes place after the copy has already been performed, and therefore changes of the page are possible between the time of the copy and the time in which the PTE is flushed. To make matters worse, memory-unprotection using userfaultfd also poses a problem. Although memory unprotection is logically a promotion of PTE permissions, and therefore should not require a TLB flush, the current userrfaultfd code might actually cause a demotion of the architectural PTE permission: when userfaultfd_writeprotect() unprotects memory region, it unintentionally clears the RW-bit if it was already set. Note that this unprotecting a PTE that is not write-protected is a valid use-case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs. The scenario that happens in selftests/vm/userfaultfd is as follows: cpu0 cpu1 cpu2 ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-unprotect ] mwriteprotect_range() mmap_read_lock() change_protection() change_protection_range() ... change_pte_range() [ clear “write”-bit ] [ defer TLB flushes ] [ page-fault ] ... wp_page_copy() cow_user_page() [ copy page ] [ write to old page ] ... set_pte_at_notify() A similar scenario can happen: cpu0 cpu1 cpu2 cpu3 ---- ---- ---- ---- [ Writable PTE cached in TLB ] userfaultfd_writeprotect() [ write-protect ] [ deferred TLB flush ] userfaultfd_writeprotect() [ write-unprotect ] [ deferred TLB flush] [ page-fault ] wp_page_copy() cow_user_page() [ copy page ] ... [ write to page ] set_pte_at_notify() This race exists since commit `292924b260` ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent since commit `09854ba94c` ("mm: do_wp_page() simplification") which made wp_page_copy() more likely to take place, specifically if page_count(page) > 1. To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs, and if there are, perform a flush before doing the COW. Further optimizations will follow to avoid during uffd-write-unprotect unnecassary PTE write-protection and TLB flushes. Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com Fixes: `09854ba94c` ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit <namit@vmware.com> Suggested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:37 +01:00
Marc Zyngier	c3d70b1bf1	KVM: arm64: Fix exclusive limit for IPA size commit `262b003d05` upstream. When registering a memslot, we check the size and location of that memslot against the IPA size to ensure that we can provide guest access to the whole of the memory. Unfortunately, this check rejects memslot that end-up at the exact limit of the addressing capability for a given IPA size. For example, it refuses the creation of a 2GB memslot at 0x8000000 with a 32bit IPA space. Fix it by relaxing the check to accept a memslot reaching the limit of the IPA space. Fixes: `c3058d5da2` ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Marc Zyngier	ada8817ab6	KVM: arm64: Reject VM creation when the default IPA size is unsupported commit `7d717558dd` upstream. KVM/arm64 has forever used a 40bit default IPA space, partially due to its 32bit heritage (where the only choice is 40bit). However, there are implementations in the wild that have a cough much smaller cough IPA space, which leads to a misprogramming of VTCR_EL2, and a guest that is stuck on its first memory access if userspace dares to ask for the default IPA setting (which most VMMs do). Instead, blundly reject the creation of such VM, as we can't satisfy the requirements from userspace (with a one-off warning). Also clarify the boot warning, and document that the VM creation will fail when an unsupported IPA size is provided. Although this is an ABI change, it doesn't really change much for userspace: - the guest couldn't run before this change, but no error was returned. At least userspace knows what is happening. - a memory slot that was accepted because it did fit the default IPA space now doesn't even get a chance to be registered. The other thing that is left doing is to convince userspace to actually use the IPA space setting instead of relying on the antiquated default. Fixes: `233a7cb235` ("kvm: arm64: Allow tuning the physical address size for VM") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Suzuki K Poulose	eeba4e4cc5	KVM: arm64: nvhe: Save the SPE context early commit `b96b0c5de6` upstream. The nVHE KVM hyp drains and disables the SPE buffer, before entering the guest, as the EL1&0 translation regime is going to be loaded with that of the guest. But this operation is performed way too late, because : - The owning translation regime of the SPE buffer is transferred to EL2. (MDCR_EL2_E2PB == 0) - The guest Stage1 is loaded. Thus the flush could use the host EL1 virtual address, but use the EL2 translations instead of host EL1, for writing out any cached data. Fix this by moving the SPE buffer handling early enough. The restore path is doing the right thing. Fixes: `014c4c77aa` ("KVM: arm64: Improve debug register save/restore flow") Cc: stable@vger.kernel.org Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poulose@arm.com Message-Id: <20210305185254.3730990-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Will Deacon	a9779820bb	KVM: arm64: Avoid corrupting vCPU context register in guest exit commit `31948332d5` upstream. Commit `7db2153047` ("KVM: arm64: Restore hyp when panicking in guest context") tracks the currently running vCPU, clearing the pointer to NULL on exit from a guest. Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS code to go off into the weeds when it saves the DISR assuming that the CPU context is embedded in a struct vCPU. Leave x1 alone and use x3 as a temporary register instead when clearing the vCPU on the guest exit path. Cc: Marc Zyngier <maz@kernel.org> Cc: Andrew Scull <ascull@google.com> Cc: <stable@vger.kernel.org> Fixes: `7db2153047` ("KVM: arm64: Restore hyp when panicking in guest context") Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210226181211.14542-1-will@kernel.org Message-Id: <20210305185254.3730990-3-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Jia He	17becbfca9	KVM: arm64: Fix range alignment when walking page tables commit `357ad203d4` upstream. When walking the page tables at a given level, and if the start address for the range isn't aligned for that level, we propagate the misalignment on each iteration at that level. This results in the walker ignoring a number of entries (depending on the original misalignment) on each subsequent iteration. Properly aligning the address before the next iteration addresses this issue. Cc: stable@vger.kernel.org Reported-by: Howard Zhang <Howard.Zhang@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jia He <justin.he@arm.com> Fixes: `b1e57de62c` ("KVM: arm64: Add stand-alone page-table walker infrastructure") [maz: rewrite commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com Message-Id: <20210305185254.3730990-9-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Wanpeng Li	a688bf8cf5	KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged commit `d7eb79c629` upstream. # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 88 On-line CPU(s) list: 0-63 Off-line CPU(s) list: 64-87 # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall # echo 1 > /sys/devices/system/cpu/cpu76/online -bash: echo: write error: Cannot allocate memory The per-cpu vsyscall pvclock data pointer assigns either an element of the static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall and sometimes you end up with vsyscall disabled if the host does something strange. This patch fixes it by allocating this dynamically memory unconditionally even if vsyscall is disabled. Fixes: `6a1cac56f4` ("x86/kvm: Use __bss_decrypted attribute in shared variables") Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org#v4.19-rc5+ Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Sean Christopherson	4ab5d1b709	KVM: x86: Ensure deadline timer has truly expired before posting its IRQ commit `beda430177` upstream. When posting a deadline timer interrupt, open code the checks guarding __kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected() check in kvm_wait_lapic_expire(). The injection check will always fail since the interrupt has not yet be injected. Moving the call after injection would also be wrong as that wouldn't actually delay delivery of the IRQ if it is indeed sent via posted interrupt. Fixes: `010fd37fdd` ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305021808.3769732-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Andy Lutomirski	e40384fcd6	x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls commit `5d5675df79` upstream. On a 32-bit fast syscall that fails to read its arguments from user memory, the kernel currently does syscall exit work but not syscall entry work. This confuses audit and ptrace. For example: $ ./tools/testing/selftests/x86/syscall_arg_fault_32 ... strace: pid 264258: entering, ptrace_syscall_info.op == 2 ... This is a minimal fix intended for ease of backporting. A more complete cleanup is coming. Fixes: `0b085e68f4` ("x86/entry: Consolidate 32/64 bit syscall entry") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Joerg Roedel	a2bab396cb	x86/sev-es: Use __copy_from_user_inatomic() commit `bffe30dd9f` upstream. The #VC handler must run in atomic context and cannot sleep. This is a problem when it tries to fetch instruction bytes from user-space via copy_from_user(). Introduce a insn_fetch_from_user_inatomic() helper which uses __copy_from_user_inatomic() to safely copy the instruction bytes to kernel memory in the #VC handler. Fixes: `5e3427a7bc` ("x86/sev-es: Handle instruction fetches from user-space") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Joerg Roedel	977b9f4190	x86/sev-es: Correctly track IRQ states in runtime #VC handler commit `62441a1fb5` upstream. Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to correctly track the IRQ state during its execution. Fixes: `0786138c78` ("x86/sev-es: Add a Runtime #VC Exception Handler") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210303141716.29223-5-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Thomas Gleixner	2694244327	x86/entry: Move nmi entry/exit into common code commit `b6be002bcd` upstream. Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well. Move it to common code and extend irqentry_state_t to carry lockdep state. [ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:36 +01:00
Joerg Roedel	752fbe0c8d	x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack commit `545ac14c16` upstream. The code in the NMI handler to adjust the #VC handler IST stack is needed in case an NMI hits when the #VC handler is still using its IST stack. But the check for this condition also needs to look if the regs->sp value is trusted, meaning it was not set by user-space. Extend the check to not use regs->sp when the NMI interrupted user-space code or the SYSCALL gap. Fixes: `315562c9af` ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10+ Link: https://lkml.kernel.org/r/20210303141716.29223-3-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Joerg Roedel	871fd1e3ee	x86/sev-es: Introduce ip_within_syscall_gap() helper commit `78a81d88f6` upstream. Introduce a helper to check whether an exception came from the syscall gap and use it in the SEV-ES code. Extend the check to also cover the compatibility SYSCALL entry path. Fixes: `315562c9af` ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10+ Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Josh Poimboeuf	d327d8632c	x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2 commit `e504e74cc3` upstream. KASAN reserves "redzone" areas between stack frames in order to detect stack overruns. A read or write to such an area triggers a KASAN "stack-out-of-bounds" BUG. Normally, the ORC unwinder stays in-bounds and doesn't access the redzone. But sometimes it can't find ORC metadata for a given instruction. This can happen for code which is missing ORC metadata, or for generated code. In such cases, the unwinder attempts to fall back to frame pointers, as a best-effort type thing. This fallback often works, but when it doesn't, the unwinder can get confused and go off into the weeds into the KASAN redzone, triggering the aforementioned KASAN BUG. But in this case, the unwinder's confusion is actually harmless and working as designed. It already has checks in place to prevent off-stack accesses, but those checks get short-circuited by the KASAN BUG. And a BUG is a lot more disruptive than a harmless unwinder warning. Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack accesses. This finishes the job started by commit `881125bfe6` ("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only partially fixed the issue. Fixes: `ee9f8fce99` ("x86/unwind: Add the ORC unwinder") Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Lior Ribak	5ab9464a2a	binfmt_misc: fix possible deadlock in bm_register_write commit `e7850f4d84` upstream. There is a deadlock in bm_register_write: First, in the begining of the function, a lock is taken on the binfmt_misc root inode with inode_lock(d_inode(root)). Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call open_exec on the user-provided interpreter. open_exec will call a path lookup, and if the path lookup process includes the root of binfmt_misc, it will try to take a shared lock on its inode again, but it is already locked, and the code will get stuck in a deadlock To reproduce the bug: $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register backtrace of where the lock occurs (#5): 0 schedule () at ./arch/x86/include/asm/current.h:15 1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992 2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213 3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222 4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355 5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783 6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177 7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366 8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396 9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913 10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948 11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682 12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603 13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658 14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46 15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120 To solve the issue, the open_exec call is moved to before the write lock is taken by bm_register_write Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com Fixes: `948b701a60` ("binfmt_misc: add persistent opened binary handler for containers") Signed-off-by: Lior Ribak <liorribak@gmail.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Christophe Leroy	0e4750f69c	powerpc: Fix missing declaration of [en/dis]able_kernel_vsx() commit `bd73758803` upstream. Add stub instances of enable_kernel_vsx() and disable_kernel_vsx() when CONFIG_VSX is not set, to avoid following build failure. CC [M] drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services_types.h:29, from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:37, from drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:27: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c: In function 'dcn_bw_apply_registry_override': ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:64:3: error: implicit declaration of function 'enable_kernel_vsx'; did you mean 'enable_kernel_fp'? [-Werror=implicit-function-declaration] 64 \| enable_kernel_vsx(); \ \| ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:640:2: note: in expansion of macro 'DC_FP_START' 640 \| DC_FP_START(); \| ^~~~~~~~~~~ ./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:75:3: error: implicit declaration of function 'disable_kernel_vsx'; did you mean 'disable_kernel_fp'? [-Werror=implicit-function-declaration] 75 \| disable_kernel_vsx(); \ \| ^~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:676:2: note: in expansion of macro 'DC_FP_END' 676 \| DC_FP_END(); \| ^~~~~~~~~ cc1: some warnings being treated as errors make[5]: *** [drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o] Error 1 This works because the caller is checking if VSX is available using cpu_has_feature(): #define DC_FP_START() { \ if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \ preempt_disable(); \ enable_kernel_vsx(); \ } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \ preempt_disable(); \ enable_kernel_altivec(); \ } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \ preempt_disable(); \ enable_kernel_fp(); \ } \ When CONFIG_VSX is not selected, cpu_has_feature(CPU_FTR_VSX_COMP) constant folds to 'false' so the call to enable_kernel_vsx() is discarded and the build succeeds. Fixes: `16a9dea110` ("amdgpu: Enable initial DCN support on POWER") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Incorporate some discussion comments into the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8d7d285a027e9d21f5ff7f850fa71a2655b0c4af.1615279170.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Nicholas Piggin	1f372e8956	powerpc: Fix inverted SET_FULL_REGS bitop commit `73ac798818` upstream. This bit operation was inverted and set the low bit rather than cleared it, breaking the ability to ptrace non-volatile GPRs after exec. Fix. Only affects 64e and 32-bit. Fixes: `feb9df3462` ("powerpc/64s: Always has full regs, so remove remnant checks") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210308085530.3191843-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Naveen N. Rao	9776812ee8	powerpc/64s: Fix instruction encoding for lis in ppc_function_entry() commit `cea15316ce` upstream. 'lis r2,N' is 'addis r2,0,N' and the instruction encoding in the macro LIS_R2 is incorrect (it currently maps to 'addis r0,r2,N'). Fix the same. Fixes: `c71b7eff42` ("powerpc: Add ABIv2 support to ppc_function_entry") Cc: stable@vger.kernel.org # v3.16+ Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210304020411.16796-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Ard Biesheuvel	8571c66401	efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table commit `9e9888a0fe` upstream. The EFI_RT_PROPERTIES_TABLE contains a mask of runtime services that are available after ExitBootServices(). This mostly does not concern the EFI stub at all, given that it runs before that. However, there is one call that is made at runtime, which is the call to SetVirtualAddressMap() (which is not even callable at boot time to begin with) So add the missing handling of the RT_PROP table to ensure that we only call SetVirtualAddressMap() if it is not being advertised as unsupported by the firmware. Cc: <stable@vger.kernel.org> # v5.10+ Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Mathieu Desnoyers	68b4378d91	sched/membarrier: fix missing local execution of ipi_sync_rq_state() commit `ce29ddc47b` upstream. The function sync_runqueues_membarrier_state() should copy the membarrier state from the @mm received as parameter to each runqueue currently running tasks using that mm. However, the use of smp_call_function_many() skips the current runqueue, which is unintended. Replace by a call to on_each_cpu_mask(). Fixes: `227a4aadc7` ("sched/membarrier: Fix p->mm->membarrier_state racy load") Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org # 5.4.x+ Link: https://lore.kernel.org/r/74F1E842-4A84-47BF-B6C2-5407DFDD4A4A@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Arnd Bergmann	5f2f616343	linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* commit `97e4910232` upstream. Separating compiler-clang.h from compiler-gcc.h inadventently dropped the definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling back to the open-coded version and hoping that the compiler detects it. Since all versions of clang support the __builtin_bswap interfaces, add back the flags and have the headers pick these up automatically. This results in a 4% improvement of compilation speed for arm defconfig. Note: it might also be worth revisiting which architectures set CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other, today this is set on six architectures (arm32, csky, mips, powerpc, s390, x86), while another ten architectures define custom helpers (alpha, arc, ia64, m68k, mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64, h8300, hexagon, microblaze, nds32, openrisc, riscv) just get the unoptimized version and rely on the compiler to detect it. A long time ago, the compiler builtins were architecture specific, but nowadays, all compilers that are able to build the kernel have correct implementations of them, though some may not be as optimized as the inline asm versions. The patch that dropped the optimization landed in v4.19, so as discussed it would be fairly safe to backport this revert to stable kernels to the 4.19/5.4/5.10 stable kernels, but there is a remaining risk for regressions, and it has no known side-effects besides compile speed. Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/ Fixes: `815f0ddb34` ("include/linux/compiler.h: make compiler-.h mutually exclusive") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Guo Ren <guoren@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Marco Elver <elver@google.com> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:35 +01:00
Minchan Kim	bc7c1b09f7	zram: fix return value on writeback_store commit `57e0076e65` upstream. writeback_store's return value is overwritten by submit_bio_wait's return value. Thus, writeback_store will return zero since there was no IO error. In the end, write syscall from userspace will see the zero as return value, which could make the process stall to keep trying the write until it will succeed. Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-17 17:06:34 +01:00
Matthew Wilcox (Oracle)	3cbe8f9193	include/linux/sched/mm.h: use rcu_dereference in in_vfork() [ Upstream commit `149fc78735` ] Fix a sparse warning by using rcu_dereference(). Technically this is a bug and a sufficiently aggressive compiler could reload the `real_parent' pointer outside the protection of the rcu lock (and access freed memory), but I think it's pretty unlikely to happen. Link: https://lkml.kernel.org/r/20210221194207.1351703-1-willy@infradead.org Fixes: `b18dc5f291` ("mm, oom: skip vforked tasks from being selected") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Arnd Bergmann	7da7542c04	stop_machine: mark helpers __always_inline [ Upstream commit `cbf78d8507` ] With clang-13, some functions only get partially inlined, with a specialized version referring to a global variable. This triggers a harmless build-time check for the intel-rng driver: WARNING: modpost: drivers/char/hw_random/intel-rng.o(.text+0xe): Section mismatch in reference from the function stop_machine() to the function .init.text:intel_rng_hw_init() The function stop_machine() references the function __init intel_rng_hw_init(). This is often because stop_machine lacks a __init annotation or the annotation of intel_rng_hw_init is wrong. In this instance, an easy workaround is to force the stop_machine() function to be inline, along with related interfaces that did not show the same behavior at the moment, but theoretically could. The combination of the two patches listed below triggers the behavior in clang-13, but individually these commits are correct. Link: https://lkml.kernel.org/r/20210225130153.1956990-1-arnd@kernel.org Fixes: `fe5595c074` ("stop_machine: Provide stop_machine_cpuslocked()") Fixes: `ee527cd3a2` ("Use stop_machine_run in the Intel RNG driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Peter Zijlstra	2a39eb7b86	seqlock,lockdep: Fix seqcount_latch_init() [ Upstream commit `4817a52b30` ] seqcount_init() must be a macro in order to preserve the static variable that is used for the lockdep key. Don't then wrap it in an inline function, which destroys that. Luckily there aren't many users of this function, but fix it before it becomes a problem. Fixes: `80793c3471` ("seqlock: Introduce seqcount_latch_t") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YEeFEbNUVkZaXDp4@hirez.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Daniel Axtens	372734dc18	powerpc/64s/exception: Clean up a missed SRR specifier [ Upstream commit `c080a17330` ] Nick's patch cleaning up the SRR specifiers in exception-64s.S missed a single instance of EXC_HV_OR_STD. Clean that up. Caught by clang's integrated assembler. Fixes: `3f7fbd97d0` ("powerpc/64s/exception: Clean up SRR specifiers") Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210225031006.1204774-2-dja@axtens.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Anna-Maria Behnsen	df7dbfc24c	hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event() [ Upstream commit `46eb1701c0` ] hrtimer_force_reprogram() and hrtimer_interrupt() invokes __hrtimer_get_next_event() to find the earliest expiry time of hrtimer bases. __hrtimer_get_next_event() does not update cpu_base::[softirq_]_expires_next to preserve reprogramming logic. That needs to be done at the callsites. hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when the first expiring timer is a softirq timer and the soft interrupt is not activated. That's wrong because cpu_base::softirq_expires_next is left stale when the first expiring timer of all bases is a timer which expires in hard interrupt context. hrtimer_interrupt() does never update cpu_base::softirq_expires_next which is wrong too. That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that timer before the stale cpu_base::softirq_expires_next. cpu_base::softirq_expires_next is cached to make the check for raising the soft interrupt fast. In the above case the soft interrupt won't be raised until clock monotonic reaches the stale cpu_base::softirq_expires_next value. That's incorrect, but what's worse it that if the softirq timer becomes the first expiring timer of all clock bases after the hard expiry timer has been handled the reprogramming of the clockevent from hrtimer_interrupt() will result in an interrupt storm. That happens because the reprogramming does not use cpu_base::softirq_expires_next, it uses __hrtimer_get_next_event() which returns the actual expiry time. Once clock MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is raised and the storm subsides. Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard bases seperately, update softirq_expires_next and handle the case when a soft expiring timer is the first of all bases by comparing the expiry times and updating the required cpu base fields. Split this functionality into a separate function to be able to use it in hrtimer_interrupt() as well without copy paste. Fixes: `5da7016046` ("hrtimer: Implement support for softirq based hrtimers") Reported-by: Mikael Beckius <mikael.beckius@windriver.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikael Beckius <mikael.beckius@windriver.com> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Kan Liang	896846b815	perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR [ Upstream commit `afbef30149` ] To supply a PID/TID for large PEBS, it requires flushing the PEBS buffer in a context switch. For normal LBRs, a context switch can flip the address space and LBR entries are not tagged with an identifier, we need to wipe the LBR, even for per-cpu events. For LBR callstack, save/restore the stack is required during a context switch. Set PERF_ATTACH_SCHED_CB for the event with large PEBS & LBR. Fixes: `9c964efa43` ("perf/x86/intel: Drain the PEBS buffer during context switches") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201130193842.10569-2-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Kan Liang	82ad50c112	perf/core: Flush PMU internal buffers for per-CPU events [ Upstream commit `a5398bffc0` ] Sometimes the PMU internal buffers have to be flushed for per-CPU events during a context switch, e.g., large PEBS. Otherwise, the perf tool may report samples in locations that do not belong to the process where the samples are processed in, because PEBS does not tag samples with PID/TID. The current code only flush the buffers for a per-task event. It doesn't check a per-CPU event. Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the PMU internal buffers have to be flushed for this event during a context switch. Add sched_cb_entry and perf_sched_cb_usages back to track the PMU/cpuctx which is required to be flushed. Only need to invoke the sched_task() for per-CPU events in this patch. The per-task events have been handled in perf_event_context_sched_in/out already. Fixes: `9c964efa43` ("perf/x86/intel: Drain the PEBS buffer during context switches") Reported-by: Gabriel Marin <gmx@google.com> Originally-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201130193842.10569-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Ard Biesheuvel	3ebd4bd2eb	arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds [ Upstream commit `7ba8f2b2d6` ] 52-bit VA kernels can run on hardware that is only 48-bit capable, but configure the ID map as 52-bit by default. This was not a problem until recently, because the special T0SZ value for a 52-bit VA space was never programmed into the TCR register anwyay, and because a 52-bit ID map happens to use the same number of translation levels as a 48-bit one. This behavior was changed by commit `1401bef703` ("arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ value for a 52-bit VA to be programmed into TCR_EL1. While some hardware simply ignores this, Mark reports that Amberwing systems choke on this, resulting in a broken boot. But even before that commit, the unsupported idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly as well. Given that we already have to deal with address spaces being either 48-bit or 52-bit in size, the cleanest approach seems to be to simply default to a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the kernel in DRAM requires it. This is guaranteed not to happen unless the system is actually 52-bit VA capable. Fixes: `90ec95cda9` ("arm64: mm: Introduce VA_BITS_MIN") Reported-by: Mark Salter <msalter@redhat.com> Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Daiyue Zhang	109720342e	configfs: fix a use-after-free in __configfs_open_file [ Upstream commit `14fbbc8297` ] Commit `b0841eefd9` ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root 0 2021-04-01 11:55 /config drwxr-xr-x 2 root root 0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt ...... 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fffffff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: `b0841eefd9` ("configfs: provide exclusion between IO and removals") Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com> Signed-off-by: Yi Chen <chenyi77@huawei.com> Signed-off-by: Ge Qiu <qiuge@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
James Smart	6cf11f3a09	nvme-fc: fix racing controller reset and create association [ Upstream commit `f20ef34d71` ] Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in interrupt context results in a possible race condition. A controller reset results in errored io completions, which schedules error work. The change of error work to a work element allows it to fire after the ctrl state transition to NVME_CTRL_CONNECTING, causing any outstanding io (used to initialize the controller) to fail and cause problems for connect_work. Add a state check to only schedule error work if not in the RESETTING state. Fixes: `19fce0470f` ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:34 +01:00
Jia-Ju Bai	d1d918492e	block: rsxx: fix error return code of rsxx_pci_probe() [ Upstream commit `df66617bfe` ] When create_singlethread_workqueue returns NULL to card->event_wq, no error return code of rsxx_pci_probe() is assigned. To fix this bug, st is assigned with -ENOMEM in this case. Fixes: `8722ff8cdb` ("block: IBM RamSan 70/80 device driver") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Link: https://lore.kernel.org/r/20210310033017.4023-1-baijiaju1990@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Ondrej Mosnacek	caa86901c8	NFSv4.2: fix return value of _nfs4_get_security_label() [ Upstream commit `53cb245454` ] An xattr 'get' handler is expected to return the length of the value on success, yet _nfs4_get_security_label() (and consequently also nfs4_xattr_get_nfs4_label(), which is used as an xattr handler) returns just 0 on success. Fix this by returning label.len instead, which contains the length of the result. Fixes: `aa9c266962` ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Trond Myklebust	e181960ec5	NFS: Don't gratuitously clear the inode cache when lookup failed [ Upstream commit `47397915ed` ] The fact that the lookup revalidation failed, does not mean that the inode contents have changed. Fixes: `5ceb9d7fda` ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Trond Myklebust	dd756d05be	NFS: Don't revalidate the directory permissions on a lookup failure [ Upstream commit `82e7ca1334` ] There should be no reason to expect the directory permissions to change just because the directory contents changed or a negative lookup timed out. So let's avoid doing a full call to nfs_mark_for_revalidate() in that case. Furthermore, if this is a negative dentry, and we haven't actually done a new lookup, then we have no reason yet to believe the directory has changed at all. So let's remove the gratuitous directory inode invalidation altogether when called from nfs_lookup_revalidate_negative(). Reported-by: Geert Jansen <gerardu@amazon.com> Fixes: `5ceb9d7fda` ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Benjamin Coddington	faa48b23d0	SUNRPC: Set memalloc_nofs_save() for sync tasks [ Upstream commit `f0940f4b32` ] We could recurse into NFS doing memory reclaim while sending a sync task, which might result in a deadlock. Set memalloc_nofs_save for sync task execution. Fixes: `a1231fda7e` ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Anshuman Khandual	475a4307c1	arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory [ Upstream commit `eeb0753ba2` ] pfn_valid() validates a pfn but basically it checks for a valid struct page backing for that pfn. It should always return positive for memory ranges backed with struct page mapping. But currently pfn_valid() fails for all ZONE_DEVICE based memory types even though they have struct page mapping. pfn_valid() asserts that there is a memblock entry for a given pfn without MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is that they do not have memblock entries. Hence memblock_is_map_memory() will invariably fail via memblock_search() for a ZONE_DEVICE based address. This eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged into the system via memremap_pages() called from a driver, their respective memory sections will not have SECTION_IS_EARLY set. Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set for firmware reserved memory regions. memblock_is_map_memory() can just be skipped as its always going to be positive and that will be an optimization for the normal hotplug memory. Like ZONE_DEVICE based memory, all normal hotplugged memory too will not have SECTION_IS_EARLY set for their sections Skipping memblock_is_map_memory() for all non early memory sections would fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its performance for normal hotplug memory as well. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: David Hildenbrand <david@redhat.com> Fixes: `73b20c84d4` ("arm64: mm: implement pte_devmap support") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1614921898-4099-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Wei Yongjun	e50ada5894	cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init() [ Upstream commit `536eb97abe` ] In case of error, the function ioremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: `67fc209b52` ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00
Shawn Guo	7dfe37e9ea	cpufreq: qcom-hw: fix dereferencing freed memory 'data' [ Upstream commit `02fc409540` ] Commit `67fc209b52` ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") introduces an issue of dereferencing freed memory 'data'. Fix it. Fixes: `67fc209b52` ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-03-17 17:06:33 +01:00

1 2 3 4 5 ...

975349 Commits