Commit Graph

2141 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
f0a557399c Merge 5.10.101 into android13-5.10
Changes in 5.10.101
	integrity: check the return value of audit_log_start()
	ima: Remove ima_policy file before directory
	ima: Allow template selection with ima_template[_fmt]= after ima_hash=
	ima: Do not print policy rule with inactive LSM labels
	mmc: sdhci-of-esdhc: Check for error num after setting mask
	can: isotp: fix potential CAN frame reception race in isotp_rcv()
	net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs
	net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs
	NFS: Fix initialisation of nfs_client cl_flags field
	NFSD: Clamp WRITE offsets
	NFSD: Fix offset type in I/O trace points
	drm/amdgpu: Set a suitable dev_info.gart_page_size
	tracing: Propagate is_signed to expression
	NFS: change nfs_access_get_cached to only report the mask
	NFSv4 only print the label when its queried
	nfs: nfs4clinet: check the return value of kstrdup()
	NFSv4.1: Fix uninitialised variable in devicenotify
	NFSv4 remove zero number of fs_locations entries error check
	NFSv4 expose nfs_parse_server_name function
	NFSv4 handle port presence in fs_location server string
	x86/perf: Avoid warning for Arch LBR without XSAVE
	drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer
	net: sched: Clarify error message when qdisc kind is unknown
	powerpc/fixmap: Fix VM debug warning on unmap
	scsi: target: iscsi: Make sure the np under each tpg is unique
	scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
	scsi: qedf: Add stag_work to all the vports
	scsi: qedf: Fix refcount issue when LOGO is received during TMF
	scsi: pm8001: Fix bogus FW crash for maxcpus=1
	scsi: ufs: Treat link loss as fatal error
	scsi: myrs: Fix crash in error case
	PM: hibernate: Remove register_nosave_region_late()
	usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
	perf: Always wake the parent event
	nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
	net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
	KVM: eventfd: Fix false positive RCU usage warning
	KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
	KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
	KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
	KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
	riscv: fix build with binutils 2.38
	ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
	ARM: dts: Fix boot regression on Skomer
	ARM: socfpga: fix missing RESET_CONTROLLER
	nvme-tcp: fix bogus request completion when failing to send AER
	ACPI/IORT: Check node revision for PMCG resources
	PM: s2idle: ACPI: Fix wakeup interrupts handling
	drm/rockchip: vop: Correct RK3399 VOP register fields
	ARM: dts: Fix timer regression for beagleboard revision c
	ARM: dts: meson: Fix the UART compatible strings
	ARM: dts: meson8: Fix the UART device-tree schema validation
	ARM: dts: meson8b: Fix the UART device-tree schema validation
	staging: fbtft: Fix error path in fbtft_driver_module_init()
	ARM: dts: imx6qdl-udoo: Properly describe the SD card detect
	phy: xilinx: zynqmp: Fix bus width setting for SGMII
	ARM: dts: imx7ulp: Fix 'assigned-clocks-parents' typo
	usb: f_fs: Fix use-after-free for epfile
	gpio: aggregator: Fix calling into sleeping GPIO controllers
	drm/vc4: hdmi: Allow DBLCLK modes even if horz timing is odd.
	misc: fastrpc: avoid double fput() on failed usercopy
	netfilter: ctnetlink: disable helper autoassign
	arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
	ixgbevf: Require large buffers for build_skb on 82599VF
	drm/panel: simple: Assign data from panel_dpi_probe() correctly
	ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
	gpio: sifive: use the correct register to read output values
	bonding: pair enable_port with slave_arr_updates
	net: dsa: mv88e6xxx: don't use devres for mdiobus
	net: dsa: ar9331: register the mdiobus under devres
	net: dsa: bcm_sf2: don't use devres for mdiobus
	net: dsa: felix: don't use devres for mdiobus
	net: dsa: lantiq_gswip: don't use devres for mdiobus
	ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
	nfp: flower: fix ida_idx not being released
	net: do not keep the dst cache when uncloning an skb dst and its metadata
	net: fix a memleak when uncloning an skb dst and its metadata
	veth: fix races around rq->rx_notify_masked
	net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
	tipc: rate limit warning for received illegal binding update
	net: amd-xgbe: disable interrupts during pci removal
	dpaa2-eth: unregister the netdev before disconnecting from the PHY
	ice: fix an error code in ice_cfg_phy_fec()
	ice: fix IPIP and SIT TSO offload
	net: mscc: ocelot: fix mutex lock error during ethtool stats read
	net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
	vt_ioctl: fix array_index_nospec in vt_setactivate
	vt_ioctl: add array_index_nospec to VT_ACTIVATE
	n_tty: wake up poll(POLLRDNORM) on receiving data
	eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
	usb: dwc2: drd: fix soft connect when gadget is unconfigured
	Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
	net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
	usb: ulpi: Move of_node_put to ulpi_dev_release
	usb: ulpi: Call of_node_put correctly
	usb: dwc3: gadget: Prevent core from processing stale TRBs
	usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
	USB: gadget: validate interface OS descriptor requests
	usb: gadget: rndis: check size of RNDIS_MSG_SET command
	usb: gadget: f_uac2: Define specific wTerminalType
	usb: raw-gadget: fix handling of dual-direction-capable endpoints
	USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
	USB: serial: option: add ZTE MF286D modem
	USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
	USB: serial: cp210x: add NCR Retail IO box id
	USB: serial: cp210x: add CPI Bulk Coin Recycler id
	speakup-dectlk: Restore pitch setting
	phy: ti: Fix missing sentinel for clk_div_table
	hwmon: (dell-smm) Speed up setting of fan speed
	Makefile.extrawarn: Move -Wunaligned-access to W=1
	can: isotp: fix error path in isotp_sendmsg() to unlock wait queue
	scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
	scsi: lpfc: Reduce log messages seen after firmware download
	arm64: dts: imx8mq: fix lcdif port node
	perf: Fix list corruption in perf_cgroup_switch()
	iommu: Fix potential use-after-free during probe
	Linux 5.10.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6105dcbfc0c7f1373020d378d2048e692dc502ab
2022-02-16 15:14:37 +01:00
Hou Wenlong
dc129275a7 KVM: eventfd: Fix false positive RCU usage warning
[ Upstream commit 6a0c61703e ]

Fix the following false positive warning:
 =============================
 WARNING: suspicious RCU usage
 5.16.0-rc4+ #57 Not tainted
 -----------------------------
 arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

 stack backtrace:
 CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16 12:54:20 +01:00
Greg Kroah-Hartman
7859e08c9c Merge 5.10.96 into android13-5.10
Changes in 5.10.96
	Bluetooth: refactor malicious adv data check
	media: venus: core: Drop second v4l2 device unregister
	net: sfp: ignore disabled SFP node
	net: stmmac: skip only stmmac_ptp_register when resume from suspend
	s390/module: fix loading modules with a lot of relocations
	s390/hypfs: include z/VM guests with access control group set
	bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
	udf: Restore i_lenAlloc when inode expansion fails
	udf: Fix NULL ptr deref when converting from inline format
	efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
	PM: wakeup: simplify the output logic of pm_show_wakelocks()
	tracing/histogram: Fix a potential memory leak for kstrdup()
	tracing: Don't inc err_log entry count if entry allocation fails
	ceph: properly put ceph_string reference after async create attempt
	ceph: set pool_ns in new inode layout for async creates
	fsnotify: fix fsnotify hooks in pseudo filesystems
	Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
	perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
	drm/etnaviv: relax submit size limits
	KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
	arm64: errata: Fix exec handling in erratum 1418040 workaround
	netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
	serial: 8250: of: Fix mapped region size when using reg-offset property
	serial: stm32: fix software flow control transfer
	tty: n_gsm: fix SW flow control encoding/handling
	tty: Add support for Brainboxes UC cards.
	usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
	usb: xhci-plat: fix crash when suspend if remote wake enable
	usb: common: ulpi: Fix crash in ulpi_match()
	usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
	USB: core: Fix hang in usb_kill_urb by adding memory barriers
	usb: typec: tcpm: Do not disconnect while receiving VBUS off
	ucsi_ccg: Check DEV_INT bit only when starting CCG4
	jbd2: export jbd2_journal_[grab|put]_journal_head
	ocfs2: fix a deadlock when commit trans
	sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
	x86/MCE/AMD: Allow thresholding interface updates after init
	powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
	powerpc/32s: Fix kasan_init_region() for KASAN
	powerpc/32: Fix boot failure with GCC latent entropy plugin
	i40e: Increase delay to 1 s after global EMP reset
	i40e: Fix issue when maximum queues is exceeded
	i40e: Fix queues reservation for XDP
	i40e: Fix for failed to init adminq while VF reset
	i40e: fix unsigned stat widths
	usb: roles: fix include/linux/usb/role.h compile issue
	rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
	rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
	scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
	ipv6_tunnel: Rate limit warning messages
	net: fix information leakage in /proc/net/ptype
	hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
	hwmon: (lm90) Mark alert as broken for MAX6680
	ping: fix the sk_bound_dev_if match in ping_lookup
	ipv4: avoid using shared IP generator for connected sockets
	hwmon: (lm90) Reduce maximum conversion rate for G781
	NFSv4: Handle case where the lookup of a directory fails
	NFSv4: nfs_atomic_open() can race when looking up a non-regular file
	net-procfs: show net devices bound packet types
	drm/msm: Fix wrong size calculation
	drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
	drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
	ipv6: annotate accesses to fn->fn_sernum
	NFS: Ensure the server has an up to date ctime before hardlinking
	NFS: Ensure the server has an up to date ctime before renaming
	powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
	netfilter: conntrack: don't increment invalid counter on NF_REPEAT
	kernel: delete repeated words in comments
	perf: Fix perf_event_read_local() time
	sched/pelt: Relax the sync of util_sum with util_avg
	net: phy: broadcom: hook up soft_reset for BCM54616S
	phylib: fix potential use-after-free
	octeontx2-pf: Forward error codes to VF
	rxrpc: Adjust retransmission backoff
	efi/libstub: arm64: Fix image check alignment at entry
	hwmon: (lm90) Mark alert as broken for MAX6654
	powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
	net: ipv4: Move ip_options_fragment() out of loop
	net: ipv4: Fix the warning for dereference
	ipv4: fix ip option filtering for locally generated fragments
	ibmvnic: init ->running_cap_crqs early
	ibmvnic: don't spin in tasklet
	video: hyperv_fb: Fix validation of screen resolution
	drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
	drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
	yam: fix a memory leak in yam_siocdevprivate()
	net: cpsw: Properly initialise struct page_pool_params
	net: hns3: handle empty unknown interrupt for VF
	Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
	net: bridge: vlan: fix single net device option dumping
	ipv4: raw: lock the socket in raw_bind()
	ipv4: tcp: send zero IPID in SYNACK messages
	ipv4: remove sparse error in ip_neigh_gw4()
	net: bridge: vlan: fix memory leak in __allowed_ingress
	dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
	usr/include/Makefile: add linux/nfc.h to the compile-test coverage
	fsnotify: invalidate dcache before IN_DELETE event
	block: Fix wrong offset in bio_truncate()
	mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
	Linux 5.10.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3bb702bbb2f32488f2ba15d40ed00982e028d7cb
2022-02-02 09:32:52 +01:00
Sean Christopherson
a2c8e1d9e4 Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
commit 31c2558569 upstream.

Revert a completely broken check on an "invalid" RIP in SVM's workaround
for the DecodeAssists SMAP errata.  kvm_vcpu_gfn_to_memslot() obviously
expects a gfn, i.e. operates in the guest physical address space, whereas
RIP is a virtual (not even linear) address.  The "fix" worked for the
problematic KVM selftest because the test identity mapped RIP.

Fully revert the hack instead of trying to translate RIP to a GPA, as the
non-SEV case is now handled earlier, and KVM cannot access guest page
tables to translate RIP.

This reverts commit e72436bc3a.

Fixes: e72436bc3a ("KVM: SVM: avoid infinite loop on NPF from bad address")
Reported-by: Liam Merwick <liam.merwick@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01 17:25:40 +01:00
Greg Kroah-Hartman
b206ff9364 Merge 5.10.88 into android13-5.10
Changes in 5.10.88
	KVM: selftests: Make sure kvm_create_max_vcpus test won't hit RLIMIT_NOFILE
	KVM: downgrade two BUG_ONs to WARN_ON_ONCE
	mac80211: fix regression in SSN handling of addba tx
	mac80211: mark TX-during-stop for TX in in_reconfig
	mac80211: send ADDBA requests using the tid/queue of the aggregation session
	mac80211: validate extended element ID is present
	firmware: arm_scpi: Fix string overflow in SCPI genpd driver
	bpf: Fix signed bounds propagation after mov32
	bpf: Make 32->64 bounds propagation slightly more robust
	bpf, selftests: Add test case trying to taint map value pointer
	virtio_ring: Fix querying of maximum DMA mapping size for virtio device
	vdpa: check that offsets are within bounds
	recordmcount.pl: look for jgnop instruction as well as bcrl on s390
	dm btree remove: fix use after free in rebalance_children()
	audit: improve robustness of the audit queue handling
	arm64: dts: imx8m: correct assigned clocks for FEC
	arm64: dts: imx8mp-evk: Improve the Ethernet PHY description
	arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge
	arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
	arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
	arm64: dts: rockchip: fix audio-supply for Rock Pi 4
	mac80211: track only QoS data frames for admission control
	tee: amdtee: fix an IS_ERR() vs NULL bug
	ceph: fix duplicate increment of opened_inodes metric
	ceph: initialize pathlen variable in reconnect_caps_cb
	ARM: socfpga: dts: fix qspi node compatible
	clk: Don't parent clks until the parent is fully registered
	soc: imx: Register SoC device only on i.MX boards
	virtio/vsock: fix the transport to work with VMADDR_CID_ANY
	selftests: net: Correct ping6 expected rc from 2 to 1
	s390/kexec_file: fix error handling when applying relocations
	sch_cake: do not call cake_destroy() from cake_init()
	inet_diag: fix kernel-infoleak for UDP sockets
	net: hns3: fix use-after-free bug in hclgevf_send_mbx_msg
	selftests: Add duplicate config only for MD5 VRF tests
	selftests: Fix raw socket bind tests with VRF
	selftests: Fix IPv6 address bind tests
	dmaengine: st_fdma: fix MODULE_ALIAS
	net/sched: sch_ets: don't remove idle classes from the round-robin list
	selftest/net/forwarding: declare NETIFS p9 p10
	drm/ast: potential dereference of null pointer
	mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock
	mac80211: fix lookup when adding AddBA extension element
	flow_offload: return EOPNOTSUPP for the unsupported mpls action type
	rds: memory leak in __rds_conn_create()
	drm/amd/pm: fix a potential gpu_metrics_table memory leak
	mptcp: clear 'kern' flag from fallback sockets
	soc/tegra: fuse: Fix bitwise vs. logical OR warning
	igb: Fix removal of unicast MAC filters of VFs
	igbvf: fix double free in `igbvf_probe`
	igc: Fix typo in i225 LTR functions
	ixgbe: Document how to enable NBASE-T support
	ixgbe: set X550 MDIO speed before talking to PHY
	netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
	net/packet: rx_owner_map depends on pg_vec
	sfc_ef100: potential dereference of null pointer
	net: Fix double 0x prefix print in SKB dump
	net/smc: Prevent smc_release() from long blocking
	net: systemport: Add global locking for descriptor lifecycle
	sit: do not call ipip6_dev_free() from sit_init_net()
	bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
	powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
	USB: gadget: bRequestType is a bitfield, not a enum
	Revert "usb: early: convert to readl_poll_timeout_atomic()"
	KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
	tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
	USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
	usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
	PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
	PCI/MSI: Mask MSI-X vectors only on success
	usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
	USB: serial: cp210x: fix CP2105 GPIO registration
	USB: serial: option: add Telit FN990 compositions
	btrfs: fix memory leak in __add_inode_ref()
	btrfs: fix double free of anon_dev after failure to create subvolume
	zonefs: add MODULE_ALIAS_FS
	iocost: Fix divide-by-zero on donation from low hweight cgroup
	serial: 8250_fintek: Fix garbled text for console
	timekeeping: Really make sure wall_to_monotonic isn't positive
	libata: if T_LENGTH is zero, dma direction should be DMA_NONE
	drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
	Input: touchscreen - avoid bitwise vs logical OR warning
	ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
	xsk: Do not sleep in poll() when need_wakeup set
	media: mxl111sf: change mutex_init() location
	fuse: annotate lock in fuse_reverse_inval_entry()
	ovl: fix warning in ovl_create_real()
	scsi: scsi_debug: Don't call kcalloc() if size arg is zero
	scsi: scsi_debug: Fix type in min_t to avoid stack OOB
	scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
	rcu: Mark accesses to rcu_state.n_force_qs
	bus: ti-sysc: Fix variable set but not used warning for reinit_modules
	Revert "xsk: Do not sleep in poll() when need_wakeup set"
	xen/blkfront: harden blkfront against event channel storms
	xen/netfront: harden netfront against event channel storms
	xen/console: harden hvc_xen against event channel storms
	xen/netback: fix rx queue stall detection
	xen/netback: don't queue unlimited number of packages
	Linux 5.10.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I39275293003563850699f749101fab2843299abc
2021-12-29 13:18:16 +01:00
Paolo Bonzini
49b7e49692 KVM: downgrade two BUG_ONs to WARN_ON_ONCE
[ Upstream commit 5f25e71e31 ]

This is not an unrecoverable situation.  Users of kvm_read_guest_offset_cached
and kvm_write_guest_offset_cached must expect the read/write to fail, and
therefore it is possible to just return early with an error value.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22 09:30:50 +01:00
Greg Kroah-Hartman
29428cf3ff Merge 5.10.84 into android13-5.10
Changes in 5.10.84
	NFSv42: Fix pagecache invalidation after COPY/CLONE
	can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
	ovl: simplify file splice
	ovl: fix deadlock in splice write
	gfs2: release iopen glock early in evict
	gfs2: Fix length of holes reported at end-of-file
	powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
	drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
	mac80211: do not access the IV when it was stripped
	net/smc: Transfer remaining wait queue entries during fallback
	atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
	net: return correct error code
	platform/x86: thinkpad_acpi: Add support for dual fan control
	platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
	s390/setup: avoid using memblock_enforce_memory_limit
	btrfs: check-integrity: fix a warning on write caching disabled disk
	thermal: core: Reset previous low and high trip during thermal zone init
	scsi: iscsi: Unblock session then wake up error handler
	drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
	drm/amd/amdgpu: fix potential memleak
	ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
	ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
	ipv6: check return value of ipv6_skip_exthdr
	net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
	net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
	perf inject: Fix ARM SPE handling
	perf hist: Fix memory leak of a perf_hpp_fmt
	perf report: Fix memory leaks around perf_tip()
	net/smc: Avoid warning of possible recursive locking
	ACPI: Add stubs for wakeup handler functions
	vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
	kprobes: Limit max data_size of the kretprobe instances
	rt2x00: do not mark device gone on EPROTO errors during start
	ipmi: Move remove_work to dedicated workqueue
	cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
	s390/pci: move pseudo-MMIO to prevent MIO overlap
	fget: check that the fd still exists after getting a ref to it
	sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
	sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
	ipv6: fix memory leak in fib6_rule_suppress
	drm/amd/display: Allow DSC on supported MST branch devices
	KVM: Disallow user memslot with size that exceeds "unsigned long"
	KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
	KVM: x86: Use a stable condition around all VT-d PI paths
	KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
	KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
	tracing/histograms: String compares should not care about signed values
	wireguard: selftests: increase default dmesg log size
	wireguard: allowedips: add missing __rcu annotation to satisfy sparse
	wireguard: selftests: actually test for routing loops
	wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
	wireguard: device: reset peer src endpoint when netns exits
	wireguard: receive: use ring buffer for incoming handshakes
	wireguard: receive: drop handshakes if queue lock is contended
	wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
	i2c: stm32f7: flush TX FIFO upon transfer errors
	i2c: stm32f7: recover the bus on access timeout
	i2c: stm32f7: stop dma transfer in case of NACK
	i2c: cbus-gpio: set atomic transfer callback
	natsemi: xtensa: fix section mismatch warnings
	tcp: fix page frag corruption on page fault
	net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
	net: mpls: Fix notifications when deleting a device
	siphash: use _unaligned version by default
	arm64: ftrace: add missing BTIs
	net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
	selftests: net: Correct case name
	mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
	ASoC: tegra: Fix wrong value type in ADMAIF
	ASoC: tegra: Fix wrong value type in I2S
	ASoC: tegra: Fix wrong value type in DMIC
	ASoC: tegra: Fix wrong value type in DSPK
	ASoC: tegra: Fix kcontrol put callback in ADMAIF
	ASoC: tegra: Fix kcontrol put callback in I2S
	ASoC: tegra: Fix kcontrol put callback in DMIC
	ASoC: tegra: Fix kcontrol put callback in DSPK
	ASoC: tegra: Fix kcontrol put callback in AHUB
	rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
	rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
	ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
	net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
	net: marvell: mvpp2: Fix the computation of shared CPUs
	dpaa2-eth: destroy workqueue at the end of remove function
	net: annotate data-races on txq->xmit_lock_owner
	ipv4: convert fib_num_tclassid_users to atomic_t
	net/smc: fix wrong list_del in smc_lgr_cleanup_early
	net/rds: correct socket tunable error in rds_tcp_tune()
	net/smc: Keep smc_close_final rc during active close
	drm/msm/a6xx: Allocate enough space for GMU registers
	drm/msm: Do hw_init() before capturing GPU state
	atlantic: Increase delay for fw transactions
	atlatnic: enable Nbase-t speeds with base-t
	atlantic: Fix to display FW bundle version instead of FW mac version.
	atlantic: Add missing DIDs and fix 115c.
	Remove Half duplex mode speed capabilities.
	atlantic: Fix statistics logic for production hardware
	atlantic: Remove warn trace message.
	KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
	KVM: VMX: Set failure code in prepare_vmcs02()
	x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
	x86/entry: Use the correct fence macro after swapgs in kernel CR3
	x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
	sched/uclamp: Fix rq->uclamp_max not set on first enqueue
	x86/pv: Switch SWAPGS to ALTERNATIVE
	x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
	parisc: Fix KBUILD_IMAGE for self-extracting kernel
	parisc: Fix "make install" on newer debian releases
	vgacon: Propagate console boot parameters before calling `vc_resize'
	xhci: Fix commad ring abort, write all 64 bits to CRCR register.
	USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
	usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
	x86/tsc: Add a timer to make sure TSC_adjust is always checked
	x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
	x86/64/mm: Map all kernel memory into trampoline_pgd
	tty: serial: msm_serial: Deactivate RX DMA for polling support
	serial: pl011: Add ACPI SBSA UART match id
	serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
	serial: core: fix transmit-buffer reset and memleak
	serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
	serial: 8250_pci: rewrite pericom_do_set_divisor()
	serial: 8250: Fix RTS modem control while in rs485 mode
	iwlwifi: mvm: retry init flow if failed
	parisc: Mark cr16 CPU clocksource unstable on all SMP machines
	net/tls: Fix authentication failure in CCM mode
	ipmi: msghandler: Make symbol 'remove_work_wq' static
	Linux 5.10.84

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90caaa6bd343e4180abcf8904a06c7ccc7b7b582
2021-12-08 09:44:28 +01:00
Sean Christopherson
6a44f200f1 KVM: Disallow user memslot with size that exceeds "unsigned long"
commit 6b285a5587 upstream.

Reject userspace memslots whose size exceeds the storage capacity of an
"unsigned long".  KVM's uAPI takes the size as u64 to support large slots
on 64-bit hosts, but does not account for the size being truncated on
32-bit hosts in various flows.  The access_ok() check on the userspace
virtual address in particular casts the size to "unsigned long" and will
check the wrong number of bytes.

KVM doesn't actually support slots whose size doesn't fit in an "unsigned
long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long",
not a "u64", and misc arch specific code follows that behavior.

Fixes: fa3d315a4c ("KVM: Validate userspace_addr of memslot when registered")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <20211104002531.1176691-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 09:03:21 +01:00
Greg Kroah-Hartman
0f820fe5a9 Merge 5.10.72 into android13-5.10
Changes in 5.10.72
	spi: rockchip: handle zero length transfers without timing out
	platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet
	platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet
	nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
	btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
	btrfs: fix mount failure due to past and transient device flush error
	net: mdio: introduce a shutdown method to mdio device drivers
	xen-netback: correct success/error reporting for the SKB-with-fraglist case
	sparc64: fix pci_iounmap() when CONFIG_PCI is not set
	ext2: fix sleeping in atomic bugs on error
	scsi: sd: Free scsi_disk device via put_device()
	usb: testusb: Fix for showing the connection speed
	usb: dwc2: check return value after calling platform_get_resource()
	habanalabs/gaudi: fix LBW RR configuration
	selftests: be sure to make khdr before other targets
	selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
	nvme-fc: update hardware queues before using them
	nvme-fc: avoid race between time out and tear down
	thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
	scsi: ses: Retry failed Send/Receive Diagnostic commands
	irqchip/gic: Work around broken Renesas integration
	smb3: correct smb3 ACL security descriptor
	tools/vm/page-types: remove dependency on opt_file for idle page tracking
	selftests: KVM: Align SMCCC call with the spec in steal_time
	KVM: do not shrink halt_poll_ns below grow_start
	kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
	KVM: x86: nSVM: restore int_vector in svm_clear_vintr
	perf/x86: Reset destroy callback on event init failure
	libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
	Linux 5.10.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I842a64f2a02b26c46b6d1151fe343497854c6300
2021-10-09 15:04:37 +02:00
Sergey Senozhatsky
6d0ff92059 KVM: do not shrink halt_poll_ns below grow_start
[ Upstream commit ae232ea460 ]

grow_halt_poll_ns() ignores values between 0 and
halt_poll_ns_grow_start (10000 by default). However,
when we shrink halt_poll_ns we may fall way below
halt_poll_ns_grow_start and endup with halt_poll_ns
values that don't make a lot of sense: like 1 or 9,
or 19.

VCPU1 trace (halt_poll_ns_shrink equals 2):

VCPU1 grow 10000
VCPU1 shrink 5000
VCPU1 shrink 2500
VCPU1 shrink 1250
VCPU1 shrink 625
VCPU1 shrink 312
VCPU1 shrink 156
VCPU1 shrink 78
VCPU1 shrink 39
VCPU1 shrink 19
VCPU1 shrink 9
VCPU1 shrink 4

Mirror what grow_halt_poll_ns() does and set halt_poll_ns
to 0 as soon as new shrink-ed halt_poll_ns value falls
below halt_poll_ns_grow_start.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-09 14:40:58 +02:00
Marc Zyngier
41ddd2df53 UPSTREAM: KVM: Get rid of kvm_get_pfn()
Nobody is using kvm_get_pfn() anymore. Get rid of it.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210726153552.1535838-7-maz@kernel.org
(cherry picked from commit 36c3ce6c0d)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: I8cda22ef8cb326ea4b7eb5fa39c1037d37467cf9
2021-09-14 10:07:12 +01:00
Marc Zyngier
5c4d36c61b UPSTREAM: KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()
Now that arm64 has stopped using kvm_is_transparent_hugepage(),
we can remove it, as well as PageTransCompoundMap() which was
only used by the former.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210726153552.1535838-5-maz@kernel.org
(cherry picked from commit 205d76ff06)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 198418208
Change-Id: Ie478579e2d07b6df633851b23de984ff1f68c5d2
2021-09-14 10:07:11 +01:00
Paolo Bonzini
32f55c25ee KVM: Do not leak memory for duplicate debugfs directories
commit 85cd39af14 upstream.

KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine.  The directory name is built from the process
pid and a VM fd.  While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:

  [  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!

Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.

When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked.  The slow memory leak was spotted by syzkaller, where it
caused OOM reports.

Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created.  This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.

Cc: stable@vger.kernel.org
Fixes: 536a6f88c4 ("KVM: Create debugfs dir and stat files for each VM")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:17 +02:00
Paolo Bonzini
52acb6c147 KVM: add missing compat KVM_CLEAR_DIRTY_LOG
commit 8750f9bbda upstream.

The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
therefore it needs a compat ioctl implementation.  Otherwise,
32-bit userspace fails to invoke it on 64-bit kernels; for x86
it might work fine by chance if the padding is zero, but not
on big-endian architectures.

Reported-by: Thomas Sattler
Cc: stable@vger.kernel.org
Fixes: 2a31b9db15 ("kvm: introduce manual dirty log reprotect")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-04 12:46:39 +02:00
Kefeng Wang
679837dc0a KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio
commit 23fa2e46a5 upstream.

BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269

CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x110/0x164 lib/dump_stack.c:118
 print_address_description+0x78/0x5c8 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report+0x148/0x1e4 mm/kasan/report.c:562
 check_memory_region_inline mm/kasan/generic.c:183 [inline]
 __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
 kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Allocated by task 4269:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
 kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:664 [inline]
 kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
 kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Freed by task 4269:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track+0x38/0x6c mm/kasan/common.c:56
 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kfree+0x104/0x38c mm/slub.c:4124
 coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
 kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
 kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
 kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor()
inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list
and free the dev, but kvm_iodevice_destructor() is called again, it will lead
the above issue.

Let's check the the return value of kvm_io_bus_unregister_dev(), only call
kvm_iodevice_destructor() if the return value is 0.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 5d3c4c7938 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20 16:05:35 +02:00
Nicholas Piggin
dd8ed6c9bc KVM: do not allow mapping valid but non-reference-counted pages
commit f8be156be1 upstream.

It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
of APIs, which take a reference to the page, which takes it from 0 to 1.
When the reference is dropped, this will free the page incorrectly.

Fix this by only taking a reference on valid pages if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).

This addresses CVE-2021-22543.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30 08:47:25 -04:00
Zhu Lingshan
bc439b4b6a Revert "irqbypass: do not start cons/prod when failed connect"
commit e44b49f623 upstream.

This reverts commit a979a6aa00.

The reverted commit may cause VM freeze on arm64 with GICv4,
where stopping a consumer is implemented by suspending the VM.
Should the connect fail, the VM will not be resumed, which
is a bit of a problem.

It also erroneously calls the producer destructor unconditionally,
which is unexpected.

Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
[maz: tags and cc-stable, commit message update]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: a979a6aa00 ("irqbypass: do not start cons/prod when failed connect")
Link: https://lore.kernel.org/r/3a2c66d6-6ca0-8478-d24b-61e8e3241b20@hisilicon.com
Link: https://lore.kernel.org/r/20210508071152.722425-1-lingshan.zhu@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 09:00:34 +02:00
Benjamin Segall
ce76392523 kvm: exit halt polling on need_resched() as well
commit 262de4102c upstream.

single_task_running() is usually more general than need_resched()
but CFS_BANDWIDTH throttling will use resched_task() when there
is just one task to get the task to block. This was causing
long-need_resched warnings and was likely allowing VMs to
overrun their quota when halt polling.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Message-Id: <20210429162233.116849-1-venkateshs@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:13:12 +02:00
David Matlack
21756f878e kvm: Cap halt polling at kvm->max_halt_poll_ns
commit 258785ef08 upstream.

When growing halt-polling, there is no check that the poll time exceeds
the per-VM limit. It's possible for vcpu->halt_poll_ns to grow past
kvm->max_halt_poll_ns and stay there until a halt which takes longer
than kvm->halt_poll_ns.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Message-Id: <20210506152442.4010298-1-venkateshs@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:12:51 +02:00
Sean Christopherson
2a20592baf KVM: Stop looking for coalesced MMIO zones if the bus is destroyed
commit 5d3c4c7938 upstream.

Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus.  If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device.   But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus.  In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.

Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.

Fixes: f65886606c ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210412222050.876100-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14 09:50:04 +02:00
Sean Christopherson
03c6cccedd KVM: Destroy I/O bus devices on unregister failure _after_ sync'ing SRCU
commit 2ee3757424 upstream.

If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus.  Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.

Fixes: f65886606c ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210412222050.876100-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14 09:50:04 +02:00
Sean Christopherson
3320aa64c3 KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
commit a9545779ee upstream.

Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair.  In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.

This bug was inadvertantly exposed by commit bd2fae8da7 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to comlain about overflowing the unsigned long.

  arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
  include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
                                  to ‘long unsigned int’ changes value from
                                  ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
   89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
      |                              ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’

Cc: stable@vger.kernel.org
Fixes: add6a0cd1c ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201940.1258328-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-26 10:13:01 +01:00
Paolo Bonzini
a42150f1c9 mm: provide a saner PTE walking API for modules
commit 9fd6dad126 upstream.

Currently, the follow_pfn function is exported for modules but
follow_pte is not.  However, follow_pfn is very easy to misuse,
because it does not provide protections (so most of its callers
assume the page is writable!) and because it returns after having
already unlocked the page table lock.

Provide instead a simplified version of follow_pte that does
not have the pmdpp and range arguments.  The older version
survives as follow_invalidate_pte() for use by fs/dax.c.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-26 10:13:01 +01:00
Paolo Bonzini
83d42c2586 KVM: do not assume PTE is writable after follow_pfn
commit bd2fae8da7 upstream.

In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso.  This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.

In doing this however KVM loses the information on whether the
PFN is writable.  That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug.  To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock.  The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.

Usage of follow_pfn was introduced in commit add6a0cd1c ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.

Fixes: 2e2e3738af ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-26 10:13:01 +01:00
Marc Zyngier
256a0040c6 KVM: Forbid the use of tagged userspace addresses for memslots
commit 139bc8a614 upstream.

The use of a tagged address could be pretty confusing for the
whole memslot infrastructure as well as the MMU notifiers.

Forbid it altogether, as it never quite worked the first place.

Cc: stable@vger.kernel.org
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03 23:28:41 +01:00
Lai Jiangshan
ffee6772c4 kvm: check tlbs_dirty directly
commit 88bf56d04b upstream.

In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
        need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.

It means that tlbs_dirty is always used as int and the higher 32 bits
is useless.  We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.

Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice.  It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.

Cc: stable@vger.kernel.org
Fixes: a4ee1ca4a3 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
Ben Gardon
a6a0b05da9 kvm: x86/mmu: Support dirty logging for the TDP MMU
Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-23 03:42:13 -04:00
Peter Xu
9e9eb226b9 KVM: Cache as_id in kvm_memory_slot
Cache the address space ID just like the slot ID.  It will be used in
order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201014182700.2888246-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:01 -04:00
Rustam Kovhaev
871c433bae KVM: use struct_size() and flex_array_size() helpers in kvm_io_bus_unregister_dev()
Make use of the struct_size() helper to avoid any potential type
mistakes and protect against potential integer overflows
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure

Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:17 -04:00
Yi Li
2fc4f15dac kvm/eventfd: move wildcard calculation outside loop
There is no need to calculate wildcard in each iteration
since wildcard is not changed.

Signed-off-by: Yi Li <yili@winhong.com>
Message-Id: <20200911055652.3041762-1-yili@winhong.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:08 -04:00
Rustam Kovhaev
f65886606c KVM: fix memory leak in kvm_io_bus_unregister_dev()
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them

Fixes: 90db10434b ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:15:11 -04:00
Paolo Bonzini
1b67fd086d Merge tag 'kvmarm-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for Linux 5.9, take #1

- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
  (dirty logging, for example)
- Fix tracing output of 64bit values
2020-09-11 13:12:11 -04:00
Will Deacon
fdfe7cbd58 KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 18:03:47 -04:00
Linus Torvalds
9ad57f6dfc Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
   mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
   memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

 - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
   checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
   exec, kdump, rapidio, panic, kcov, kgdb, ipc).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
  mm/gup: remove task_struct pointer for all gup code
  mm: clean up the last pieces of page fault accountings
  mm/xtensa: use general page fault accounting
  mm/x86: use general page fault accounting
  mm/sparc64: use general page fault accounting
  mm/sparc32: use general page fault accounting
  mm/sh: use general page fault accounting
  mm/s390: use general page fault accounting
  mm/riscv: use general page fault accounting
  mm/powerpc: use general page fault accounting
  mm/parisc: use general page fault accounting
  mm/openrisc: use general page fault accounting
  mm/nios2: use general page fault accounting
  mm/nds32: use general page fault accounting
  mm/mips: use general page fault accounting
  mm/microblaze: use general page fault accounting
  mm/m68k: use general page fault accounting
  mm/ia64: use general page fault accounting
  mm/hexagon: use general page fault accounting
  mm/csky: use general page fault accounting
  ...
2020-08-12 11:24:12 -07:00
Peter Xu
64019a2e46 mm/gup: remove task_struct pointer for all gup code
After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more.  Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:04 -07:00
Linus Torvalds
57b0779392 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - IRQ bypass support for vdpa and IFC

 - MLX5 vdpa driver

 - Endianness fixes for virtio drivers

 - Misc other fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
  vdpa/mlx5: fix up endian-ness for mtu
  vdpa: Fix pointer math bug in vdpasim_get_config()
  vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
  vdpa/mlx5: fix memory allocation failure checks
  vdpa/mlx5: Fix uninitialised variable in core/mr.c
  vdpa_sim: init iommu lock
  virtio_config: fix up warnings on parisc
  vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  vdpa/mlx5: Add shared memory registration code
  vdpa/mlx5: Add support library for mlx5 VDPA implementation
  vdpa/mlx5: Add hardware descriptive header file
  vdpa: Modify get_vq_state() to return error code
  net/vdpa: Use struct for set/get vq state
  vdpa: remove hard coded virtq num
  vdpasim: support batch updating
  vhost-vdpa: support IOTLB batching hints
  vhost-vdpa: support get/set backend features
  vhost: generialize backend features setting/getting
  vhost-vdpa: refine ioctl pre-processing
  vDPA: dont change vq irq after DRIVER_OK
  ...
2020-08-11 14:34:17 -07:00
Linus Torvalds
97d052ea3f Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Linus Torvalds
921d2597ab Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-06 12:59:31 -07:00
Zhu Lingshan
a979a6aa00 irqbypass: do not start cons/prod when failed connect
If failed to connect, there is no need to start consumer nor
producer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200731065533.4144-7-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05 11:08:42 -04:00
Ahmed S. Darwish
5c73b9a2b1 kvm/eventfd: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200720155530.1173732-24-a.darwish@linutronix.de
2020-07-29 16:14:29 +02:00
Thomas Gleixner
935ace2fb5 entry: Provide infrastructure for work before transitioning to guest mode
Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling
code all over the place.

The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.

The initial list of work items handled is:

    TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

Architecture specific TIF flags can be added via defines in the
architecture specific include files.

The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de
2020-07-24 15:03:42 +02:00
Sean Christopherson
6926f95acc KVM: Move x86's MMU memory cache helpers to common KVM code
Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:42 -04:00
Vitaly Kuznetsov
995decb6c4 KVM: x86: take as_id into account when checking PGD
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after
enabling paging from SMM. The crash is triggered from mmu_check_root() and
is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0
while vCPU may be in a different context (address space).

Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:37 -04:00
Vitaly Kuznetsov
e8c22266e6 KVM: async_pf: change kvm_setup_async_pf()/kvm_arch_setup_async_pf() return type to bool
Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/
kvm_arch_setup_async_pf() return '1' when a job to handle page fault
asynchronously was scheduled and '0' otherwise. To avoid the confusion
change return type to 'bool'.

No functional change intended.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200615121334.91300-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:36 -04:00
Paolo Bonzini
1393b4aaf9 kvm: use more precise cast and do not drop __user
Sparse complains on a call to get_compat_sigset, fix it.  The "if"
right above explains that sigmask_arg->sigset is basically a
compat_sigset_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-02 05:39:31 -04:00
Linus Torvalds
52cd0d972f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
 "The guest side of the asynchronous page fault work has been delayed to
  5.9 in order to sync with Thomas's interrupt entry rework, but here's
  the rest of the KVM updates for this merge window.

  MIPS:
   - Loongson port

  PPC:
   - Fixes

  ARM:
   - Fixes

  x86:
   - KVM_SET_USER_MEMORY_REGION optimizations
   - Fixes
   - Selftest fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
  KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
  KVM: selftests: fix sync_with_host() in smm_test
  KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
  KVM: async_pf: Cleanup kvm_setup_async_pf()
  kvm: i8254: remove redundant assignment to pointer s
  KVM: x86: respect singlestep when emulating instruction
  KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
  KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
  KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
  KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
  KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
  KVM: arm64: Remove host_cpu_context member from vcpu structure
  KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
  KVM: arm64: Handle PtrAuth traps early
  KVM: x86: Unexport x86_fpu_cache and make it static
  KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
  KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
  KVM: arm64: Stop save/restoring ACTLR_EL1
  KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
  ...
2020-06-12 11:05:52 -07:00
Vitaly Kuznetsov
2a18b7e7cd KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.

Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.

s390 seems to always be able to inject 'page not present', the
change is effectively a nop.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11 12:35:19 -04:00
Vitaly Kuznetsov
7863e346e1 KVM: async_pf: Cleanup kvm_setup_async_pf()
schedule_work() returns 'false' only when the work is already on the queue
and this can't happen as kvm_setup_async_pf() always allocates a new one.
Also, to avoid potential race, it makes sense to to schedule_work() at the
very end after we've added it to the queue.

While on it, do some minor cleanup. gfn_to_pfn_async() mentioned in a
comment does not currently exist and, moreover, we can check
kvm_is_error_hva() at the very beginning, before we try to allocate work so
'retry_sync' label can go away completely.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-1-vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11 12:35:19 -04:00
Michel Lespinasse
d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport
e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00