Changes in 5.10.101
integrity: check the return value of audit_log_start()
ima: Remove ima_policy file before directory
ima: Allow template selection with ima_template[_fmt]= after ima_hash=
ima: Do not print policy rule with inactive LSM labels
mmc: sdhci-of-esdhc: Check for error num after setting mask
can: isotp: fix potential CAN frame reception race in isotp_rcv()
net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs
net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs
NFS: Fix initialisation of nfs_client cl_flags field
NFSD: Clamp WRITE offsets
NFSD: Fix offset type in I/O trace points
drm/amdgpu: Set a suitable dev_info.gart_page_size
tracing: Propagate is_signed to expression
NFS: change nfs_access_get_cached to only report the mask
NFSv4 only print the label when its queried
nfs: nfs4clinet: check the return value of kstrdup()
NFSv4.1: Fix uninitialised variable in devicenotify
NFSv4 remove zero number of fs_locations entries error check
NFSv4 expose nfs_parse_server_name function
NFSv4 handle port presence in fs_location server string
x86/perf: Avoid warning for Arch LBR without XSAVE
drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer
net: sched: Clarify error message when qdisc kind is unknown
powerpc/fixmap: Fix VM debug warning on unmap
scsi: target: iscsi: Make sure the np under each tpg is unique
scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()
scsi: qedf: Add stag_work to all the vports
scsi: qedf: Fix refcount issue when LOGO is received during TMF
scsi: pm8001: Fix bogus FW crash for maxcpus=1
scsi: ufs: Treat link loss as fatal error
scsi: myrs: Fix crash in error case
PM: hibernate: Remove register_nosave_region_late()
usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
perf: Always wake the parent event
nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
KVM: eventfd: Fix false positive RCU usage warning
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode
KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
riscv: fix build with binutils 2.38
ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
ARM: dts: Fix boot regression on Skomer
ARM: socfpga: fix missing RESET_CONTROLLER
nvme-tcp: fix bogus request completion when failing to send AER
ACPI/IORT: Check node revision for PMCG resources
PM: s2idle: ACPI: Fix wakeup interrupts handling
drm/rockchip: vop: Correct RK3399 VOP register fields
ARM: dts: Fix timer regression for beagleboard revision c
ARM: dts: meson: Fix the UART compatible strings
ARM: dts: meson8: Fix the UART device-tree schema validation
ARM: dts: meson8b: Fix the UART device-tree schema validation
staging: fbtft: Fix error path in fbtft_driver_module_init()
ARM: dts: imx6qdl-udoo: Properly describe the SD card detect
phy: xilinx: zynqmp: Fix bus width setting for SGMII
ARM: dts: imx7ulp: Fix 'assigned-clocks-parents' typo
usb: f_fs: Fix use-after-free for epfile
gpio: aggregator: Fix calling into sleeping GPIO controllers
drm/vc4: hdmi: Allow DBLCLK modes even if horz timing is odd.
misc: fastrpc: avoid double fput() on failed usercopy
netfilter: ctnetlink: disable helper autoassign
arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
ixgbevf: Require large buffers for build_skb on 82599VF
drm/panel: simple: Assign data from panel_dpi_probe() correctly
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
gpio: sifive: use the correct register to read output values
bonding: pair enable_port with slave_arr_updates
net: dsa: mv88e6xxx: don't use devres for mdiobus
net: dsa: ar9331: register the mdiobus under devres
net: dsa: bcm_sf2: don't use devres for mdiobus
net: dsa: felix: don't use devres for mdiobus
net: dsa: lantiq_gswip: don't use devres for mdiobus
ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
nfp: flower: fix ida_idx not being released
net: do not keep the dst cache when uncloning an skb dst and its metadata
net: fix a memleak when uncloning an skb dst and its metadata
veth: fix races around rq->rx_notify_masked
net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
tipc: rate limit warning for received illegal binding update
net: amd-xgbe: disable interrupts during pci removal
dpaa2-eth: unregister the netdev before disconnecting from the PHY
ice: fix an error code in ice_cfg_phy_fec()
ice: fix IPIP and SIT TSO offload
net: mscc: ocelot: fix mutex lock error during ethtool stats read
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
vt_ioctl: fix array_index_nospec in vt_setactivate
vt_ioctl: add array_index_nospec to VT_ACTIVATE
n_tty: wake up poll(POLLRDNORM) on receiving data
eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
usb: dwc2: drd: fix soft connect when gadget is unconfigured
Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
usb: ulpi: Move of_node_put to ulpi_dev_release
usb: ulpi: Call of_node_put correctly
usb: dwc3: gadget: Prevent core from processing stale TRBs
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
USB: gadget: validate interface OS descriptor requests
usb: gadget: rndis: check size of RNDIS_MSG_SET command
usb: gadget: f_uac2: Define specific wTerminalType
usb: raw-gadget: fix handling of dual-direction-capable endpoints
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
USB: serial: option: add ZTE MF286D modem
USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
USB: serial: cp210x: add NCR Retail IO box id
USB: serial: cp210x: add CPI Bulk Coin Recycler id
speakup-dectlk: Restore pitch setting
phy: ti: Fix missing sentinel for clk_div_table
hwmon: (dell-smm) Speed up setting of fan speed
Makefile.extrawarn: Move -Wunaligned-access to W=1
can: isotp: fix error path in isotp_sendmsg() to unlock wait queue
scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
scsi: lpfc: Reduce log messages seen after firmware download
arm64: dts: imx8mq: fix lcdif port node
perf: Fix list corruption in perf_cgroup_switch()
iommu: Fix potential use-after-free during probe
Linux 5.10.101
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6105dcbfc0c7f1373020d378d2048e692dc502ab
Changes in 5.10.96
Bluetooth: refactor malicious adv data check
media: venus: core: Drop second v4l2 device unregister
net: sfp: ignore disabled SFP node
net: stmmac: skip only stmmac_ptp_register when resume from suspend
s390/module: fix loading modules with a lot of relocations
s390/hypfs: include z/VM guests with access control group set
bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
PM: wakeup: simplify the output logic of pm_show_wakelocks()
tracing/histogram: Fix a potential memory leak for kstrdup()
tracing: Don't inc err_log entry count if entry allocation fails
ceph: properly put ceph_string reference after async create attempt
ceph: set pool_ns in new inode layout for async creates
fsnotify: fix fsnotify hooks in pseudo filesystems
Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
drm/etnaviv: relax submit size limits
KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
arm64: errata: Fix exec handling in erratum 1418040 workaround
netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
serial: 8250: of: Fix mapped region size when using reg-offset property
serial: stm32: fix software flow control transfer
tty: n_gsm: fix SW flow control encoding/handling
tty: Add support for Brainboxes UC cards.
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
usb: xhci-plat: fix crash when suspend if remote wake enable
usb: common: ulpi: Fix crash in ulpi_match()
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
USB: core: Fix hang in usb_kill_urb by adding memory barriers
usb: typec: tcpm: Do not disconnect while receiving VBUS off
ucsi_ccg: Check DEV_INT bit only when starting CCG4
jbd2: export jbd2_journal_[grab|put]_journal_head
ocfs2: fix a deadlock when commit trans
sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
x86/MCE/AMD: Allow thresholding interface updates after init
powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
powerpc/32s: Fix kasan_init_region() for KASAN
powerpc/32: Fix boot failure with GCC latent entropy plugin
i40e: Increase delay to 1 s after global EMP reset
i40e: Fix issue when maximum queues is exceeded
i40e: Fix queues reservation for XDP
i40e: Fix for failed to init adminq while VF reset
i40e: fix unsigned stat widths
usb: roles: fix include/linux/usb/role.h compile issue
rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
ipv6_tunnel: Rate limit warning messages
net: fix information leakage in /proc/net/ptype
hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
hwmon: (lm90) Mark alert as broken for MAX6680
ping: fix the sk_bound_dev_if match in ping_lookup
ipv4: avoid using shared IP generator for connected sockets
hwmon: (lm90) Reduce maximum conversion rate for G781
NFSv4: Handle case where the lookup of a directory fails
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
net-procfs: show net devices bound packet types
drm/msm: Fix wrong size calculation
drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
ipv6: annotate accesses to fn->fn_sernum
NFS: Ensure the server has an up to date ctime before hardlinking
NFS: Ensure the server has an up to date ctime before renaming
powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
netfilter: conntrack: don't increment invalid counter on NF_REPEAT
kernel: delete repeated words in comments
perf: Fix perf_event_read_local() time
sched/pelt: Relax the sync of util_sum with util_avg
net: phy: broadcom: hook up soft_reset for BCM54616S
phylib: fix potential use-after-free
octeontx2-pf: Forward error codes to VF
rxrpc: Adjust retransmission backoff
efi/libstub: arm64: Fix image check alignment at entry
hwmon: (lm90) Mark alert as broken for MAX6654
powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
net: ipv4: Move ip_options_fragment() out of loop
net: ipv4: Fix the warning for dereference
ipv4: fix ip option filtering for locally generated fragments
ibmvnic: init ->running_cap_crqs early
ibmvnic: don't spin in tasklet
video: hyperv_fb: Fix validation of screen resolution
drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
yam: fix a memory leak in yam_siocdevprivate()
net: cpsw: Properly initialise struct page_pool_params
net: hns3: handle empty unknown interrupt for VF
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
net: bridge: vlan: fix single net device option dumping
ipv4: raw: lock the socket in raw_bind()
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: remove sparse error in ip_neigh_gw4()
net: bridge: vlan: fix memory leak in __allowed_ingress
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
usr/include/Makefile: add linux/nfc.h to the compile-test coverage
fsnotify: invalidate dcache before IN_DELETE event
block: Fix wrong offset in bio_truncate()
mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
Linux 5.10.96
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3bb702bbb2f32488f2ba15d40ed00982e028d7cb
commit 31c2558569 upstream.
Revert a completely broken check on an "invalid" RIP in SVM's workaround
for the DecodeAssists SMAP erratum. kvm_vcpu_gfn_to_memslot() obviously
expects a gfn, i.e. operates in the guest physical address space, whereas
RIP is a virtual (not even linear) address. The "fix" worked for the
problematic KVM selftest because the test identity mapped RIP.
Fully revert the hack instead of trying to translate RIP to a GPA, as the
non-SEV case is now handled earlier, and KVM cannot access guest page
tables to translate RIP.
This reverts commit e72436bc3a.
Fixes: e72436bc3a ("KVM: SVM: avoid infinite loop on NPF from bad address")
Reported-by: Liam Merwick <liam.merwick@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.88
KVM: selftests: Make sure kvm_create_max_vcpus test won't hit RLIMIT_NOFILE
KVM: downgrade two BUG_ONs to WARN_ON_ONCE
mac80211: fix regression in SSN handling of addba tx
mac80211: mark TX-during-stop for TX in in_reconfig
mac80211: send ADDBA requests using the tid/queue of the aggregation session
mac80211: validate extended element ID is present
firmware: arm_scpi: Fix string overflow in SCPI genpd driver
bpf: Fix signed bounds propagation after mov32
bpf: Make 32->64 bounds propagation slightly more robust
bpf, selftests: Add test case trying to taint map value pointer
virtio_ring: Fix querying of maximum DMA mapping size for virtio device
vdpa: check that offsets are within bounds
recordmcount.pl: look for jgnop instruction as well as bcrl on s390
dm btree remove: fix use after free in rebalance_children()
audit: improve robustness of the audit queue handling
arm64: dts: imx8m: correct assigned clocks for FEC
arm64: dts: imx8mp-evk: Improve the Ethernet PHY description
arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge
arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
arm64: dts: rockchip: fix audio-supply for Rock Pi 4
mac80211: track only QoS data frames for admission control
tee: amdtee: fix an IS_ERR() vs NULL bug
ceph: fix duplicate increment of opened_inodes metric
ceph: initialize pathlen variable in reconnect_caps_cb
ARM: socfpga: dts: fix qspi node compatible
clk: Don't parent clks until the parent is fully registered
soc: imx: Register SoC device only on i.MX boards
virtio/vsock: fix the transport to work with VMADDR_CID_ANY
selftests: net: Correct ping6 expected rc from 2 to 1
s390/kexec_file: fix error handling when applying relocations
sch_cake: do not call cake_destroy() from cake_init()
inet_diag: fix kernel-infoleak for UDP sockets
net: hns3: fix use-after-free bug in hclgevf_send_mbx_msg
selftests: Add duplicate config only for MD5 VRF tests
selftests: Fix raw socket bind tests with VRF
selftests: Fix IPv6 address bind tests
dmaengine: st_fdma: fix MODULE_ALIAS
net/sched: sch_ets: don't remove idle classes from the round-robin list
selftest/net/forwarding: declare NETIFS p9 p10
drm/ast: potential dereference of null pointer
mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock
mac80211: fix lookup when adding AddBA extension element
flow_offload: return EOPNOTSUPP for the unsupported mpls action type
rds: memory leak in __rds_conn_create()
drm/amd/pm: fix a potential gpu_metrics_table memory leak
mptcp: clear 'kern' flag from fallback sockets
soc/tegra: fuse: Fix bitwise vs. logical OR warning
igb: Fix removal of unicast MAC filters of VFs
igbvf: fix double free in `igbvf_probe`
igc: Fix typo in i225 LTR functions
ixgbe: Document how to enable NBASE-T support
ixgbe: set X550 MDIO speed before talking to PHY
netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
net/packet: rx_owner_map depends on pg_vec
sfc_ef100: potential dereference of null pointer
net: Fix double 0x prefix print in SKB dump
net/smc: Prevent smc_release() from long blocking
net: systemport: Add global locking for descriptor lifecycle
sit: do not call ipip6_dev_free() from sit_init_net()
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
USB: gadget: bRequestType is a bitfield, not a enum
Revert "usb: early: convert to readl_poll_timeout_atomic()"
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
PCI/MSI: Mask MSI-X vectors only on success
usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
USB: serial: cp210x: fix CP2105 GPIO registration
USB: serial: option: add Telit FN990 compositions
btrfs: fix memory leak in __add_inode_ref()
btrfs: fix double free of anon_dev after failure to create subvolume
zonefs: add MODULE_ALIAS_FS
iocost: Fix divide-by-zero on donation from low hweight cgroup
serial: 8250_fintek: Fix garbled text for console
timekeeping: Really make sure wall_to_monotonic isn't positive
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
Input: touchscreen - avoid bitwise vs logical OR warning
ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
xsk: Do not sleep in poll() when need_wakeup set
media: mxl111sf: change mutex_init() location
fuse: annotate lock in fuse_reverse_inval_entry()
ovl: fix warning in ovl_create_real()
scsi: scsi_debug: Don't call kcalloc() if size arg is zero
scsi: scsi_debug: Fix type in min_t to avoid stack OOB
scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
rcu: Mark accesses to rcu_state.n_force_qs
bus: ti-sysc: Fix variable set but not used warning for reinit_modules
Revert "xsk: Do not sleep in poll() when need_wakeup set"
xen/blkfront: harden blkfront against event channel storms
xen/netfront: harden netfront against event channel storms
xen/console: harden hvc_xen against event channel storms
xen/netback: fix rx queue stall detection
xen/netback: don't queue unlimited number of packages
Linux 5.10.88
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I39275293003563850699f749101fab2843299abc
[ Upstream commit 5f25e71e31 ]
This is not an unrecoverable situation. Users of kvm_read_guest_offset_cached
and kvm_write_guest_offset_cached must expect the read/write to fail, and
therefore it is possible to just return early with an error value.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.84
NFSv42: Fix pagecache invalidation after COPY/CLONE
can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM
ovl: simplify file splice
ovl: fix deadlock in splice write
gfs2: release iopen glock early in evict
gfs2: Fix length of holes reported at end-of-file
powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
mac80211: do not access the IV when it was stripped
net/smc: Transfer remaining wait queue entries during fallback
atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
net: return correct error code
platform/x86: thinkpad_acpi: Add support for dual fan control
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
s390/setup: avoid using memblock_enforce_memory_limit
btrfs: check-integrity: fix a warning on write caching disabled disk
thermal: core: Reset previous low and high trip during thermal zone init
scsi: iscsi: Unblock session then wake up error handler
drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
drm/amd/amdgpu: fix potential memleak
ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
ipv6: check return value of ipv6_skip_exthdr
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
perf inject: Fix ARM SPE handling
perf hist: Fix memory leak of a perf_hpp_fmt
perf report: Fix memory leaks around perf_tip()
net/smc: Avoid warning of possible recursive locking
ACPI: Add stubs for wakeup handler functions
vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
kprobes: Limit max data_size of the kretprobe instances
rt2x00: do not mark device gone on EPROTO errors during start
ipmi: Move remove_work to dedicated workqueue
cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
s390/pci: move pseudo-MMIO to prevent MIO overlap
fget: check that the fd still exists after getting a ref to it
sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
ipv6: fix memory leak in fib6_rule_suppress
drm/amd/display: Allow DSC on supported MST branch devices
KVM: Disallow user memslot with size that exceeds "unsigned long"
KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST
KVM: x86: Use a stable condition around all VT-d PI paths
KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1
KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
tracing/histograms: String compares should not care about signed values
wireguard: selftests: increase default dmesg log size
wireguard: allowedips: add missing __rcu annotation to satisfy sparse
wireguard: selftests: actually test for routing loops
wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST
wireguard: device: reset peer src endpoint when netns exits
wireguard: receive: use ring buffer for incoming handshakes
wireguard: receive: drop handshakes if queue lock is contended
wireguard: ratelimiter: use kvcalloc() instead of kvzalloc()
i2c: stm32f7: flush TX FIFO upon transfer errors
i2c: stm32f7: recover the bus on access timeout
i2c: stm32f7: stop dma transfer in case of NACK
i2c: cbus-gpio: set atomic transfer callback
natsemi: xtensa: fix section mismatch warnings
tcp: fix page frag corruption on page fault
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
net: mpls: Fix notifications when deleting a device
siphash: use _unaligned version by default
arm64: ftrace: add missing BTIs
net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
selftests: net: Correct case name
mt76: mt7915: fix NULL pointer dereference in mt7915_get_phy_mode
ASoC: tegra: Fix wrong value type in ADMAIF
ASoC: tegra: Fix wrong value type in I2S
ASoC: tegra: Fix wrong value type in DMIC
ASoC: tegra: Fix wrong value type in DSPK
ASoC: tegra: Fix kcontrol put callback in ADMAIF
ASoC: tegra: Fix kcontrol put callback in I2S
ASoC: tegra: Fix kcontrol put callback in DMIC
ASoC: tegra: Fix kcontrol put callback in DSPK
ASoC: tegra: Fix kcontrol put callback in AHUB
rxrpc: Fix rxrpc_peer leak in rxrpc_look_up_bundle()
rxrpc: Fix rxrpc_local leak in rxrpc_lookup_peer()
ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
net: marvell: mvpp2: Fix the computation of shared CPUs
dpaa2-eth: destroy workqueue at the end of remove function
net: annotate data-races on txq->xmit_lock_owner
ipv4: convert fib_num_tclassid_users to atomic_t
net/smc: fix wrong list_del in smc_lgr_cleanup_early
net/rds: correct socket tunable error in rds_tcp_tune()
net/smc: Keep smc_close_final rc during active close
drm/msm/a6xx: Allocate enough space for GMU registers
drm/msm: Do hw_init() before capturing GPU state
atlantic: Increase delay for fw transactions
atlatnic: enable Nbase-t speeds with base-t
atlantic: Fix to display FW bundle version instead of FW mac version.
atlantic: Add missing DIDs and fix 115c.
Remove Half duplex mode speed capabilities.
atlantic: Fix statistics logic for production hardware
atlantic: Remove warn trace message.
KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
KVM: VMX: Set failure code in prepare_vmcs02()
x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
x86/entry: Use the correct fence macro after swapgs in kernel CR3
x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
x86/pv: Switch SWAPGS to ALTERNATIVE
x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
parisc: Fix KBUILD_IMAGE for self-extracting kernel
parisc: Fix "make install" on newer debian releases
vgacon: Propagate console boot parameters before calling `vc_resize'
xhci: Fix commad ring abort, write all 64 bits to CRCR register.
USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
x86/tsc: Add a timer to make sure TSC_adjust is always checked
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
x86/64/mm: Map all kernel memory into trampoline_pgd
tty: serial: msm_serial: Deactivate RX DMA for polling support
serial: pl011: Add ACPI SBSA UART match id
serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
serial: core: fix transmit-buffer reset and memleak
serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
serial: 8250_pci: rewrite pericom_do_set_divisor()
serial: 8250: Fix RTS modem control while in rs485 mode
iwlwifi: mvm: retry init flow if failed
parisc: Mark cr16 CPU clocksource unstable on all SMP machines
net/tls: Fix authentication failure in CCM mode
ipmi: msghandler: Make symbol 'remove_work_wq' static
Linux 5.10.84
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90caaa6bd343e4180abcf8904a06c7ccc7b7b582
commit 6b285a5587 upstream.
Reject userspace memslots whose size exceeds the storage capacity of an
"unsigned long". KVM's uAPI takes the size as u64 to support large slots
on 64-bit hosts, but does not account for the size being truncated on
32-bit hosts in various flows. The access_ok() check on the userspace
virtual address in particular casts the size to "unsigned long" and will
check the wrong number of bytes.
KVM doesn't actually support slots whose size doesn't fit in an "unsigned
long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long",
not a "u64", and misc arch specific code follows that behavior.
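The truncation described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the 32-bit "unsigned long" is emulated with a mask, and the helper names truncate_ulong32() and memslot_size_ok() are hypothetical.

```python
ULONG32_MAX = 0xFFFF_FFFF  # largest value of a 32-bit "unsigned long"


def truncate_ulong32(size: int) -> int:
    """Emulate casting a u64 size to a 32-bit unsigned long."""
    return size & ULONG32_MAX


def memslot_size_ok(size: int, ulong_max: int = ULONG32_MAX) -> bool:
    """Hypothetical check: reject sizes that do not fit in unsigned long."""
    return size <= ulong_max


size = (1 << 32) + 0x1000  # 4 GiB + 4 KiB, representable as a u64

# After the cast, a range check would only cover the low 0x1000 bytes.
print(hex(truncate_ulong32(size)))  # 0x1000
print(memslot_size_ok(size))        # False
```

On a 64-bit host the same size fits in "unsigned long", so a check of this shape would only reject slots on 32-bit hosts.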
Fixes: fa3d315a4c ("KVM: Validate userspace_addr of memslot when registered")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <20211104002531.1176691-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.72
spi: rockchip: handle zero length transfers without timing out
platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet
platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet
nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
btrfs: fix mount failure due to past and transient device flush error
net: mdio: introduce a shutdown method to mdio device drivers
xen-netback: correct success/error reporting for the SKB-with-fraglist case
sparc64: fix pci_iounmap() when CONFIG_PCI is not set
ext2: fix sleeping in atomic bugs on error
scsi: sd: Free scsi_disk device via put_device()
usb: testusb: Fix for showing the connection speed
usb: dwc2: check return value after calling platform_get_resource()
habanalabs/gaudi: fix LBW RR configuration
selftests: be sure to make khdr before other targets
selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
nvme-fc: update hardware queues before using them
nvme-fc: avoid race between time out and tear down
thermal/drivers/tsens: Fix wrong check for tzd in irq handlers
scsi: ses: Retry failed Send/Receive Diagnostic commands
irqchip/gic: Work around broken Renesas integration
smb3: correct smb3 ACL security descriptor
tools/vm/page-types: remove dependency on opt_file for idle page tracking
selftests: KVM: Align SMCCC call with the spec in steal_time
KVM: do not shrink halt_poll_ns below grow_start
kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
KVM: x86: nSVM: restore int_vector in svm_clear_vintr
perf/x86: Reset destroy callback on event init failure
libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
Linux 5.10.72
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I842a64f2a02b26c46b6d1151fe343497854c6300
[ Upstream commit ae232ea460 ]
grow_halt_poll_ns() ignores values between 0 and
halt_poll_ns_grow_start (10000 by default). However,
when we shrink halt_poll_ns we may fall well below
halt_poll_ns_grow_start and end up with halt_poll_ns
values that don't make a lot of sense, such as 1, 9,
or 19.
VCPU1 trace (halt_poll_ns_shrink equals 2):
VCPU1 grow 10000
VCPU1 shrink 5000
VCPU1 shrink 2500
VCPU1 shrink 1250
VCPU1 shrink 625
VCPU1 shrink 312
VCPU1 shrink 156
VCPU1 shrink 78
VCPU1 shrink 39
VCPU1 shrink 19
VCPU1 shrink 9
VCPU1 shrink 4
Mirror what grow_halt_poll_ns() does and set halt_poll_ns
to 0 as soon as the newly shrunk halt_poll_ns value falls
below halt_poll_ns_grow_start.
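The trace above can be reproduced with a short simulation. This is a sketch of the arithmetic only, assuming integer division by halt_poll_ns_shrink and the default grow start of 10000; the function names shrink_old(), shrink_new(), and trace() are hypothetical and do not come from the kernel.

```python
GROW_START = 10_000  # default halt_poll_ns_grow_start, per the text above


def shrink_old(val: int, shrink: int) -> int:
    """Shrink with no lower bound (behaviour before the fix)."""
    return 0 if shrink == 0 else val // shrink


def shrink_new(val: int, shrink: int, grow_start: int = GROW_START) -> int:
    """Shrink, then snap to 0 once the result drops below grow_start."""
    val = 0 if shrink == 0 else val // shrink
    return 0 if val < grow_start else val


def trace(step, val: int, shrink: int, rounds: int) -> list:
    """Apply `step` repeatedly and record each resulting value."""
    out = []
    for _ in range(rounds):
        val = step(val, shrink)
        out.append(val)
    return out


print(trace(shrink_old, 10_000, 2, 11))
# [5000, 2500, 1250, 625, 312, 156, 78, 39, 19, 9, 4]
print(trace(shrink_new, 10_000, 2, 3))
# [0, 0, 0]
```

With the floor in place, the first shrink from 10000 already lands below the grow start, so polling is switched off instead of decaying through values too small to be useful.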
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 85cd39af14 upstream.
KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine. The directory name is built from the process
pid and a VM fd. While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:
[ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.
When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked. The slow memory leak was spotted by syzkaller, where it
caused OOM reports.
Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created. This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.
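The lookup-before-create idea can be sketched with a toy model. Everything here is an illustrative assumption: the DebugFs class stands in for the "kvm" parent directory, create_vm_debugfs() is a hypothetical helper, and the stat_allocs list models the stat data structs that used to leak.

```python
class DebugFs:
    """Toy stand-in for a debugfs parent directory."""

    def __init__(self):
        self.dirs = set()

    def lookup(self, name):
        return name in self.dirs

    def create_dir(self, name):
        if name in self.dirs:
            return None  # models the error return for a duplicate name
        self.dirs.add(name)
        return name


def create_vm_debugfs(fs, pid, fd, stat_allocs):
    """Create the per-VM directory, skipping all work on a duplicate."""
    name = f"{pid}-{fd}"
    if fs.lookup(name):
        # Duplicate: leave the dentry as None and allocate nothing.
        return None
    dentry = fs.create_dir(name)
    stat_allocs.append(name)  # stat structs exist only alongside a dentry
    return dentry


fs, stat_allocs = DebugFs(), []
first = create_vm_debugfs(fs, 20245, 4, stat_allocs)
second = create_vm_debugfs(fs, 20245, 4, stat_allocs)
print(first, second, len(stat_allocs))  # 20245-4 None 1
```

Because the duplicate is detected before anything is allocated, there is nothing left to free on that path.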
Cc: stable@vger.kernel.org
Fixes: 536a6f88c4 ("KVM: Create debugfs dir and stat files for each VM")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8750f9bbda upstream.
The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
therefore it needs a compat ioctl implementation. Otherwise,
32-bit userspace fails to invoke it on 64-bit kernels; for x86
it might work fine by chance if the padding is zero, but not
on big-endian architectures.
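The endianness remark can be made concrete with a byte-level sketch. It assumes the pointer shares an 8-byte union with a 64-bit padding field, so that 32-bit userspace writes only the first 4 bytes; the helper names and the example addresses are hypothetical.

```python
def union_bytes_32bit_user(ptr: int, stale: int, byteorder: str) -> bytes:
    """8-byte union as filled in by 32-bit userspace.

    The 4-byte pointer sits at offset 0; the remaining 4 bytes hold
    whatever was already in memory (`stale`).
    """
    return ptr.to_bytes(4, byteorder) + stale.to_bytes(4, byteorder)


def kernel_reads_u64(raw: bytes, byteorder: str) -> int:
    """A 64-bit kernel without compat handling reads all 8 bytes as a pointer."""
    return int.from_bytes(raw, byteorder)


ptr = 0x0800_1000

# Little-endian with zero padding: correct by chance.
print(hex(kernel_reads_u64(union_bytes_32bit_user(ptr, 0, "little"), "little")))
# 0x8001000

# Little-endian with stale padding: garbage in the upper half.
print(hex(kernel_reads_u64(union_bytes_32bit_user(ptr, 0xDEADBEEF, "little"), "little")))
# 0xdeadbeef08001000

# Big-endian: the pointer lands in the upper half even with zero padding.
print(hex(kernel_reads_u64(union_bytes_32bit_user(ptr, 0, "big"), "big")))
# 0x800100000000000
```

A compat handler avoids the problem by reading the pointer as a 32-bit value and widening it explicitly.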
Reported-by: Thomas Sattler
Cc: stable@vger.kernel.org
Fixes: 2a31b9db15 ("kvm: introduce manual dirty log reprotect")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8be156be1 upstream.
It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, raising its refcount from 0 to 1.
When the reference is dropped, this will free the page incorrectly.
Fix this by only taking a reference on valid pages if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).
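The "take a reference only if the count is already non-zero" idea can be sketched in userspace with C11 atomics. The names below are made up for the example; the sketch models the technique and is not the kernel helper used by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment the count unless it is zero; report whether a ref was taken. */
static inline bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Demo helper: the count after one attempted get from 'initial'. */
static inline int ref_after_try_get(int initial)
{
	atomic_int ref;

	atomic_init(&ref, initial);
	(void)ref_get_unless_zero(&ref);
	return atomic_load(&ref);
}
```

A page whose count is 0 is left untouched, so a later put cannot free it by taking the count back to zero.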
This addresses CVE-2021-22543.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d3c4c7938 upstream.
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus. If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device. But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus. In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.
Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
Fixes: f65886606c ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210412222050.876100-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ee3757424 upstream.
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210412222050.876100-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9545779ee upstream.
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair. In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.
This bug was inadvertently exposed by commit bd2fae8da7 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to complain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
to ‘long unsigned int’ changes value from
‘9218868437227405314’ to ‘2’ [-Werror=overflow]
89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
| ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
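The arithmetic in the warning can be reproduced in userspace. In this sketch the macro names are stand-ins chosen for the example, their values are inferred from the numbers in the gcc output, and uint32_t models a 32-bit 'unsigned long'.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins whose values match the numbers printed by gcc above. */
#define PFN_ERR_MASK     (0x7ffULL << 52)
#define PFN_ERR_RO_FAULT (PFN_ERR_MASK + 2)

/* Store a 64-bit pfn in a 32-bit variable: the upper half is lost. */
static inline uint32_t pfn_as_ulong32(uint64_t pfn)
{
	return (uint32_t)pfn;
}
```

With these definitions PFN_ERR_RO_FAULT is 9218868437227405314 and pfn_as_ulong32(PFN_ERR_RO_FAULT) is 2, so the error value becomes indistinguishable from a small valid pfn.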
Cc: stable@vger.kernel.org
Fixes: add6a0cd1c ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201940.1258328-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fd6dad126 upstream.
Currently, the follow_pfn function is exported for modules but
follow_pte is not. However, follow_pfn is very easy to misuse,
because it does not provide protections (so most of its callers
assume the page is writable!) and because it returns after having
already unlocked the page table lock.
Provide instead a simplified version of follow_pte that does
not have the pmdpp and range arguments. The older version
survives as follow_invalidate_pte() for use by fs/dax.c.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd2fae8da7 upstream.
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functions. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older versions
have the same issue, all the way back to commit 2e2e3738af ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88bf56d04b upstream.
In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.
It means that tlbs_dirty is always used as int and the higher 32 bits
are ignored. We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.
Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice. It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.
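The boundary case can be shown with a small userspace model. The function names are invented for the example, and unsigned fixed-width types are used in place of int and long so that the truncation is well defined in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old pattern: fold the wide counter into a narrow flag word. */
static inline bool flush_needed_narrow(uint64_t tlbs_dirty)
{
	uint32_t need_tlb_flush = 0;

	need_tlb_flush |= (uint32_t)tlbs_dirty;	/* upper 32 bits dropped */
	return need_tlb_flush != 0;
}

/* Fixed pattern: test the counter itself. */
static inline bool flush_needed_direct(uint64_t tlbs_dirty)
{
	return tlbs_dirty != 0;
}
```

For a counter value of exactly 1ULL << 32, the narrow check reports that no flush is needed while the direct check reports that one is.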
Cc: stable@vger.kernel.org
Fixes: a4ee1ca4a3 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make use of the struct_size() helper to avoid any potential type
mistakes and protect against potential integer overflows.
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.
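What such helpers guard against can be sketched in userspace. The struct and the two functions below are illustrative names, not the kernel macros; they compute the same two quantities with explicit overflow checks and saturate at SIZE_MAX, so that an allocation with the result fails instead of being undersized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example structure ending in a flexible array member. */
struct msg {
	uint32_t nr;
	uint64_t items[];
};

/* Bytes used by 'n' trailing elements, saturating at SIZE_MAX. */
static inline size_t flex_bytes(size_t elem_size, size_t n)
{
	if (elem_size != 0 && n > SIZE_MAX / elem_size)
		return SIZE_MAX;
	return elem_size * n;
}

/* Header plus 'n' trailing elements, saturating at SIZE_MAX. */
static inline size_t total_bytes(size_t header, size_t elem_size, size_t n)
{
	size_t tail = flex_bytes(elem_size, n);

	if (tail > SIZE_MAX - header)
		return SIZE_MAX;
	return header + tail;
}
```

An open-coded sizeof(struct msg) + n * sizeof(uint64_t) would silently wrap for a huge n; the checked version returns SIZE_MAX instead.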
Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for Linux 5.9, take #1
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
(dirty logging, for example)
- Fix tracing output of 64bit values
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.
Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge more updates from Andrew Morton:
- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),
- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...
Pull virtio updates from Michael Tsirkin:
- IRQ bypass support for vdpa and IFC
- MLX5 vdpa driver
- Endianness fixes for virtio drivers
- Misc other fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
vdpa/mlx5: fix up endian-ness for mtu
vdpa: Fix pointer math bug in vdpasim_get_config()
vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
vdpa/mlx5: fix memory allocation failure checks
vdpa/mlx5: Fix uninitialised variable in core/mr.c
vdpa_sim: init iommu lock
virtio_config: fix up warnings on parisc
vdpa/mlx5: Add VDPA driver for supported mlx5 devices
vdpa/mlx5: Add shared memory registration code
vdpa/mlx5: Add support library for mlx5 VDPA implementation
vdpa/mlx5: Add hardware descriptive header file
vdpa: Modify get_vq_state() to return error code
net/vdpa: Use struct for set/get vq state
vdpa: remove hard coded virtq num
vdpasim: support batch updating
vhost-vdpa: support IOTLB batching hints
vhost-vdpa: support get/set backend features
vhost: generialize backend features setting/getting
vhost-vdpa: refine ioctl pre-processing
vDPA: dont change vq irq after DRIVER_OK
...
Pull locking updates from Thomas Gleixner:
"A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in
various situations caused by the lockdep additions to seqcount to
validate that the write side critical sections are non-preemptible.
- The seqcount associated lock debug addons which were blocked by the
above fallout.
seqcount writers contrary to seqlock writers must be externally
serialized, which usually happens via locking - except for strict
per CPU seqcounts. As the lock is not part of the seqcount, lockdep
cannot validate that the lock is held.
This new debug mechanism adds the concept of associated locks.
The sequence count now has lock type variants and corresponding
initializers which take a pointer to the associated lock used for
writer serialization. If lockdep is enabled the pointer is stored
and write_seqcount_begin() has a lockdep assertion to validate that
the lock is held.
Aside of the type and the initializer no other code changes are
required at the seqcount usage sites. The rest of the seqcount API
is unchanged and determines the type at compile time with the help
of _Generic which is possible now that the minimal GCC version has
been moved up.
Adding this lockdep coverage unearthed a handful of seqcount bugs
which have been addressed already independent of this.
While generally useful this comes with a Trojan Horse twist: On RT
kernels the write side critical section can become preemptible if
the writers are serialized by an associated lock, which leads to
the well known reader preempts writer livelock. RT prevents this by
storing the associated lock pointer independent of lockdep in the
seqcount and changing the reader side to block on the lock when a
reader detects that a writer is in the write side critical section.
- Conversion of seqcount usage sites to associated types and
initializers"
* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/seqlock, headers: Untangle the spaghetti monster
locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
x86/headers: Remove APIC headers from <asm/smp.h>
seqcount: More consistent seqprop names
seqcount: Compress SEQCNT_LOCKNAME_ZERO()
seqlock: Fold seqcount_LOCKNAME_init() definition
seqlock: Fold seqcount_LOCKNAME_t definition
seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
hrtimer: Use sequence counter with associated raw spinlock
kvm/eventfd: Use sequence counter with associated spinlock
userfaultfd: Use sequence counter with associated spinlock
NFSv4: Use sequence counter with associated spinlock
iocost: Use sequence counter with associated spinlock
raid5: Use sequence counter with associated spinlock
vfs: Use sequence counter with associated spinlock
timekeeping: Use sequence counter with associated raw spinlock
xfrm: policy: Use sequence counters with associated lock
netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
netfilter: conntrack: Use sequence counter with associated spinlock
sched: tasks: Use sequence counter with associated spinlock
...
Pull KVM updates from Paolo Bonzini:
"s390:
- implement diag318
x86:
- Report last CPU for debugging
- Emulate smaller MAXPHYADDR in the guest than in the host
- .noinstr and tracing fixes from Thomas
- nested SVM page table switching optimization and fixes
Generic:
- Unify shadow MMU cache data structures across architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
KVM: SVM: Fix sev_pin_memory() error handling
KVM: LAPIC: Set the TDCR settable bits
KVM: x86: Specify max TDP level via kvm_configure_mmu()
KVM: x86/mmu: Rename max_page_level to max_huge_page_level
KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
KVM: VMX: Make vmx_load_mmu_pgd() static
KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
KVM: VMX: Drop a duplicate declaration of construct_eptp()
KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
KVM: Using macros instead of magic values
MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
KVM: VMX: Add guest physical address check in EPT violation and misconfig
KVM: VMX: introduce vmx_need_pf_intercept
KVM: x86: update exception bitmap on CPUID changes
KVM: x86: rename update_bp_intercept to update_exception_bitmap
...
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
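The association can be pictured with a single-threaded userspace model. Everything here is an assumption made for illustration: the model_* types stand in for the real counter and lock, there are no memory barriers, and the "assertion" is a return value rather than a lockdep splat.

```c
#include <assert.h>
#include <stdbool.h>

struct model_lock {
	bool held;
};

struct model_seqcount {
	unsigned int sequence;
	const struct model_lock *lock;	/* associated writer lock */
};

/* Enter the write side; refuse if the associated lock is not held. */
static inline bool model_write_begin(struct model_seqcount *s)
{
	if (!s->lock->held)
		return false;	/* the debug check would fire here */
	s->sequence++;		/* odd value: write in progress */
	return true;
}

static inline void model_write_end(struct model_seqcount *s)
{
	s->sequence++;		/* even value: write complete */
}

static inline unsigned int model_read_begin(const struct model_seqcount *s)
{
	return s->sequence;
}

/* A reader must retry if a write was in progress or has completed. */
static inline bool model_read_retry(const struct model_seqcount *s,
				    unsigned int start)
{
	return (start & 1) || s->sequence != start;
}

/* Demo: an unlocked write is rejected, a locked one forces a retry. */
static inline bool model_demo(void)
{
	struct model_lock lock = { .held = false };
	struct model_seqcount s = { .sequence = 0, .lock = &lock };
	unsigned int start = model_read_begin(&s);
	bool rejected = !model_write_begin(&s);
	bool accepted;

	lock.held = true;
	accepted = model_write_begin(&s);
	model_write_end(&s);
	return rejected && accepted && model_read_retry(&s, start);
}
```

The point of the model is only the extra pointer: because the counter knows which lock serializes its writers, the write-side entry can verify that the lock is held.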
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200720155530.1173732-24-a.darwish@linutronix.de
Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.
Provide generic infrastructure to avoid duplication of the same handling
code all over the place.
The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.
The initial list of work items handled is:
TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME
Architecture specific TIF flags can be added via defines in the
architecture specific include files.
The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.
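The shape of such a pre-entry work loop can be sketched as a userspace model. The WORK_* bits and the function are hypothetical names for the example, clearing a bit stands in for actually performing the work, and returning -EINTR for a pending signal is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical work bits mirroring the three items listed above. */
#define WORK_SIGPENDING    (1u << 0)
#define WORK_NEED_RESCHED  (1u << 1)
#define WORK_NOTIFY_RESUME (1u << 2)
#define WORK_ALL (WORK_SIGPENDING | WORK_NEED_RESCHED | WORK_NOTIFY_RESUME)

/* Process pending work; 0 means it is safe to enter the guest. */
static inline int model_guest_entry_work(unsigned int pending)
{
	while (pending & WORK_ALL) {
		/* Assumed: a signal aborts entry so userspace can handle it. */
		if (pending & WORK_SIGPENDING)
			return -EINTR;
		if (pending & WORK_NEED_RESCHED)
			pending &= ~WORK_NEED_RESCHED;	/* stands in for rescheduling */
		if (pending & WORK_NOTIFY_RESUME)
			pending &= ~WORK_NOTIFY_RESUME;	/* stands in for the resume callback */
	}
	return 0;
}
```

In the model the caller would run this from its outer loop and still re-check for new work with interrupts disabled right before the actual switch, as described above.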
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after
enabling paging from SMM. The crash is triggered from mmu_check_root() and
is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0
while vCPU may be in a different context (address space).
Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/
kvm_arch_setup_async_pf() return '1' when a job to handle page fault
asynchronously was scheduled and '0' otherwise. To avoid confusion,
change the return type to 'bool'.
No functional change intended.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200615121334.91300-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sparse complains about a call to get_compat_sigset; fix it. The "if"
right above explains that sigmask_arg->sigset is basically a
compat_sigset_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull more KVM updates from Paolo Bonzini:
"The guest side of the asynchronous page fault work has been delayed to
5.9 in order to sync with Thomas's interrupt entry rework, but here's
the rest of the KVM updates for this merge window.
MIPS:
- Loongson port
PPC:
- Fixes
ARM:
- Fixes
x86:
- KVM_SET_USER_MEMORY_REGION optimizations
- Fixes
- Selftest fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
KVM: selftests: fix sync_with_host() in smm_test
KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
KVM: async_pf: Cleanup kvm_setup_async_pf()
kvm: i8254: remove redundant assignment to pointer s
KVM: x86: respect singlestep when emulating instruction
KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
KVM: arm64: Remove host_cpu_context member from vcpu structure
KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
KVM: arm64: Handle PtrAuth traps early
KVM: x86: Unexport x86_fpu_cache and make it static
KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
KVM: arm64: Stop save/restoring ACTLR_EL1
KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
...
'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.
Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.
s390 seems to always be able to inject 'page not present', the
change is effectively a nop.
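The rule can be written down as a small predicate. This is a sketch under assumptions: the struct and its field names are chosen for the example and only model the two conditions described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-request state relevant to the decision (illustrative names). */
struct model_apf_work {
	bool wakeup_all;		/* no matching 'page not present' exists */
	bool notpresent_injected;	/* the guest saw 'page not present' */
};

/* Inject 'page ready' only if the guest can match it to something. */
static inline bool model_should_inject_ready(const struct model_apf_work *w)
{
	return w->wakeup_all || w->notpresent_injected;
}
```

A request whose 'page not present' event was never delivered is therefore completed silently, and the guest never allocates a dummy entry for it.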
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
schedule_work() returns 'false' only when the work is already on the queue
and this can't happen as kvm_setup_async_pf() always allocates a new one.
Also, to avoid a potential race, it makes sense to call schedule_work() at the
very end after we've added it to the queue.
While on it, do some minor cleanup. gfn_to_pfn_async() mentioned in a
comment does not currently exist and, moreover, we can check
kvm_is_error_hva() at the very beginning, before we try to allocate work so
'retry_sync' label can go away completely.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-1-vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>