Changes in 5.10.83
bpf: Fix toctou on read-only map's constant scalar tracking
ACPI: Get acpi_device's parent from the parent field
USB: serial: option: add Telit LE910S1 0x9200 composition
USB: serial: option: add Fibocom FM101-GL variants
usb: dwc2: gadget: Fix ISOC flow for elapsed frames
usb: dwc2: hcd_queue: Fix use of floating point literal
usb: dwc3: gadget: Ignore NoStream after End Transfer
usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
usb: dwc3: gadget: Fix null pointer exception
net: nexthop: fix null pointer dereference when IPv6 is not enabled
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
usb: hub: Fix usb enumeration issue due to address0 race
usb: hub: Fix locking issues with address0_mutex
binder: fix test regression due to sender_euid change
ALSA: ctxfi: Fix out-of-range access
ALSA: hda/realtek: Add quirk for ASRock NUC Box 1100
ALSA: hda/realtek: Fix LED on HP ProBook 435 G7
media: cec: copy sequence field for the reply
Revert "parisc: Fix backtrace to always include init funtion names"
HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
staging/fbtft: Fix backlight
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
fuse: release pipe buf after last use
xen: don't continue xenstore initialization in case of errors
xen: detect uninitialized xenbus in xenbus_init
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
tracing/uprobe: Fix uprobe_perf_open probes iteration
tracing: Fix pid filtering when triggers are attached
mmc: sdhci-esdhc-imx: disable CMDQ support
mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
mdio: aspeed: Fix "Link is Down" issue
powerpc/32: Fix hardlockup on vmap stack overflow
PCI: aardvark: Deduplicate code in advk_pcie_rd_conf()
PCI: aardvark: Update comment about disabling link training
PCI: aardvark: Implement re-issuing config requests on CRS response
PCI: aardvark: Simplify initialization of rootcap on virtual bridge
PCI: aardvark: Fix link training
proc/vmcore: fix clearing user buffer by properly using clear_user()
netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY
netfilter: ctnetlink: do not erase error code with EINVAL
netfilter: ipvs: Fix reuse connection if RS weight is 0
netfilter: flowtable: fix IPv6 tunnel addr match
ARM: dts: BCM5301X: Fix I2C controller interrupt
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
ARM: dts: bcm2711: Fix PCIe interrupts
ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer
ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
ASoC: codecs: wcd934x: return error code correctly from hw_params
net: ieee802154: handle iftypes as u32
firmware: arm_scmi: pm: Propagate return value to caller
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks
scsi: mpt3sas: Fix kernel panic during drive powercycle test
drm/vc4: fix error code in vc4_create_object()
net: marvell: prestera: fix double free issue on err path
iavf: Prevent changing static ITR values if adaptive moderation is on
ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec
mptcp: fix delack timer
firmware: smccc: Fix check for ARCH_SOC_ID not implemented
ipv6: fix typos in __ip6_finish_output()
nfp: checking parameter process for rx-usecs/tx-usecs is invalid
net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume
net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls
net: ipv6: add fib6_nh_release_dsts stub
net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
ice: fix vsi->txq_map sizing
ice: avoid bpf_prog refcount underflow
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
scsi: scsi_debug: Zero clear zones at reset write pointer
erofs: fix deadlock when shrink erofs slab
net/smc: Ensure the active closing peer first closes clcsock
mlxsw: Verify the accessed index doesn't exceed the array length
mlxsw: spectrum: Protect driver from buggy firmware
net: marvell: mvpp2: increase MTU limit when XDP enabled
nvmet-tcp: fix incomplete data digest send
net/ncsi : Add payload to be 32-bit aligned to fix dropped packets
PM: hibernate: use correct mode for swsusp_close()
drm/amd/display: Set plane update flags for all planes in reset
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
lan743x: fix deadlock in lan743x_phy_link_status_change()
net: phylink: Force link down and retrigger resolve on interface change
net: phylink: Force retrigger in case of latched link-fail indicator
net/smc: Fix NULL pointer dereferencing in smc_vlan_by_tcpsk()
net/smc: Fix loop in smc_listen
nvmet: use IOCB_NOWAIT only if the filesystem supports it
igb: fix netpoll exit with traffic
MIPS: loongson64: fix FTLB configuration
MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
tls: splice_read: fix record type check
tls: fix replacing proto_ops
net/sched: sch_ets: don't peek at classes beyond 'nbands'
net: vlan: fix underflow for the real_dev refcnt
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
tcp: correctly handle increased zerocopy args struct size
sched/scs: Reset task stack state in bringup_cpu()
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
ceph: properly handle statfs on multifs setups
smb3: do not error on fsync when readonly
iommu/amd: Clarify AMD IOMMUv2 initialization messages
vhost/vsock: fix incorrect used length reported to the guest
tracing: Check pid filtering when creating events
xen: sync include/xen/interface/io/ring.h with Xen's newest version
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly
xen/netfront: read response from backend only once
xen/netfront: don't read data from request on the ring page
xen/netfront: disentangle tx_skb_freelist
xen/netfront: don't trust the backend response data blindly
tty: hvc: replace BUG_ON() with negative return value
s390/mm: validate VMA in PGSTE manipulation functions
shm: extend forced shm destroy to support objects from several IPC nses
net: stmmac: platform: fix build warning when with !CONFIG_PM_SLEEP
drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
Linux 5.10.83
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I934dc727030cfb60b31525252df104436ff00ae0
[ Upstream commit dce1ca0525 ]
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.
When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUS with:
| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.51
drm/mxsfb: Don't select DRM_KMS_FB_HELPER
drm/zte: Don't select DRM_KMS_FB_HELPER
drm/ast: Fixed CVE for DP501
drm/amd/display: fix HDCP reset sequence on reinitialize
drm/amd/amdgpu/sriov disable all ip hw status by default
drm/vc4: fix argument ordering in vc4_crtc_get_margins()
drm/bridge: nwl-dsi: Force a full modeset when crtc_state->active is changed to be true
net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
drm/amd/display: fix use_max_lb flag for 420 pixel formats
clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe()
hugetlb: clear huge pte during flush function on mips platform
atm: iphase: fix possible use-after-free in ia_module_exit()
mISDN: fix possible use-after-free in HFC_cleanup()
atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
drm/mediatek: Fix PM reference leak in mtk_crtc_ddp_hw_init()
net: mdio: ipq8064: add regmap config to disable REGCACHE
drm/bridge: lt9611: Add missing MODULE_DEVICE_TABLE
reiserfs: add check for invalid 1st journal block
drm/virtio: Fix double free on probe failure
net: mdio: provide shim implementation of devm_of_mdiobus_register
net/sched: cls_api: increase max_reclassify_loop
pinctrl: equilibrium: Add missing MODULE_DEVICE_TABLE
drm/scheduler: Fix hang when sched_entity released
drm/sched: Avoid data corruptions
udf: Fix NULL pointer dereference in udf_symlink function
drm/vc4: Fix clock source for VEC PixelValve on BCM2711
drm/vc4: hdmi: Fix PM reference leak in vc4_hdmi_encoder_pre_crtc_co()
e100: handle eeprom as little endian
igb: handle vlan types with checker enabled
igb: fix assignment on big endian machines
drm/bridge: cdns: Fix PM reference leak in cdns_dsi_transfer()
clk: renesas: r8a77995: Add ZA2 clock
net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet
net/mlx5: Fix lag port remapping logic
drm: rockchip: add missing registers for RK3188
drm: rockchip: add missing registers for RK3066
net: stmmac: the XPCS obscures a potential "PHY not found" error
RDMA/rtrs: Change MAX_SESS_QUEUE_DEPTH
clk: tegra: Fix refcounting of gate clocks
clk: tegra: Ensure that PLLU configuration is applied properly
drm: bridge: cdns-mhdp8546: Fix PM reference leak in
virtio-net: Add validation for used length
ipv6: use prandom_u32() for ID generation
MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B)
MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER
drm/amd/display: Avoid HDCP over-read and corruption
drm/amdgpu: remove unsafe optimization to drop preamble ib
net: tcp better handling of reordering then loss cases
RDMA/cxgb4: Fix missing error code in create_qp()
dm space maps: don't reset space map allocation cursor when committing
dm writecache: don't split bios when overwriting contiguous cache content
dm: Fix dm_accept_partial_bio() relative to zone management commands
net: bridge: mrp: Update ring transitions.
pinctrl: mcp23s08: fix race condition in irq handler
ice: set the value of global config lock timeout longer
ice: fix clang warning regarding deadcode.DeadStores
virtio_net: Remove BUG() to avoid machine dead
net: mscc: ocelot: check return value after calling platform_get_resource()
net: bcmgenet: check return value after calling platform_get_resource()
net: mvpp2: check return value after calling platform_get_resource()
net: micrel: check return value after calling platform_get_resource()
net: moxa: Use devm_platform_get_and_ioremap_resource()
drm/amd/display: Fix DCN 3.01 DSCCLK validation
drm/amd/display: Update scaling settings on modeset
drm/amd/display: Release MST resources on switch from MST to SST
drm/amd/display: Set DISPCLK_MAX_ERRDET_CYCLES to 7
drm/amd/display: Fix off-by-one error in DML
net: phy: realtek: add delay to fix RXC generation issue
selftests: Clean forgotten resources as part of cleanup()
net: sgi: ioc3-eth: check return value after calling platform_get_resource()
drm/amdkfd: use allowed domain for vmbo validation
fjes: check return value after calling platform_get_resource()
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
r8169: avoid link-up interrupt issue on RTL8106e if user enables ASPM
drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check
xfrm: Fix error reporting in xfrm_state_construct.
dm writecache: commit just one block, not a full page
wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP
wl1251: Fix possible buffer overflow in wl1251_cmd_scan
cw1200: add missing MODULE_DEVICE_TABLE
drm/amdkfd: fix circular locking on get_wave_state
drm/amdkfd: Fix circular lock in nocpsch path
bpf: Fix up register-based shifts in interpreter to silence KUBSAN
ice: fix incorrect payload indicator on PTYPE
ice: mark PTYPE 2 as reserved
mt76: mt7615: fix fixed-rate tx status reporting
net: fix mistake path for netdev_features_strings
net: ipa: Add missing of_node_put() in ipa_firmware_load()
net: sched: fix error return code in tcf_del_walker()
io_uring: fix false WARN_ONCE
drm/amdgpu: fix bad address translation for sienna_cichlid
drm/amdkfd: Walk through list with dqm lock hold
mt76: mt7915: fix IEEE80211_HE_PHY_CAP7_MAX_NC for station mode
rtl8xxxu: Fix device info for RTL8192EU devices
MIPS: add PMD table accounting into MIPS'pmd_alloc_one
net: fec: add ndo_select_queue to fix TX bandwidth fluctuations
atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
atm: nicstar: register the interrupt handler in the right place
vsock: notify server to shutdown when client has pending signal
RDMA/rxe: Don't overwrite errno from ib_umem_get()
iwlwifi: mvm: don't change band on bound PHY contexts
iwlwifi: mvm: fix error print when session protection ends
iwlwifi: pcie: free IML DMA memory allocation
iwlwifi: pcie: fix context info freeing
sfc: avoid double pci_remove of VFs
sfc: error code if SRIOV cannot be disabled
wireless: wext-spy: Fix out-of-bounds warning
cfg80211: fix default HE tx bitrate mask in 2G band
mac80211: consider per-CPU statistics if present
mac80211_hwsim: add concurrent channels scanning support over virtio
IB/isert: Align target max I/O size to initiator size
media, bpf: Do not copy more entries than user space requested
net: ip: avoid OOM kills with large UDP sends over loopback
RDMA/cma: Fix rdma_resolve_route() memory leak
Bluetooth: btusb: Fixed too many in-token issue for Mediatek Chip.
Bluetooth: Fix the HCI to MGMT status conversion table
Bluetooth: Fix alt settings for incoming SCO with transparent coding format
Bluetooth: Shutdown controller after workqueues are flushed or cancelled
Bluetooth: btusb: Add a new QCA_ROME device (0cf3:e500)
Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails
Bluetooth: L2CAP: Fix invalid access on ECRED Connection response
Bluetooth: btusb: Add support USB ALT 3 for WBS
Bluetooth: mgmt: Fix the command returns garbage parameter value
Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc.
sched/fair: Ensure _sum and _avg values stay consistent
bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc()
flow_offload: action should not be NULL when it is referenced
sctp: validate from_addr_param return
sctp: add size validation when walking chunks
MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops
MIPS: set mips32r5 for virt extensions
selftests/resctrl: Fix incorrect parsing of option "-t"
MIPS: MT extensions are not available on MIPS32r1
ath11k: unlock on error path in ath11k_mac_op_add_interface()
arm64: dts: rockchip: add rk3328 dwc3 usb controller node
arm64: dts: rockchip: Enable USB3 for rk3328 Rock64
loop: fix I/O error on fsync() in detached loop devices
mm,hwpoison: return -EBUSY when migration fails
io_uring: simplify io_remove_personalities()
io_uring: Convert personality_idr to XArray
io_uring: convert io_buffer_idr to XArray
scsi: iscsi: Fix race condition between login and sync thread
scsi: iscsi: Fix iSCSI cls conn state
powerpc/mm: Fix lockup on kernel exec fault
powerpc/barrier: Avoid collision with clang's __lwsync macro
powerpc/powernv/vas: Release reference to tgid during window close
drm/amdgpu: Update NV SIMD-per-CU to 2
drm/amdgpu: enable sdma0 tmz for Raven/Renoir(V2)
drm/radeon: Add the missed drm_gem_object_put() in radeon_user_framebuffer_create()
drm/radeon: Call radeon_suspend_kms() in radeon_pci_shutdown() for Loongson64
drm/vc4: txp: Properly set the possible_crtcs mask
drm/vc4: crtc: Skip the TXP
drm/vc4: hdmi: Prevent clock unbalance
drm/dp: Handle zeroed port counts in drm_dp_read_downstream_info()
drm/rockchip: dsi: remove extra component_del() call
drm/amd/display: fix incorrrect valid irq check
pinctrl/amd: Add device HID for new AMD GPIO controller
drm/amd/display: Reject non-zero src_y and src_x for video planes
drm/tegra: Don't set allow_fb_modifiers explicitly
drm/msm/mdp4: Fix modifier support enabling
drm/arm/malidp: Always list modifiers
drm/nouveau: Don't set allow_fb_modifiers explicitly
drm/i915/display: Do not zero past infoframes.vsc
mmc: sdhci-acpi: Disable write protect detection on Toshiba Encore 2 WT8-B
mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode
mmc: core: clear flags before allowing to retune
mmc: core: Allow UHS-I voltage switch for SDSC cards if supported
ata: ahci_sunxi: Disable DIPM
arm64: tlb: fix the TTL value of tlb_get_level
cpu/hotplug: Cure the cpusets trainwreck
clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
fpga: stratix10-soc: Add missing fpga_mgr_free() call
ASoC: tegra: Set driver_name=tegra for all machine drivers
i40e: fix PTP on 5Gb links
qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
ipmi/watchdog: Stop watchdog timer when the current action is 'none'
thermal/drivers/int340x/processor_thermal: Fix tcc setting
ubifs: Fix races between xattr_{set|get} and listxattr operations
power: supply: ab8500: Fix an old bug
mfd: syscon: Free the allocated name field of struct regmap_config
nvmem: core: add a missing of_node_put
lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE
selftests/lkdtm: Fix expected text for CR4 pinning
extcon: intel-mrfld: Sync hardware and software state on init
seq_buf: Fix overflow in seq_buf_putmem_hex()
rq-qos: fix missed wake-ups in rq_qos_throttle try two
tracing: Simplify & fix saved_tgids logic
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe
coresight: Propagate symlink failure
coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer()
dm zoned: check zone capacity
dm writecache: flush origin device when writing and cache is full
dm btree remove: assign new_root only when removal succeeds
PCI: Leave Apple Thunderbolt controllers on for s2idle or standby
PCI: aardvark: Fix checking for PIO Non-posted Request
PCI: aardvark: Implement workaround for the readback value of VEND_ID
media: subdev: disallow ioctl for saa6588/davinci
media: dtv5100: fix control-request directions
media: zr364xx: fix memory leak in zr364xx_start_readpipe
media: gspca/sq905: fix control-request direction
media: gspca/sunplus: fix zero-length control requests
media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
io_uring: fix clear IORING_SETUP_R_DISABLED in wrong function
dm writecache: write at least 4k when committing
pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq()
drm/ast: Remove reference to struct drm_device.pdev
jfs: fix GPF in diFree
smackfs: restrict bytes count in smk_set_cipso()
ext4: fix memory leak in ext4_fill_super
f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances
Linux 5.10.51
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5e35964209241936feebc57badf4435dd69fcb91
commit b22afcdf04 upstream.
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov@redhat.com>
Reported-by: Joshua Baker <jobaker@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <aklimov@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cpu_down() checks for num_active_cpus() to ensure that at least one
cpu is left active. If there are two online CPUs, but one of these
is paused this check will fail indicating that only one active
CPU is available. This will prevent the online but inactive cpu
from being offlined.
Correct cpu_down() to ensure that if there is only one active CPU
and that is the CPU being requested, the offline is blocked, allowing
the second to last CPU that is inactive but online to be offlined.
Bug: 182362445
Change-Id: I5b26cb6c5fdba4f2e69e5201e25bfe987d30c405
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
It is possible that all the 32 bit CPUs are paused in
the system, which is not ideal for quickly launching
32 bit apps.
Detect if a pause operation is about to pause the
last 32 bit CPU, and prevent it from happening.
Bug: 175896474
Change-Id: I21b4dad7ba9f3ef9be460137098e6fb2c0e336e6
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Include a vendor hook for cpu_up and cpu_down to force the
rebuilding of scheduling domains prior to issuing a new
cpu up/down. Include a Kernel Export for
cpuset_wait_for_hotplug such that vendor hooks may refer
to this functionality, to ensure scheduling domains are
complete.
Bug: 176152285
Change-Id: I778dbc5e4f9d613f39b8c61f244c0f33020a3dd3
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Add a tracepoint for pause and resume which measures the
duration of time to perform the entire operation, the
cpus acted upon with this event, and the current state
of the active cpu mask. This should be sufficient
for testing pause performance.
Bug: 175959069
Change-Id: I9fc269c7d09ac78ec31612d3c552044b72b0e6e3
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Incorporate a vendor hook in the resume cpus path
so that vendor specific activities may take place.
Bug: 161210528
Change-Id: I74d03247491b004e891dbcfe06a478d00a95ba9f
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
In the resume_cpus() path, cpus cannot be taken
advantage of until the cpus write lock is acquired,
and cpus are activated and domains rebuilt. This
can incurr significant delay in the unpause operation.
Additionally, if scheduled through the kworker thread,
the wait time for rebuilding sched domains becomes
large due to a busy system that can prevent the kworker
from executing.
Activate the cpus and call the cpuset_hotplug_workfn
directly within resume_cpus prior to getting the cpus
write lock, thereby eliminating delays associated
with scheduling this activity.
Bug: 161210528
Change-Id: Ie2521f28ed9078b22d421d792f08413016d4dd62
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: Todd Kjos <tkjos@google.com>
paused_cpus intending to force CPUs to go idle as quickly as possible,
adding a migration step, to drain the rq from any running task.
Two steps are actually needed. The first one, "lazy", will run before the
cpu_active_mask has been synced. The second one will run after. It is
possible for another CPU, to observe an outdated version of that mask and
to enqueue a task on a rq that has just been marked inactive. The second
migration is there to catch any of those spurious move, while the first
one will drain the rq as quickly as possible to let the CPU reach an idle
state.
Bug: 161210528
Change-Id: Ie26c2e4c42665dd61d41a899a84536e56bf2b887
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
pause_cpus intends to have a way to force a CPU to go idle and to resume
as quickly as possible, with as little disruption as possible on the
system. This is a way of saving energy or meet thermal constraints, for
which a full CPU hotunplug is too slow. A paused CPU is simply deactivated
from the scheduler point of view. This corresponds to the first hotunplug
step.
Each pause operation still needs some heavy synchronization. Allowing to
pause several CPUs in one go mitigate that issue.
Paused CPUs can be resumed with resume_cpus(), which also takes a cpumask
as an input.
Few limitations:
* It isn't possible to pause a CPU which is running SCHED_DEADLINE task.
* A paused CPU will be removed from any cpuset it is part of. Resuming
the CPU won't put back this CPU in the cpuset if using cgroup1.
Cgroup2 doesn't have this limitation.
* per-CPU kthreads are still allowed to run on a paused CPU.
Bug: 161210528
Change-Id: I1f5cb28190f8ec979bb8640a89b022f2f7266bcf
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
In the event of a partial _cpu_down, (i.e. _cpu_down(target) where
target > CPUHP_AP_OFFLINE), the cpu_online_mask won't be aligned with
cpu_active_mask. This is an issue when trying to offline the last CPU
from cpu_active_mask, while num_online_cpus() > 1.
Protect against this case by checking num_active_cpus() instead of
num_online_cpus().
Bug: 161210528
Change-Id: Ibe7d9ef69e5f91e99be0d98076614a7654bda094
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Commit c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU") tried to speed-up hotplug of
SCHED_NORMAL tasks by temporarily elevating them to SCHED_FIFO. But
while at it, it also prevented hotplug from SCHED_IDLE, SCHED_BATCH or
SCHED_DEADLINE for no apparent reason.
Since this is a userspace-visible change, and is unlikely to actually be
needed, change the patch logic to only optimize for SCHED_NORMAL tasks
and leave the others untouched.
Bug: 169238689
Fixes: c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4d9e88b15fee56e7d234826e2eaea306a69328bb
CPU hotplug operations take place in preemptible context. This leaves
the hotplugging thread at the mercy of overall system load and CPU
availability. If the hotplugging thread does not get an opportunity
to execute after it has already begun a hotplug operation, CPUs can
end up being stuck in a quasi online state. In the worst case a CPU
can be stuck in a state where the migration thread is parked while
another task is executing and changing affinity in a loop. This
combination can result in unbounded execution time for the running
task until the hotplugging thread gets the chance to run to complete
the hotplug operation.
Fix the said problem by ensuring that hotplug can only occur from
threads belonging to the RT sched class. This allows the hotplugging
thread priority on the CPU no matter what the system load or the
number of available CPUs are. If a SCHED_NORMAL task attempts to
hotplug a CPU, we temporarily elevate it's scheduling policy to RT.
Furthermore, we disallow hotplugging operations to begin if the
calling task belongs to the idle and deadline classes or those that
use the SCHED_BATCH policy.
Bug: 169238689
Change-Id: Idbb1384626e6ddff46c0d2ce752eee68396c78af
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[psodagud@codeaurora.org: Fixed compilation issues]
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Pull scheduler updates from Ingo Molnar:
"The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve
scalability and reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it
with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements"
* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
sched: Replace rq::wake_list
sched: Add rq::ttwu_pending
irq_work, smp: Allow irq_work on call_single_queue
smp: Optimize send_call_function_single_ipi()
smp: Move irq_work_run() out of flush_smp_call_function_queue()
smp: Optimize flush_smp_call_function_queue()
sched: Fix smp_call_function_single_async() usage for ILB
sched/core: Offload wakee task activation if it the wakee is descheduling
sched/core: Optimize ttwu() spinning on p->on_cpu
sched: Defend cfs and rt bandwidth quota against overflow
sched/cpuacct: Fix charge cpuacct.usage_sys
sched/fair: Replace zero-length array with flexible-array
sched/pelt: Sync util/runnable_sum with PELT window when propagating
sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
sched/fair: Optimize enqueue_task_fair()
sched: Make scheduler_ipi inline
sched: Clean up scheduler_ipi()
sched/core: Simplify sched_init()
...
The single user could have called freeze_secondary_cpus() directly.
Since this function was a source of confusion, remove it as it's
just a pointless wrapper.
While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.
Done automatically via:
git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
In the CPU-offline process, it calls mmdrop() after idle entry and the
subsequent call to cpuhp_report_idle_dead(). Once execution passes the
call to rcu_report_dead(), RCU is ignoring the CPU, which results in
lockdep complaining when mmdrop() uses RCU from either memcg or
debugobjects below.
Fix it by cleaning up the active_mm state from BP instead. Every arch
which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
from AP. The only exception is parisc because it switches them to
&init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
but the patch will still work there because it calls mmgrab(&init_mm) in
smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().
WARNING: suspicious RCU usage
-----------------------------
kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!
other info that might help us debug this:
RCU used illegally from offline CPU!
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_work_pool+0x110/0x150
__queue_work+0x1bc/0xca0
queue_work_on+0x114/0x120
css_release+0x9c/0xc0
percpu_ref_put_many+0x204/0x230
free_pcp_prepare+0x264/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_resume+0x10/0x14
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
In a quest to make the huge -rc1 merge easier to handle and bisect,
merge the first chunk of 5.7-rc1 patches into android-mainline.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib54436e9515660a4c0c25c49c21bfb399eb57921
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
not longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set then an eventually pending wakeup event is
ignored.
Fixes: a66d955e91 ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
Use separate functions for the device core to bring a CPU up and down.
Users outside the device core must use add/remove_cpu() which will take
care of extra housekeeping work like keeping sysfs in sync.
Make cpu_up/down() static and replace the extra layer of indirection.
[ tglx: Removed the extra wrapper functions and adjusted function names ]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
This function will be used later in machine_shutdown() for some
architectures.
disable_nonboot_cpus() is not safe to use when doing machine_down(),
because it relies on freeze_secondary_cpus() which in turn is a
suspend/resume related freeze and could abort if the logic detects any
pending activities that can prevent finishing the offlining process.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-3-qais.yousef@arm.com
Baby steps in the 5.6-rc1 merge cycle to make things easier to review
and debug.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I005e68433be6b1d66bd56d7e1c8f44ab8e78bebe
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
When CONFIG_SYSFS is disabled, but CONFIG_HOTPLUG_SMT is enabled,
the kernel fails to link:
arch/x86/power/cpu.o: In function `hibernate_resume_nonboot_cpu_disable':
(.text+0x38d): undefined reference to `cpuhp_smt_enable'
arch/x86/power/hibernate.o: In function `arch_resume_nosmt':
hibernate.c:(.text+0x291): undefined reference to `cpuhp_smt_enable'
hibernate.c:(.text+0x29c): undefined reference to `cpuhp_smt_disable'
Move the exported functions out of the #ifdef section into its
own with the correct conditions.
The patch that caused this is marked for stable backports, so
this one may need to be backported as well.
Fixes: ec527c3180 ("x86/power: Fix 'nosmt' vs hibernation triple fault during resume")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191210195614.786555-1-arnd@arndb.de
Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.
Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).
It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:
BP AP
bringup_cpu()
__cpu_up(cpu, idle) --> start_secondary()
...
cpu_startup_entry()
bringup_wait_for_ap()
wait_for_ap_thread() <-- cpuhp_online_idle()
while (1)
do_idle()
... available to run kthreads ...
stop_machine_unpark()
stopper->enable = true;
Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- A comprehensive rewrite of the robust/PI futex code's exit handling
to fix various exit races. (Thomas Gleixner et al)
- Rework the generic REFCOUNT_FULL implementation using
atomic_fetch_* operations so that the performance impact of the
cmpxchg() loops is mitigated for common refcount operations.
With these performance improvements the generic implementation of
refcount_t should be good enough for everybody - and this got
confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
REFCOUNT_FULL entirely, leaving the generic implementation enabled
unconditionally. (Will Deacon)
- Other misc changes, fixes, cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
lkdtm: Remove references to CONFIG_REFCOUNT_FULL
locking/refcount: Remove unused 'refcount_error_report()' function
locking/refcount: Consolidate implementations of refcount_t
locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
locking/refcount: Move saturation warnings out of line
locking/refcount: Improve performance of generic REFCOUNT_FULL code
locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the <linux/refcount.h> header
locking/refcount: Remove unused refcount_*_checked() variants
locking/refcount: Ensure integer operands are treated as signed
locking/refcount: Define constants for saturation and max refcount values
futex: Prevent exit livelock
futex: Provide distinct return value when owner is exiting
futex: Add mutex around futex exit
futex: Provide state handling for exec() as well
futex: Sanitize exit state handling
futex: Mark the begin of futex exit explicitly
futex: Set task::futex_state to DEAD right after handling futex exit
futex: Split futex_mm_release() for exit/exec
exit/exec: Seperate mm_release()
futex: Replace PF_EXITPIDONE with a state
...
This is an intermediate (mid-week) merge of Linus's tree into
android-mainline to take all of the "big" security fixes that went into
there into the android-mainline tree to get testing happening sooner.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4d7914776ac1f917de0436061e46295ad919ead
A kernel module may need to check the value of the "mitigations=" kernel
command line parameter as part of its setup when the module needs
to perform software mitigations for a CPU flaw.
Uninline and export the helper functions surrounding the cpu_mitigations
enum to allow for their usage from a module.
Lastly, privatize the enum and cpu_mitigations variable since the value of
cpu_mitigations can be checked with the exported helper functions.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I052c6a28528e10cdda89b6a20d320ac7562266b8
KVM needs to know if SMT is theoretically possible, this means it is
supported and not forcefully disabled ('nosmt=force'). Create and
export cpu_smt_possible() answering this question.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This merges Linus's tree as of commit b41dae061b ("Merge tag
'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
into android-mainline.
This "early" merge makes it easier to test and handle merge conflicts
instead of having to wait until the "end" of the merge window and handle
all 10000+ commits at once.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812
Re-evaluating the bitmap wheight of the online cpus bitmap in every
invocation of num_online_cpus() over and over is a pretty useless
exercise. Especially when num_online_cpus() is used in code paths
like the IPI delivery of x86 or the membarrier code.
Cache the number of online CPUs in the core and just return the cached
variable. The accessor function provides only a snapshot when used without
protection against concurrent CPU hotplug.
The storage needs to use an atomic_t because the kexec and reboot code
(ab)use set_cpu_online() in their 'shutdown' handlers without any form of
serialization as pointed out by Mathieu. Regular CPU hotplug usage is
properly serialized.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de
The booted once information which is required to deal with the MCE
broadcast issue on X86 correctly is stored in the per cpu hotplug state,
which is perfectly fine for the intended purpose.
X86 needs that information for supporting NMI broadcasting via shortcuts,
but retrieving it from per cpu data is cumbersome.
Move it to a cpumask so the information can be checked against the
cpu_present_mask quickly.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190722105219.818822855@linutronix.de