Changes in 4.9.41
af_key: Add lock to key dump
pstore: Make spinlock per zone instead of global
net: reduce skb_warn_bad_offload() noise
jfs: Don't clear SGID when inheriting ACLs
ALSA: fm801: Initialize chip after IRQ handler is registered
ALSA: hda - Add missing NVIDIA GPU codec IDs to patch table
parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases
parisc: Extend disabled preemption in copy_user_page
parisc: Suspend lockup detectors before system halt
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
NFS: invalidate file size when taking a lock.
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
crypto: authencesn - Fix digest_null crash
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
md/raid5: add thread_group worker async_tx_issue_pending_all
drm/vmwgfx: Fix gcc-7.1.1 warning
drm/nouveau/disp/nv50-: bump max chans to 21
drm/nouveau/bar/gf100: fix access to upper half of BAR2
KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
KVM: PPC: Book3S HV: Save/restore host values of debug registers
Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
Staging: comedi: comedi_fops: Avoid orphaned proc entry
drm: rcar-du: Simplify and fix probe error handling
smp/hotplug: Move unparking of percpu threads to the control CPU
smp/hotplug: Replace BUG_ON and react useful
nfc: Fix hangup of RC-S380* in port100_send_ack()
nfc: fdp: fix NULL pointer dereference
net: phy: Do not perform software reset for Generic PHY
isdn: Fix a sleep-in-atomic bug
isdn/i4l: fix buffer overflow
ath10k: fix null deref on wmi-tlv when trying spectral scan
wil6210: fix deadlock when using fw_no_recovery option
mailbox: always wait in mbox_send_message for blocking Tx mode
mailbox: skip complete wait event if timer expired
mailbox: handle empty message in tx_tick
sched/cgroup: Move sched_online_group() back into css_online() to fix crash
RDMA/uverbs: Fix the check for port number
ipmi/watchdog: fix watchdog timeout set on reboot
dentry name snapshots
v4l: s5c73m3: fix negation operator
pstore: Allow prz to control need for locking
pstore: Correctly initialize spinlock and flags
pstore: Use dynamic spinlock initializer
net: skb_needs_check() accepts CHECKSUM_NONE for tx
device-dax: fix sysfs duplicate warnings
x86/mce/AMD: Make the init code more robust
r8169: add support for RTL8168 series add-on card.
ARM: omap2+: fixing wrong strcat for Non-NULL terminated string
dt-bindings: power/supply: Update TPS65217 properties
dt-bindings: input: Specify the interrupt number of TPS65217 power button
ARM: dts: am57xx-idk: Put USB2 port in peripheral mode
ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
net/mlx5: Disable RoCE on the e-switch management port under switchdev mode
ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
net/mlx4_core: Use-after-free causes a resource leak in flow-steering detach
net/mlx4: Remove BUG_ON from ICM allocation routine
net/mlx4_core: Fix raw qp flow steering rules under SRIOV
drm/msm: Ensure that the hardware write pointer is valid
drm/msm: Put back the vaddr in submit_reloc()
drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
irqchip/keystone: Fix "scheduling while atomic" on rt
ASoC: tlv320aic3x: Mark the RESET register as volatile
spi: dw: Make debugfs name unique between instances
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
openrisc: Add _text symbol to fix ksym build error
dmaengine: ioatdma: Add Skylake PCI Dev ID
dmaengine: ioatdma: workaround SKX ioatdma version
l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookup
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
usb: dwc3: omap: fix race of pm runtime with irq handler in probe
ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
ARM64: zynqmp: Fix i2c node's compatible string
perf probe: Fix to get correct modname from elf header
ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
usb: gadget: Fix copy/pasted error message
Btrfs: use down_read_nested to make lockdep silent
Btrfs: fix lockdep warning about log_mutex
benet: stricter vxlan offloading check in be_features_check
Btrfs: adjust outstanding_extents counter properly when dio write is split
Xen: ARM: Zero reserved fields of xatp before making hypervisor call
tools lib traceevent: Fix prev/next_prio for deadline tasks
xfrm: Don't use sk_family for socket policy lookups
perf tools: Install tools/lib/traceevent plugins with install-bin
perf symbols: Robustify reading of build-id from sysfs
video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
vfio-pci: Handle error from pci_iomap
arm64: mm: fix show_pte KERN_CONT fallout
nvmem: imx-ocotp: Fix wrong register size
net: usb: asix_devices: add .reset_resume for USB PM
ASoC: fsl_ssi: set fifo watermark to more reliable value
sh_eth: enable RX descriptor word 0 shift on SH7734
ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
perf/x86: Set pmu->module in Intel PMU modules
ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode
HID: ignore Petzl USB headlamp
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
scsi: snic: Return error code on memory allocation failure
scsi: bfa: Increase requested firmware version to 3.2.5.1
ASoC: Intel: Skylake: Release FW ctx in cleanup
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
Linux 4.9.41
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit dea1d0f5f1 upstream.
The move of the unpark functions to the control thread moved the BUG_ON()
there as well. While it made some sense in the idle thread of the upcoming
CPU, it's bogus to crash the control thread on the already online CPU,
especially as the function has a return value and the callsite is prepared
to handle an error return.
Replace it with a WARN_ON_ONCE() and return a proper error code.
Fixes: 9cd4f1a4e7 ("smp/hotplug: Move unparking of percpu threads to the control CPU")
Rightfully-ranted-at-by: Linux Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cd4f1a4e7 upstream.
Vikram reported the following backtrace:
BUG: scheduling while atomic: swapper/7/0/0x00000002
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
schedule
schedule_hrtimeout_range_clock
schedule_hrtimeout
wait_task_inactive
__kthread_bind_mask
__kthread_bind
__kthread_unpark
kthread_unpark
cpuhp_online_idle
cpu_startup_entry
secondary_start_kernel
He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
was still on the runqueue when the CPU came back online and tried to unpark
it. This causes the thread which invoked kthread_unpark() to call
wait_task_inactive() and subsequently schedule() with preemption disabled.
His proposed workaround was to "make sure" that a parked thread has
scheduled out when the CPU goes offline, so the situation cannot happen.
But that's still wrong because the root cause is not the fact that the
percpu thread is still on the runqueue and neither that preemption is
disabled, which could be simply solved by enabling preemption before
calling kthread_unpark().
The real issue is that the calling thread is the idle task of the upcoming
CPU, which is not supposed to call anything which might sleep. The moron,
who wrote that code, missed completely that kthread_unpark() might end up
in schedule().
The solution is simpler than expected. The thread which controls the
hotplug operation is waiting for the CPU to call complete() on the hotplug
state completion. So the idle task of the upcoming CPU can set its state to
CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
task on a different CPU, which then can safely do the unpark and kick the
now unparked hotplug thread of the upcoming CPU to complete the bringup to
the final target state.
Control CPU AP
bringup_cpu();
__cpu_up() ------------>
bringup_ap();
bringup_wait_for_ap()
wait_for_completion();
cpuhp_online_idle();
<------------ complete();
unpark(AP->stopper);
unpark(AP->hotplugthread);
while(1)
do_idle();
kick(AP->hotplugthread);
wait_for_completion(); hotplug_thread()
run_online_callbacks();
complete();
Fixes: 8df3e07e7f ("cpu/hotplug: Let upcoming cpu bring itself fully up")
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.9.32
bnx2x: Fix Multi-Cos
vxlan: eliminate cached dst leak
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
cxgb4: avoid enabling napi twice to the same queue
tcp: disallow cwnd undo when switching congestion control
vxlan: fix use-after-free on deletion
ipv6: Fix leak in ipv6_gso_segment().
net: ping: do not abuse udp_poll()
net/ipv6: Fix CALIPSO causing GPF with datagram support
net: ethoc: enable NAPI before poll may be scheduled
net: stmmac: fix completely hung TX when using TSO
net: bridge: start hello timer only if device is up
sparc64: Add __multi3 for gcc 7.x and later.
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
sparc: Machine description indices can vary
sparc64: reset mm cpumask after wrap
sparc64: combine activate_mm and switch_mm
sparc64: redefine first version
sparc64: add per-cpu mm of secondary contexts
sparc64: new context wrap
sparc64: delete old wrap code
arch/sparc: support NR_CPUS = 4096
serial: ifx6x60: fix use-after-free on module unload
ptrace: Properly initialize ptracer_cred on fork
crypto: asymmetric_keys - handle EBUSY due to backlog correctly
KEYS: fix dereferencing NULL payload with nonzero length
KEYS: fix freeing uninitialized memory in key_update()
KEYS: encrypted: avoid encrypting/decrypting stack buffers
crypto: drbg - wait for crypto op not signal safe
crypto: gcm - wait for crypto op not signal safe
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
nfsd4: fix null dereference on replay
nfsd: Fix up the "supattr_exclcreat" attributes
efi: Don't issue error message when booted under Xen
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
arm64: KVM: Preserve RES1 bits in SCTLR_EL2
arm64: KVM: Allow unaligned accesses at EL2
arm: KVM: Allow unaligned accesses at HYP
KVM: async_pf: avoid async pf injection when in guest mode
KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interrupt
dmaengine: usb-dmac: Fix DMAOR AE bit definition
dmaengine: ep93xx: Always start from BASE0
dmaengine: ep93xx: Don't drain the transfers in terminate_all()
dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
dmaengine: mv_xor_v2: enable XOR engine after its configuration
dmaengine: mv_xor_v2: fix tx_submit() implementation
dmaengine: mv_xor_v2: remove interrupt coalescing
dmaengine: mv_xor_v2: set DMA mask to 40 bits
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
xen/privcmd: Support correctly 64KB page granularity when mapping memory
ext4: fix SEEK_HOLE
ext4: keep existing extra fields when inode expands
ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO
ext4: fix fdatasync(2) after extent manipulation operations
drm: Fix oops + Xserver hang when unplugging USB drm devices
usb: gadget: f_mass_storage: Serialize wake and sleep execution
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
usb: chipidea: debug: check before accessing ci_role
staging/lustre/lov: remove set_fs() call from lov_getstripe()
iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's
iio: light: ltr501 Fix interchanged als/ps register field
iio: proximity: as3935: fix AS3935_INT mask
iio: proximity: as3935: fix iio_trigger_poll issue
mei: make sysfs modalias format similar as uevent modalias
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
target: Re-add check to reject control WRITEs with overflow data
drm/msm: Expose our reservation object when exporting a dmabuf.
ahci: Acer SA5-271 SSD Not Detected Fix
cgroup: Prevent kill_css() from being called more than once
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
cpuset: consider dying css as offline
fs: add i_blocksize()
ufs: restore proper tail allocation
fix ufs_isblockset()
ufs: restore maintaining ->i_blocks
ufs: set correct ->s_maxsize
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
cxl: Fix error path on bad ioctl
cxl: Avoid double free_irq() for psl,slice interrupts
btrfs: use correct types for page indices in btrfs_page_exists_in_range
btrfs: fix memory leak in update_space_info failure path
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
scsi: qla2xxx: don't disable a not previously enabled PCI device
scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues
scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC
scsi: qla2xxx: Fix mailbox pointer error in fwdump capture
powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
powerpc/numa: Fix percpu allocations to be NUMA aware
powerpc/hotplug-mem: Fix missing endian conversion of aa_index
powerpc/kernel: Fix FP and vector register restoration
powerpc/kernel: Initialize load_tm on task creation
perf/core: Drop kernel samples even though :u is specified
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
drm/vmwgfx: Make sure backup_handle is always valid
drm/nouveau/tmr: fully separate alarm execution/pending lists
ALSA: timer: Fix race between read and ioctl
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
ASoC: Fix use-after-free at card unregistration
cpu/hotplug: Drop the device lock on error
drivers: char: mem: Fix wraparound check to allow mappings up to the end
serial: sh-sci: Fix panic when serial console and DMA are enabled
arm64: traps: fix userspace cache maintenance emulation on a tagged pointer
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
arm64: entry: improve data abort handling of tagged pointers
ARM: 8636/1: Cleanup sanity_check_meminfo
ARM: 8637/1: Adjust memory boundaries after reservations
usercopy: Adjust tests to deal with SMAP/PAN
drm/i915/vbt: don't propagate errors from intel_bios_init()
drm/i915/vbt: split out defaults that are set when there is no VBT
cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
netfilter: nft_set_rbtree: handle element re-addition after deletion
Linux 4.9.32
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 40da1b11f0 upstream.
If a custom CPU target is specified and that one is not available _or_
can't be interrupted then the code returns to userland without dropping a
lock as notices by lockdep:
|echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
| ================================================
| [ BUG: lock held when returning to user space! ]
| ------------------------------------------------
| bash/503 is leaving the kernel with locks still held!
| 1 lock held by bash/503:
| #0: (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40
So release the lock then.
Fixes: 757c989b99 ("cpu/hotplug: Make target state writeable")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.9.27:
timerfd: Protect the might cancel mechanism proper
Handle mismatched open calls
tpm_tis: use default timeout value if chip reports it as zero
scsi: storvsc: Workaround for virtual DVD SCSI version
hwmon: (it87) Avoid registering the same chip on both SIO addresses
8250_pci: Fix potential use-after-free in error path
ceph: try getting buffer capability for readahead/fadvise
cpu/hotplug: Serialize callback invocations proper
dm ioctl: prevent stack leak in dm ioctl call
Linux 4.9.27
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit dc434e056f upstream.
The setup/remove_state/instance() functions in the hotplug core code are
serialized against concurrent CPU hotplug, but unfortunately not serialized
against themself.
As a consequence a concurrent invocation of these function results in
corruption of the callback machinery because two instances try to invoke
callbacks on remote cpus at the same time. This results in missing callback
invocations and initiator threads waiting forever on the completion.
The obvious solution to replace get_cpu_online() with cpu_hotplug_begin()
is not possible because at least one callsite calls into these functions
from a get_online_cpu() locked region.
Extend the protection scope of the cpuhp_state_mutex from solely protecting
the state arrays to cover the callback invocation machinery as well.
Fixes: 5b7aa87e04 ("cpu/hotplug: Implement setup/removal interface")
Reported-and-tested-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: hpa@zytor.com
Cc: mingo@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20170314150645.g4tdyoszlcbajmna@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case some sysfs nodes needs to be labeled with a different label than
sysfs then user needs to be notified when a core is brought back online.
Signed-off-by: Thierry Strudel <tstrudel@google.com>
Bug: 29359497
Change-Id: I0395c86e01cd49c348fda8f93087d26f88557c91
Move the x86_64 idle notifiers originally by Andi Kleen and Venkatesh
Pallipadi to generic.
Change-Id: Idf29cda15be151f494ff245933c12462643388d5
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Todd Poynor <toddpoynor@google.com>
commit 777c6e0dae upstream.
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following
[ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[ 145.122628] Call Trace:
[ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.
Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.
Fixes: 47e627bc8c ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull CPU hotplug updates from Thomas Gleixner:
"Yet another batch of cpu hotplug core updates and conversions:
- Provide core infrastructure for multi instance drivers so the
drivers do not have to keep custom lists.
- Convert custom lists to the new infrastructure. The block-mq custom
list conversion comes through the block tree and makes the diffstat
tip over to more lines removed than added.
- Handle unbalanced hotplug enable/disable calls more gracefully.
- Remove the obsolete CPU_STARTING/DYING notifier support.
- Convert another batch of notifier users.
The relayfs changes which conflicted with the conversion have been
shipped to me by Andrew.
The remaining lot is targeted for 4.10 so that we finally can remove
the rest of the notifiers"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
cpufreq: Fix up conversion to hotplug state machine
blk/mq: Reserve hotplug states for block multiqueue
x86/apic/uv: Convert to hotplug state machine
s390/mm/pfault: Convert to hotplug state machine
mips/loongson/smp: Convert to hotplug state machine
mips/octeon/smp: Convert to hotplug state machine
fault-injection/cpu: Convert to hotplug state machine
padata: Convert to hotplug state machine
cpufreq: Convert to hotplug state machine
ACPI/processor: Convert to hotplug state machine
virtio scsi: Convert to hotplug state machine
oprofile/timer: Convert to hotplug state machine
block/softirq: Convert to hotplug state machine
lib/irq_poll: Convert to hotplug state machine
x86/microcode: Convert to hotplug state machine
sh/SH-X3 SMP: Convert to hotplug state machine
ia64/mca: Convert to hotplug state machine
ARM/OMAP/wakeupgen: Convert to hotplug state machine
ARM/shmobile: Convert to hotplug state machine
arm64/FP/SIMD: Convert to hotplug state machine
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Expedited grace-period changes, most notably avoiding having user
threads drive expedited grace periods, using a workqueue instead.
- Miscellaneous fixes, including a performance fix for lists that was
sent with the lists modifications.
- CPU hotplug updates, most notably providing exact CPU-online
tracking for RCU. This will in turn allow removal of the checks
supporting RCU's prior heuristic that was based on the assumption
that CPUs would take no longer than one jiffy to come online.
- Torture-test updates.
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
list: Expand list_first_entry_or_null()
torture: TOROUT_STRING(): Insert a space between flag and message
rcuperf: Consistently insert space between flag and message
rcutorture: Print out barrier error as document says
torture: Add task state to writer-task stall printk()s
torture: Convert torture_shutdown() to hrtimer
rcutorture: Convert to hotplug state machine
cpu/hotplug: Get rid of CPU_STARTING reference
rcu: Provide exact CPU-online tracking for RCU
rcu: Avoid redundant quiescent-state chasing
rcu: Don't use modular infrastructure in non-modular code
sched: Make wake_up_nohz_cpu() handle CPUs going offline
rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
rcu: Use RCU's online-CPU state for expedited IPI retry
rcu: Exclude RCU-offline CPUs from expedited grace periods
rcu: Make expedited RCU CPU stall warnings respond to controls
rcu: Stop disabling expedited RCU CPU stall warnings
rcu: Drive expedited grace periods from workqueue
rcu: Consolidate expedited grace period machinery
documentation: Record reason for rcu_head two-byte alignment
...
Some compilers are unhappy with the anon union in the state array. Replace
it with a named union.
While at it align the state array initializers proper and add the missing
name tags.
Fixes: cf392d10b6 "cpu/hotplug: Add multi instance support"
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Fenguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
This patch adds the ability for a given state to have multiple
instances. Until now all states have a single instance and the startup /
teardown callback use global variables.
A few drivers need to perform a the same callbacks on multiple
"instances". Currently we have three drivers in tree which all have a
global list which they iterate over. With multi instance they support
don't need their private list and the functionality has been moved into
core code. Plus we hold the hotplug lock in core so no cpus comes/goes
while instances are registered and we do rollback in error case :)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/1471024183-12666-3-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is preparation for the following patch.
This rework here changes the arguments of cpuhp_invoke_callback(). It
passes now `state' and whether `startup' or `teardown' callback should
be invoked. The callback then is looked up by the function.
The following is a clanup of callers:
- cpuhp_issue_call() has one argument less
- struct cpuhp_cpu_state (which is used by the hotplug thread) gets also
its callback removed. The decision if it is a single callback
invocation moved to the `single' variable. Also a `bringup' variable
has been added to distinguish between startup and teardown callback.
- take_cpu_down() needs to start one step earlier. We always get here
via CPUHP_TEARDOWN_CPU callback. Before that change cpuhp_ap_states +
CPUHP_TEARDOWN_CPU pointed to an empty entry because TEARDOWN is saved
in bp_states for this reason. Now that we use cpuhp_get_step() to
lookup the state we must explicitly skip it in order not to invoke it
twice.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/1471024183-12666-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
disable_nonboot_cpus() assumes that the lowest numbered online CPU is
the boot CPU, and that this is the correct CPU to run any power
management code on.
On x86 this is always correct, as CPU0 cannot (easily) by taken offline.
On arm64 CPU0 can be taken offline. For hibernate/resume this means we
may hibernate on a CPU other than CPU0. If the system is rebooted with
kexec 'CPU0' will be assigned to a different physical CPU. This
complicates hibernate/resume as now we can't trust the CPU numbers.
Arch code can find the correct physical CPU, and ensure it is online
before resume from hibernate begins, but also needs to influence
disable_nonboot_cpus()s choice of CPU.
Rename disable_nonboot_cpus() as freeze_secondary_cpus() and add an
argument indicating which CPU should be left standing. Follow the logic
in migrate_to_reboot_cpu() to use the lowest numbered online CPU if the
requested CPU is not online.
Add disable_nonboot_cpus() as an inline function that has the existing
behaviour.
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CPU_STARTING is scheduled for removal. There is no use of it in drivers
and core code uses it only for compatibility with old-style CPU-hotplug
notifiers. This patch removes therefore removes CPU_STARTING from an
RCU-related comment.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Up to now, RCU has assumed that the CPU-online process makes it from
CPU_UP_PREPARE to set_cpu_online() within one jiffy. Given the recent
rise of virtualized environments, this assumption is very clearly
obsolete. Failing to meet this deadline can result in RCU paying
attention to an incoming CPU for one jiffy, then ignoring it until the
grace period following the one in which that CPU sets itself online.
This situation might prove to be fatally disappointing to any RCU
read-side critical sections that had the misfortune to execute during
the time in which RCU was ignoring the slow-to-come-online CPU.
This commit therefore updates RCU's internal CPU state-tracking
information at notify_cpu_starting() time, thus providing RCU with
an exact transition of the CPU's state from offline to online.
Note that this means that incoming CPUs must not use RCU read-side
critical section (other than those of SRCU) until notify_cpu_starting()
time. Note also that the CPU_STARTING notifiers -are- allowed to use
RCU read-side critical sections. (Of course, CPU-hotplug notifiers are
rapidly becoming obsolete, so you need to act fast!)
If a given architecture or CPU family needs to use RCU read-side
critical sections earlier, the call to rcu_cpu_starting() from
notify_cpu_starting() will need to be architecture-specific, with
architectures that need early use being required to hand-place
the call to rcu_cpu_starting() at some point preceding the call to
notify_cpu_starting().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Actually a nice symmetric startup/teardown pair which fits properly into
the state machine concept. In the long run we should be able to invoke
the startup callback for the boot CPU via the state machine and get
rid of the init function which invokes it on the boot CPU.
Note: This comes actually before the perf hardware callbacks. In the notifier
model the hardware callbacks have a higher priority than the core
callback. But that's solely for CPU offline so that hardware migration of
events happens before the core is notified about the outgoing CPU.
With the symetric state array model we have the following ordering:
UP: core -> hardware
DOWN: hardware -> core
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153333.587514098@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The scheduler can handle per cpu threads before the cpu is set to active and
it does not allow user space threads on the cpu before active is
set. Attaching to the scheduling domains is also not required before user
space threads can be handled.
Move the activation to the end of the hotplug state space. That also means
that deactivation is the first action when a cpu is shut down.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.597477199@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now that we reduced everything into single notifiers, it's simple to move them
into the hotplug state machine space.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The recent introduction of the hotplug thread which invokes the callbacks on
the plugged cpu, cased the following regression:
If takedown_cpu() fails, then we run into several issues:
1) The rollback of the target cpu states is not invoked. That leaves the smp
threads and the hotplug thread in disabled state.
2) notify_online() is executed due to a missing skip_onerr flag. That causes
that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which
confuses quite some notifiers.
3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's
not an issue per se, but it is inconsistent and in consequence blocks the
patches which rely on these states being invoked on the target CPU and not
on the controlling cpu. It also does not preserve the strict call order on
rollback which is problematic for the ongoing state machine conversion as
well.
To fix this we add a rollback flag to the remote callback machinery and invoke
the rollback including the CPU_DOWN_FAILED notification on the remote
cpu. Further mark the notify online state with 'skip_onerr' so we don't get a
double invokation.
This workaround will go away once we moved the unplug invocation to the target
cpu itself.
[ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notifiaction to the
target cpu ]
Fixes: 4cb28ced23 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 931ef16330 moved the smpboot thread park/unpark invocation to the
state machine. The move of the unpark invocation was premature as it depends
on work in progress patches.
As a result cpu down can fail, because rcu synchronization in takedown_cpu()
eventually requires a functional softirq thread. I never encountered the
problem in testing, but 0day testing managed to provide a reliable reproducer.
Remove the smpboot_threads_park() call from the state machine for now and put
it back into the original place after the rcu synchronization.
I'm embarrassed as I knew about the dependency and still managed to get it
wrong. Hotplug induced brain melt seems to be the only sensible explanation
for that.
Fixes: 931ef16330 "cpu/hotplug: Unpark smpboot threads from the state machine"
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
The check for the AP range in cpuhp_is_ap_state() is redundant after commit
8df3e07e7f "cpu/hotplug: Let upcoming cpu bring itself fully up" because all
states above CPUHP_BRINGUP_CPU are invoked on the hotplugged cpu. Remove it.
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Paul noticed that the conversion of the death reporting introduced a race
where the outgoing cpu might be delayed after waking the controll processor,
so it might not be able to call rcu_report_dead() before being physically
removed, leading to RCU stalls.
We cant call complete after rcu_report_dead(), so instead of going back to
busy polling, simply issue a function call to do the completion.
Fixes: 27d50c7eeb "rcu: Make CPU_DYING_IDLE an explicit call"
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20160302201127.GA23440@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>