Commit Graph

24689 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
9e5dd8ed9b Merge 4.9.74 into android-4.9
Changes in 4.9.74
	sync objtool's copy of x86-opcode-map.txt
	tracing: Remove extra zeroing out of the ring buffer page
	tracing: Fix possible double free on failure of allocating trace buffer
	tracing: Fix crash when it fails to alloc ring buffer
	ring-buffer: Mask out the info bits when returning buffer page length
	iw_cxgb4: Only validate the MSN for successful completions
	ASoC: wm_adsp: Fix validation of firmware and coeff lengths
	ASoC: da7218: fix fix child-node lookup
	ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
	ASoC: twl4030: fix child-node lookup
	ASoC: tlv320aic31xx: Fix GPIO1 register definition
	ALSA: hda: Drop useless WARN_ON()
	ALSA: hda - fix headset mic detection issue on a Dell machine
	x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
	x86/mm: Remove flush_tlb() and flush_tlb_current_task()
	x86/mm: Make flush_tlb_mm_range() more predictable
	x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
	x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
	x86/mm: Disable PCID on 32-bit kernels
	x86/mm: Add the 'nopcid' boot option to turn off PCID
	x86/mm: Enable CR4.PCIDE on supported systems
	x86/mm/64: Fix reboot interaction with CR4.PCIDE
	kbuild: add '-fno-stack-check' to kernel build options
	ipv4: igmp: guard against silly MTU values
	ipv6: mcast: better catch silly mtu values
	net: fec: unmap the xmit buffer that are not transferred by DMA
	net: igmp: Use correct source address on IGMPv3 reports
	netlink: Add netns check on taps
	net: qmi_wwan: add Sierra EM7565 1199:9091
	net: reevalulate autoflowlabel setting after sysctl setting
	ptr_ring: add barriers
	RDS: Check cmsg_len before dereferencing CMSG_DATA
	tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
	tcp md5sig: Use skb's saddr when replying to an incoming segment
	tg3: Fix rx hang on MTU change with 5717/5719
	net: ipv4: fix for a race condition in raw_sendmsg
	net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
	sctp: Replace use of sockets_allocated with specified macro.
	adding missing rcu_read_unlock in ipxip6_rcv
	ipv4: Fix use-after-free when flushing FIB tables
	net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
	net: fec: Allow reception of frames bigger than 1522 bytes
	net: Fix double free and memory corruption in get_net_ns_by_id()
	net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
	sock: free skb in skb_complete_tx_timestamp on error
	tcp: invalidate rate samples during SACK reneging
	net/mlx5: Fix rate limit packet pacing naming and struct
	net/mlx5e: Fix features check of IPv6 traffic
	net/mlx5e: Fix possible deadlock of VXLAN lock
	net/mlx5e: Add refcount to VXLAN structure
	net/mlx5e: Prevent possible races in VXLAN control flow
	net/mlx5: Fix error flow in CREATE_QP command
	s390/qeth: apply takeover changes when mode is toggled
	s390/qeth: don't apply takeover changes to RXIP
	s390/qeth: lock IP table while applying takeover changes
	s390/qeth: update takeover IPs after configuration change
	usbip: fix usbip bind writing random string after command in match_busid
	usbip: prevent leaking socket pointer address in messages
	usbip: stub: stop printing kernel pointer addresses in messages
	usbip: vhci: stop printing kernel pointer addresses in messages
	USB: serial: ftdi_sio: add id for Airbus DS P8GR
	USB: serial: qcserial: add Sierra Wireless EM7565
	USB: serial: option: add support for Telit ME910 PID 0x1101
	USB: serial: option: adding support for YUGA CLM920-NC5
	usb: Add device quirk for Logitech HD Pro Webcam C925e
	usb: add RESET_RESUME for ELSA MicroLink 56K
	USB: Fix off by one in type-specific length check of BOS SSP capability
	usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
	timers: Use deferrable base independent of base::nohz_active
	timers: Invoke timer_start_debug() where it makes sense
	timers: Reinitialize per cpu bases on hotplug
	nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
	x86/smpboot: Remove stale TLB flush invocations
	n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
	tty: fix tty_ldisc_receive_buf() documentation
	mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
	Linux 4.9.74

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-02 20:45:15 +01:00
Thomas Gleixner
e8119ac05d nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
commit 5d62c183f9 upstream.

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

  if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Thomas Gleixner
249d4a9b32 timers: Reinitialize per cpu bases on hotplug
commit 26456f87ac upstream.

The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Thomas Gleixner
574e543ff9 timers: Invoke timer_start_debug() where it makes sense
commit fd45bb77ad upstream.

The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.

Call the debug function after setting the new base and flags.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Anna-Maria Gleixner
d840687aa8 timers: Use deferrable base independent of base::nohz_active
commit ced6d5c11d upstream.

During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.

Once it base::nohz_active is set and a timer which was enqueued before that
is accessed the lock selector code choses the lock of the deferred
base. This causes unlocked access to the standard base and in case the
timer is removed it does not clear the pending flag in the standard base
bitmap which causes get_next_timer_interrupt() to return bogus values.

To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditional.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Steven Rostedt (VMware)
2e0d458c31 ring-buffer: Mask out the info bits when returning buffer page length
commit 45d8b80c2a upstream.

Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.

What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.

Fixes: 66a8cb95ed ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:07 +01:00
Jing Xia
81e155e7b0 tracing: Fix crash when it fails to alloc ring buffer
commit 24f2aaf952 upstream.

Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:

instance_mkdir()
    |-allocate_trace_buffers()
        |-allocate_trace_buffer(tr, &tr->trace_buffer...)
	|-allocate_trace_buffer(tr, &tr->max_buffer...)

          // allocate fail(-ENOMEM),first free
          // and the buffer pointer is not set to null
        |-ring_buffer_free(tr->trace_buffer.buffer)

       // out_free_tr
    |-free_trace_buffers()
        |-free_trace_buffer(&tr->trace_buffer);

	      //if trace_buffer is not null, free again
	    |-ring_buffer_free(buf->buffer)
                |-rb_free_cpu_buffer(buffer->buffers[cpu])
                    // ring_buffer_per_cpu is null, and
                    // crash in ring_buffer_per_cpu->pages

Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com

Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:07 +01:00
Steven Rostedt (VMware)
5dc4cd2688 tracing: Fix possible double free on failure of allocating trace buffer
commit 4397f04575 upstream.

Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.

Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com

Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia@spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:07 +01:00
Steven Rostedt (VMware)
6edea15d12 tracing: Remove extra zeroing out of the ring buffer page
commit 6b7e633fe9 upstream.

The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.

Fixes: 2711ca237a ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:06 +01:00
Greg Kroah-Hartman
f3f3457d45 Merge 4.9.73 into android-4.9
Changes in 4.9.73
	ACPI: APEI / ERST: Fix missing error handling in erst_reader()
	acpi, nfit: fix health event notification
	crypto: mcryptd - protect the per-CPU queue with a lock
	mfd: cros ec: spi: Don't send first message too soon
	mfd: twl4030-audio: Fix sibling-node lookup
	mfd: twl6040: Fix child-node lookup
	ALSA: rawmidi: Avoid racy info ioctl via ctl device
	ALSA: usb-audio: Add native DSD support for Esoteric D-05X
	ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
	PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
	parisc: Hide Diva-built-in serial aux and graphics card
	spi: xilinx: Detect stall with Unknown commands
	pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
	KVM: X86: Fix load RFLAGS w/o the fixed bit
	kvm: x86: fix RSM when PCID is non-zero
	clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
	powerpc/perf: Dereference BHRB entries safely
	libnvdimm, pfn: fix start_pad handling for aligned namespaces
	net: mvneta: clear interface link status on port disable
	net: mvneta: use proper rxq_number in loop on rx queues
	net: mvneta: eliminate wrong call to handle rx descriptor error
	bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
	Linux 4.9.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-29 18:18:15 +01:00
Ben Hutchings
37435f7e80 bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
pointer leaks are allowed.  Therefore, states_equal() must not treat
a state with a pointer in a register as "equal" to a state with an
UNKNOWN_VALUE in that register.

This was fixed differently upstream, but the code around here was
largely rewritten in 4.14 by commit f1174f77b5 "bpf/verifier: rework
value tracking".  The bug can be detected by the bpf/verifier sub-test
"pointer/scalar confusion in state equality check (way 1)".

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jann Horn <jannh@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
2017-12-29 17:43:00 +01:00
Greg Kroah-Hartman
cb7518e616 Merge 4.9.72 into android-4.9
Changes in 4.9.72
	cxl: Check if vphb exists before iterating over AFU devices
	arm64: Initialise high_memory global variable earlier
	ALSA: hda - add support for docking station for HP 820 G2
	ALSA: hda - add support for docking station for HP 840 G3
	kvm: fix usage of uninit spinlock in avic_vm_destroy()
	HID: corsair: support for K65-K70 Rapidfire and Scimitar Pro RGB
	HID: corsair: Add driver Scimitar Pro RGB gaming mouse 1b1c:1b3e support to hid-corsair
	arm: kprobes: Fix the return address of multiple kretprobes
	arm: kprobes: Align stack to 8-bytes in test code
	nvme-loop: handle cpu unplug when re-establishing the controller
	cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
	r8152: fix the list rx_done may be used without initialization
	crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
	vsock: track pkt owner vsock
	vhost-vsock: add pkt cancel capability
	vsock: cancel packets when failing to connect
	sch_dsmark: fix invalid skb_cow() usage
	bna: integer overflow bug in debugfs
	sctp: out_qlen should be updated when pruning unsent queue
	net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
	hwmon: (max31790) Set correct PWM value
	usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
	usb: gadget: udc: remove pointer dereference after free
	netfilter: nfnl_cthelper: fix runtime expectation policy updates
	netfilter: nfnl_cthelper: Fix memory leak
	iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
	r8152: fix the rx early size of RTL8153
	tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe
	inet: frag: release spinlock before calling icmp_send()
	pinctrl: st: add irq_request/release_resources callbacks
	scsi: lpfc: Fix PT2PT PRLI reject
	kvm: vmx: Flush TLB when the APIC-access address changes
	KVM: x86: correct async page present tracepoint
	KVM: VMX: Fix enable VPID conditions
	ARM: dts: ti: fix PCI bus dtc warnings
	hwmon: (asus_atk0110) fix uninitialized data access
	HID: xinmo: fix for out of range for THT 2P arcade controller.
	ASoC: STI: Fix reader substream pointer set
	r8152: prevent the driver from transmitting packets with carrier off
	s390/qeth: size calculation outbound buffers
	s390/qeth: no ETH header for outbound AF_IUCV
	bna: avoid writing uninitialized data into hw registers
	i40iw: Receive netdev events post INET_NOTIFIER state
	IB/core: Protect against self-requeue of a cq work item
	infiniband: Fix alignment of mmap cookies to support VIPT caching
	nbd: set queue timeout properly
	net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
	IB/rxe: double free on error
	IB/rxe: increment msn only when completing a request
	i40e: Do not enable NAPI on q_vectors that have no rings
	RDMA/iser: Fix possible mr leak on device removal event
	irda: vlsi_ir: fix check for DMA mapping errors
	netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
	netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
	ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
	cpufreq: Fix creation of symbolic links to policy directories
	net: ipconfig: fix ic_close_devs() use-after-free
	KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
	virtio-balloon: use actual number of stats for stats queue buffers
	virtio_balloon: prevent uninitialized variable use
	isdn: kcapi: avoid uninitialized data
	net: moxa: fix TX overrun memory leak
	xhci: plat: Register shutdown for xhci_plat
	netfilter: nfnetlink_queue: fix secctx memory leak
	Btrfs: fix an integer overflow check
	ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
	cpuidle: powernv: Pass correct drv->cpumask for registration
	bnxt_en: Fix NULL pointer dereference in reopen failure path
	backlight: pwm_bl: Fix overflow condition
	crypto: crypto4xx - increase context and scatter ring buffer elements
	rtc: pl031: make interrupt optional
	kvm, mm: account kvm related kmem slabs to kmemcg
	net: phy: at803x: Change error to EINVAL for invalid MAC
	PCI: Avoid bus reset if bridge itself is broken
	scsi: cxgb4i: fix Tx skb leak
	scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
	PCI: Create SR-IOV virtfn/physfn links before attaching driver
	PM / OPP: Move error message to debug level
	igb: check memory allocation failure
	ixgbe: fix use of uninitialized padding
	IB/rxe: check for allocation failure on elem
	PCI/AER: Report non-fatal errors only to the affected endpoint
	tracing: Exclude 'generic fields' from histograms
	ASoC: img-parallel-out: Add pm_runtime_get/put to set_fmt callback
	fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
	scsi: lpfc: Fix secure firmware updates
	scsi: lpfc: PLOGI failures during NPIV testing
	vfio/pci: Virtualize Maximum Payload Size
	fm10k: ensure we process SM mbx when processing VF mbx
	net: ipv6: send NS for DAD when link operationally up
	staging: greybus: light: Release memory obtained by kasprintf
	clk: sunxi-ng: sun6i: Rename HDMI DDC clock to avoid name collision
	tcp: fix under-evaluated ssthresh in TCP Vegas
	rtc: set the alarm to the next expiring timer
	cpuidle: fix broadcast control when broadcast can not be entered
	thermal: hisilicon: Handle return value of clk_prepare_enable
	thermal/drivers/hisi: Fix missing interrupt enablement
	thermal/drivers/hisi: Fix kernel panic on alarm interrupt
	thermal/drivers/hisi: Simplify the temperature/step computation
	thermal/drivers/hisi: Fix multiple alarm interrupts firing
	MIPS: math-emu: Fix final emulation phase for certain instructions
	platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes
	Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
	bpf: adjust insn_aux_data when patching insns
	bpf: fix branch pruning logic
	bpf: reject out-of-bounds stack pointer calculation
	bpf: fix incorrect sign extension in check_alu_op()
	sparc32: Export vac_cache_size to fix build error
	Linux 4.9.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-27 13:38:06 +01:00
Daniel Borkmann
3695b3b185 bpf: fix incorrect sign extension in check_alu_op()
From: Jann Horn <jannh@google.com>

[ Upstream commit 95a762e2c8 ]

Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.

Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.

Debian assigned CVE-2017-16995 for this issue.

v3:
 - add CVE number (Ben Hutchings)

Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
d75d3ee237 bpf: reject out-of-bounds stack pointer calculation
From: Jann Horn <jannh@google.com>

Reject programs that compute wildly out-of-bounds stack pointers.
Otherwise, pointers can be computed with an offset that doesn't fit into an
`int`, causing security issues in the stack memory access check (as well as
signed integer overflow during offset addition).

This is a fix specifically for the v4.9 stable tree because the mainline
code looks very different at this point.

Fixes: 7bca0a9702 ("bpf: enhance verifier to understand stack pointer arithmetic")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
7b5b73ea87 bpf: fix branch pruning logic
From: Alexei Starovoitov <ast@fb.com>

[ Upstream commit c131187db2 ]

when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Daniel Borkmann
565f012f5a bpf: adjust insn_aux_data when patching insns
From: Alexei Starovoitov <ast@fb.com>

[ Upstream commit 8041902dae ]

convert_ctx_accesses() replaces single bpf instruction with a set of
instructions. Adjust corresponding insn_aux_data while patching.
It's needed to make sure subsequent 'for(all insn)' loops
have matching insn and insn_aux_data.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:47 +01:00
Tom Zanussi
6af9b18a2e tracing: Exclude 'generic fields' from histograms
[ Upstream commit a15f7fc203 ]

There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that
are found by trace_find_event_field() but are only meant for
filtering.  Specifically, they unlike normal fields, they have a size
of 0 and thus wreak havoc when used as a histogram key.

Exclude these (return -EINVAL) when used as histogram keys.

Link: http://lkml.kernel.org/r/956154cbc3e8a4f0633d619b886c97f0f0edf7b4.1506105045.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:23:44 +01:00
Greg Kroah-Hartman
319c8e1bc7 Merge 4.9.71 into android-4.9
Changes in 4.9.71
	mfd: fsl-imx25: Clean up irq settings during removal
	crypto: rsa - fix buffer overread when stripping leading zeroes
	crypto: hmac - require that the underlying hash algorithm is unkeyed
	crypto: salsa20 - fix blkcipher_walk API usage
	autofs: fix careless error in recent commit
	tracing: Allocate mask_str buffer dynamically
	USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
	USB: core: prevent malicious bNumInterfaces overflow
	usbip: fix stub_rx: get_pipe() to validate endpoint number
	usb: add helper to extract bits 12:11 of wMaxPacketSize
	usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
	ceph: drop negative child dentries before try pruning inode's alias
	usb: xhci: fix TDS for MTK xHCI1.1
	Bluetooth: btusb: driver to enable the usb-wakeup feature
	xhci: Don't add a virt_dev to the devs array before it's fully allocated
	nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
	sched/rt: Do not pull from current CPU if only one CPU to pull
	eeprom: at24: change nvmem stride to 1
	dmaengine: dmatest: move callback wait queue to thread context
	ext4: fix fdatasync(2) after fallocate(2) operation
	ext4: fix crash when a directory's i_size is too small
	mac80211: Fix addition of mesh configuration element
	usb: phy: isp1301: Add OF device ID table
	KVM: nVMX: do not warn when MSR bitmap address is not backed
	usb: xhci-mtk: check hcc_params after adding primary hcd
	md-cluster: free md_cluster_info if node leave cluster
	userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
	userfaultfd: selftest: vm: allow to build in vm/ directory
	net: initialize msg.msg_flags in recvfrom
	bnxt_en: Ignore 0 value in autoneg supported speed from firmware.
	net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
	net: bcmgenet: correct MIB access of UniMAC RUNT counters
	net: bcmgenet: reserved phy revisions must be checked first
	net: bcmgenet: power down internal phy if open or resume fails
	net: bcmgenet: synchronize irq0 status between the isr and task
	net: bcmgenet: Power up the internal PHY before probing the MII
	rxrpc: Wake up the transmitter if Rx window size increases on the peer
	net/mlx5: Fix create autogroup prev initializer
	net/mlx5: Don't save PCI state when PCI error is detected
	iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
	drm/amdgpu: fix parser init error path to avoid crash in parser fini
	NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
	NFSD: fix nfsd_reset_versions for NFSv4.
	Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
	drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
	netfilter: bridge: honor frag_max_size when refragmenting
	ASoC: rsnd: fix sound route path when using SRC6/SRC9
	blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
	writeback: fix memory leak in wb_queue_work()
	net: wimax/i2400m: fix NULL-deref at probe
	dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
	irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN
	net: Resend IGMP memberships upon peer notification.
	mlxsw: reg: Fix SPVM max record count
	mlxsw: reg: Fix SPVMLR max record count
	qed: Align CIDs according to DORQ requirement
	qed: Fix mapping leak on LL2 rx flow
	qed: Fix interrupt flags on Rx LL2
	drm: amd: remove broken include path
	intel_th: pci: Add Gemini Lake support
	openrisc: fix issue handling 8 byte get_user calls
	ASoC: rcar: clear DE bit only in PDMACHCR when it stops
	scsi: hpsa: update check for logical volume status
	scsi: hpsa: limit outstanding rescans
	scsi: hpsa: do not timeout reset operations
	fjes: Fix wrong netdevice feature flags
	drm/radeon/si: add dpm quirk for Oland
	Drivers: hv: util: move waiting for release to hv_utils_transport itself
	iwlwifi: mvm: cleanup pending frames in DQA mode
	sched/deadline: Add missing update_rq_clock() in dl_task_timer()
	sched/deadline: Make sure the replenishment timer fires in the next period
	sched/deadline: Throttle a constrained deadline task activated after the deadline
	sched/deadline: Use deadline instead of period when calculating overflow
	mmc: mediatek: Fixed bug where clock frequency could be set wrong
	drm/radeon: reinstate oland workaround for sclk
	afs: Fix missing put_page()
	afs: Populate group ID from vnode status
	afs: Adjust mode bits processing
	afs: Deal with an empty callback array
	afs: Flush outstanding writes when an fd is closed
	afs: Migrate vlocation fields to 64-bit
	afs: Prevent callback expiry timer overflow
	afs: Fix the maths in afs_fs_store_data()
	afs: Invalid op ID should abort with RXGEN_OPCODE
	afs: Better abort and net error handling
	afs: Populate and use client modification time
	afs: Fix page leak in afs_write_begin()
	afs: Fix afs_kill_pages()
	afs: Fix abort on signal while waiting for call completion
	nvme-loop: fix a possible use-after-free when destroying the admin queue
	nvmet: confirm sq percpu has scheduled and switched to atomic
	nvmet-rdma: Fix a possible uninitialized variable dereference
	net/mlx4_core: Avoid delays during VF driver device shutdown
	net: mpls: Fix nexthop alive tracking on down events
	rxrpc: Ignore BUSY packets on old calls
	tty: don't panic on OOM in tty_set_ldisc()
	tty: fix data race in tty_ldisc_ref_wait()
	perf symbols: Fix symbols__fixup_end heuristic for corner cases
	efi/esrt: Cleanup bad memory map log messages
	NFSv4.1 respect server's max size in CREATE_SESSION
	btrfs: add missing memset while reading compressed inline extents
	target: Use system workqueue for ALUA transitions
	target: fix ALUA transition timeout handling
	target: fix race during implicit transition work flushes
	Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
	HID: cp2112: fix broken gpio_direction_input callback
	sfc: don't warn on successful change of MAC
	fbdev: controlfb: Add missing modes to fix out of bounds access
	video: udlfb: Fix read EDID timeout
	video: fbdev: au1200fb: Release some resources if a memory allocation fails
	video: fbdev: au1200fb: Return an error code if a memory allocation fails
	rtc: pcf8563: fix output clock rate
	ASoC: Intel: Skylake: Fix uuid_module memory leak in failure case
	dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
	PCI/PME: Handle invalid data when reading Root Status
	powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
	PCI: Do not allocate more buses than available in parent
	iommu/mediatek: Fix driver name
	netfilter: ipvs: Fix inappropriate output of procfs
	powerpc/opal: Fix EBUSY bug in acquiring tokens
	powerpc/ipic: Fix status get and status clear
	platform/x86: intel_punit_ipc: Fix resource ioremap warning
	target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
	iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
	target:fix condition return in core_pr_dump_initiator_port()
	target/file: Do not return error for UNMAP if length is zero
	badblocks: fix wrong return value in badblocks_set if badblocks are disabled
	iommu/amd: Limit the IOVA page range to the specified addresses
	xfs: truncate pagecache before writeback in xfs_setattr_size()
	arm-ccn: perf: Prevent module unload while PMU is in use
	crypto: tcrypt - fix buffer lengths in test_aead_speed()
	mm: Handle 0 flags in _calc_vm_trans() macro
	clk: mediatek: add the option for determining PLL source clock
	clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
	clk: hi6220: mark clock cs_atb_syspll as critical
	clk: tegra: Fix cclk_lp divisor register
	ppp: Destroy the mutex when cleanup
	ASoC: rsnd: rsnd_ssi_run_mods() needs to care ssi_parent_mod
	thermal/drivers/step_wise: Fix temperature regulation misbehavior
	scsi: scsi_debug: write_same: fix error report
	GFS2: Take inode off order_write list when setting jdata flag
	bcache: explicitly destroy mutex while exiting
	bcache: fix wrong cache_misses statistics
	Ib/hfi1: Return actual operational VLs in port info query
	arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
	btrfs: tests: Fix a memory leak in error handling path in 'run_test()'
	platform/x86: hp_accel: Add quirk for HP ProBook 440 G4
	nvme: use kref_get_unless_zero in nvme_find_get_ns
	l2tp: cleanup l2tp_tunnel_delete calls
	xfs: fix log block underflow during recovery cycle verification
	xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
	RDMA/cxgb4: Declare stag as __be32
	PCI: Detach driver before procfs & sysfs teardown on device remove
	scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
	scsi: hpsa: destroy sas transport properties before scsi_host
	powerpc/perf/hv-24x7: Fix incorrect comparison in memord
	soc: mediatek: pwrap: fix compiler errors
	tty fix oops when rmmod 8250
	usb: musb: da8xx: fix babble condition handling
	pinctrl: adi2: Fix Kconfig build problem
	raid5: Set R5_Expanded on parity devices as well as data.
	scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
	IB/core: Fix calculation of maximum RoCE MTU
	vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
	rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_createbss_cmd
	rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_disassoc_cmd
	scsi: sd: change manage_start_stop to bool in sysfs interface
	scsi: sd: change allow_restart to bool in sysfs interface
	scsi: bfa: integer overflow in debugfs
	udf: Avoid overflow when session starts at large offset
	macvlan: Only deliver one copy of the frame to the macvlan interface
	RDMA/cma: Avoid triggering undefined behavior
	IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
	icmp: don't fail on fragment reassembly time exceeded
	ath9k: fix tx99 potential info leak
	Linux 4.9.71

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-20 10:51:15 +01:00
Steven Rostedt (VMware)
28714e962a sched/deadline: Use deadline instead of period when calculating overflow
[ Upstream commit 2317d5f1c3 ]

I was testing Daniel's changes with his test case, and tweaked it a
little. Instead of having the runtime equal to the deadline, I
increased the deadline ten fold.

Daniel's test case had:

	attr.sched_runtime  = 2 * 1000 * 1000;		/* 2 ms */
	attr.sched_deadline = 2 * 1000 * 1000;		/* 2 ms */
	attr.sched_period   = 2 * 1000 * 1000 * 1000;	/* 2 s */

To make it more interesting, I changed it to:

	attr.sched_runtime  =  2 * 1000 * 1000;		/* 2 ms */
	attr.sched_deadline = 20 * 1000 * 1000;		/* 20 ms */
	attr.sched_period   =  2 * 1000 * 1000 * 1000;	/* 2 s */

The results were rather surprising. The behavior that Daniel's patch
was fixing came back. The task started using much more than .1% of the
CPU. More like 20%.

Looking into this I found that it was due to the dl_entity_overflow()
constantly returning true. That's because it uses the relative period
against relative runtime vs the absolute deadline against absolute
runtime.

  runtime / (deadline - t) > dl_runtime / dl_period

There's even a comment mentioning this, and saying that when relative
deadline equals relative period, that the equation is the same as using
deadline instead of period. That comment is backwards! What we really
want is:

  runtime / (deadline - t) > dl_runtime / dl_deadline

We care about if the runtime can make its deadline, not its period. And
then we can say "when the deadline equals the period, the equation is
the same as using dl_period instead of dl_deadline".

After correcting this, now when the task gets enqueued, it can throttle
correctly, and Daniel's fix to the throttling of sleeping deadline
tasks works even when the runtime and deadline are not the same.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/02135a27f1ae3fe5fd032568a5a2f370e190e8d7.1488392936.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:23 +01:00
Daniel Bristot de Oliveira
a2e29113f1 sched/deadline: Throttle a constrained deadline task activated after the deadline
[ Upstream commit df8eac8caf ]

During the activation, CBS checks if it can reuse the current task's
runtime and period. If the deadline of the task is in the past, CBS
cannot use the runtime, and so it replenishes the task. This rule
works fine for implicit deadline tasks (deadline == period), and the
CBS was designed for implicit deadline tasks. However, a task with
constrained deadline (deadine < period) might be awakened after the
deadline, but before the next period. In this case, replenishing the
task would allow it to run for runtime / deadline. As in this case
deadline < period, CBS enables a task to run for more than the
runtime / period. In a very loaded system, this can cause a domino
effect, making other tasks miss their deadlines.

To avoid this problem, in the activation of a constrained deadline
task after the deadline but before the next period, throttle the
task and set the replenishing timer to the begin of the next period,
unless it is boosted.

Reproducer:

 --------------- %< ---------------
  int main (int argc, char **argv)
  {
	int ret;
	int flags = 0;
	unsigned long l = 0;
	struct timespec ts;
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);

	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  = 2 * 1000 * 1000;		/* 2 ms */
	attr.sched_deadline = 2 * 1000 * 1000;		/* 2 ms */
	attr.sched_period   = 2 * 1000 * 1000 * 1000;	/* 2 s */

	ts.tv_sec = 0;
	ts.tv_nsec = 2000 * 1000;			/* 2 ms */

	ret = sched_setattr(0, &attr, flags);

	if (ret < 0) {
		perror("sched_setattr");
		exit(-1);
	}

	for(;;) {
		/* XXX: you may need to adjust the loop */
		for (l = 0; l < 150000; l++);
		/*
		 * The ideia is to go to sleep right before the deadline
		 * and then wake up before the next period to receive
		 * a new replenishment.
		 */
		nanosleep(&ts, NULL);
	}

	exit(0);
  }
  --------------- >% ---------------

On my box, this reproducer uses almost 50% of the CPU time, which is
obviously wrong for a task with 2/2000 reservation.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/edf58354e01db46bf42df8d2dd32418833f68c89.1488392936.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:23 +01:00
Daniel Bristot de Oliveira
9cc56a00ea sched/deadline: Make sure the replenishment timer fires in the next period
[ Upstream commit 5ac69d3778 ]

Currently, the replenishment timer is set to fire at the deadline
of a task. Although that works for implicit deadline tasks because the
deadline is equals to the begin of the next period, that is not correct
for constrained deadline tasks (deadline < period).

For instance:

f.c:
 --------------- %< ---------------
int main (void)
{
	for(;;);
}
 --------------- >% ---------------

  # gcc -o f f.c

  # trace-cmd record -e sched:sched_switch                              \
				   -e syscalls:sys_exit_sched_setattr   \
   chrt -d --sched-runtime  490000000					\
           --sched-deadline 500000000					\
	   --sched-period  1000000000 0 ./f

  # trace-cmd report | grep "{pid of ./f}"

After setting parameters, the task is replenished and continue running
until being throttled:

         f-11295 [003] 13322.113776: sys_exit_sched_setattr: 0x0

The task is throttled after running 492318 ms, as expected:

         f-11295 [003] 13322.606094: sched_switch:   f:11295 [-1] R ==> watchdog/3:32 [0]

But then, the task is replenished 500719 ms after the first
replenishment:

    <idle>-0     [003] 13322.614495: sched_switch:   swapper/3:0 [120] R ==> f:11295 [-1]

Running for 490277 ms:

         f-11295 [003] 13323.104772: sched_switch:   f:11295 [-1] R ==>  swapper/3:0 [120]

Hence, in the first period, the task runs 2 * runtime, and that is a bug.

During the first replenishment, the next deadline is set one period away.
So the runtime / period starts to be respected. However, as the second
replenishment took place in the wrong instant, the next replenishment
will also be held in a wrong instant of time. Rather than occurring in
the nth period away from the first activation, it is taking place
in the (nth period - relative deadline).

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/ac50d89887c25285b47465638354b63362f8adff.1488392936.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:23 +01:00
Wanpeng Li
0a4d4dac5e sched/deadline: Add missing update_rq_clock() in dl_task_timer()
[ Upstream commit dcc3b5ffe1 ]

The following warning can be triggered by hot-unplugging the CPU
on which an active SCHED_DEADLINE task is running on:

 ------------[ cut here ]------------
 WARNING: CPU: 7 PID: 0 at kernel/sched/sched.h:833 replenish_dl_entity+0x71e/0xc40
 rq->clock_update_flags < RQCF_ACT_SKIP
 CPU: 7 PID: 0 Comm: swapper/7 Tainted: G    B           4.11.0-rc1+ #24
 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
 Call Trace:
  <IRQ>
  dump_stack+0x85/0xc4
  __warn+0x172/0x1b0
  warn_slowpath_fmt+0xb4/0xf0
  ? __warn+0x1b0/0x1b0
  ? debug_check_no_locks_freed+0x2c0/0x2c0
  ? cpudl_set+0x3d/0x2b0
  replenish_dl_entity+0x71e/0xc40
  enqueue_task_dl+0x2ea/0x12e0
  ? dl_task_timer+0x777/0x990
  ? __hrtimer_run_queues+0x270/0xa50
  dl_task_timer+0x316/0x990
  ? enqueue_task_dl+0x12e0/0x12e0
  ? enqueue_task_dl+0x12e0/0x12e0
  __hrtimer_run_queues+0x270/0xa50
  ? hrtimer_cancel+0x20/0x20
  ? hrtimer_interrupt+0x119/0x600
  hrtimer_interrupt+0x19c/0x600
  ? trace_hardirqs_off+0xd/0x10
  local_apic_timer_interrupt+0x74/0xe0
  smp_apic_timer_interrupt+0x76/0xa0
  apic_timer_interrupt+0x93/0xa0

The DL task will be migrated to a suitable later deadline rq once the DL
timer fires and currnet rq is offline. The rq clock of the new rq should
be updated. This patch fixes it by updating the rq clock after holding
the new rq's rq lock.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1488865888-15894-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:23 +01:00
Steven Rostedt
d266817f50 sched/rt: Do not pull from current CPU if only one CPU to pull
commit f73c52a5bc upstream.

Daniel Wagner reported a crash on the BeagleBone Black SoC.

This is a single CPU architecture, and does not have a functional
arch_send_call_function_single_ipi() implementation which can crash
the kernel if that is called.

As it only has one CPU, it shouldn't be called, but if the kernel is
compiled for SMP, the push/pull RT scheduling logic now calls it for
irq_work if the one CPU is overloaded, it can use that function to call
itself and crash the kernel.

Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
only has a single CPU. But SCHED_FEAT is a constant if sched debugging
is turned off. Another fix can also be used, and this should also help
with normal SMP machines. That is, do not initiate the pull code if
there's only one RT overloaded CPU, and that CPU happens to be the
current CPU that is scheduling in a lower priority task.

Even on a system with many CPUs, if there's many RT tasks waiting to
run on a single CPU, and that CPU schedules in another RT task of lower
priority, it will initiate the PULL logic in case there's a higher
priority RT task on another CPU that is waiting to run. But if there is
no other CPU with waiting RT tasks, it will initiate the RT pull logic
on itself (as it still has RT tasks waiting to run). This is a wasted
effort.

Not only does this help with SMP code where the current CPU is the only
one with RT overloaded tasks, it should also solve the issue that
Daniel encountered, because it will prevent the PULL logic from
executing, as there's only one CPU on the system, and the check added
here will cause it to exit the RT pull code.

Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:17 +01:00
Changbin Du
d760f90341 tracing: Allocate mask_str buffer dynamically
commit 90e406f96f upstream.

The default NR_CPUS can be very large, but actual possible nr_cpu_ids
usually is very small. For my x86 distribution, the NR_CPUS is 8192 and
nr_cpu_ids is 4. About 2 pages are wasted.

Most machines don't have so many CPUs, so define a array with NR_CPUS
just wastes memory. So let's allocate the buffer dynamically when need.

With this change, the mutext tracing_cpumask_update_lock also can be
removed now, which was used to protect mask_str.

Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com

Fixes: 36dfe9252b ("ftrace: make use of tracing_cpumask")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:07:15 +01:00
jianxin.pan
872270857a module: skip sublevel and crc when ver check
PD#157069: skip SUBLEVEL and crc when ver check durning insmod

When CONFIG_MODVERSIONS enabled, vermagic and crc are checked
durning insmod.

Change-Id: I6eb7bdda5b771afa754f7b783a7bbfe1be7cedd1
Signed-off-by: jianxin.pan <jianxin.pan@amlogic.com>
2017-12-19 04:11:48 -07:00
Dmitry Vyukov
9b3b5d1f25 UPSTREAM: kcov: fix comparison callback signature
Fix a silly copy-paste bug.  We truncated u32 args to u16.

Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com
Fixes: ded97d2c2b ("kcov: support comparison operands collection")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: syzkaller@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 64145065
(cherry-picked from 8c0431ec452de79ef3fe998c1fbb1e3d3ac13ddd)
Change-Id: Ic3872c33d03a456640dd6fdcce3b0795765dc1c0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Victor Chibotaru
9412609866 UPSTREAM: kcov: support comparison operands collection
Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com
Signed-off-by: Victor Chibotaru <tchibo@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from ded97d2c2b)
Change-Id: Iaba700a3f4786048be14a5e764ccabceae114eb7
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Andrey Ryabinin
9dd90d6851 UPSTREAM: kcov: remove pointless current != NULL check
__sanitizer_cov_trace_pc() is a hot code, so it's worth to remove
pointless '!current' check.  Current is never NULL.

Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from fcf4edac04)
Change-Id: Ia76e8c6cc0dc3fb796d8e8b92430fcf659b52eee
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Dmitry Vyukov
87a77ded9b UPSTREAM: kcov: support compat processes
Support compat processes in KCOV by providing compat_ioctl callback.
Compat mode uses the same ioctl callback: we have 2 commands that do not
use the argument and 1 that already checks that the arg does not overflow
INT_MAX.  This allows to use KCOV-guided fuzzing in compat processes.

Link: http://lkml.kernel.org/r/20170823100553.55812-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 7483e5d420)
Change-Id:I74b62f01941091649ce6e88b3130e4ca4274a8de
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Dmitry Vyukov
99463c8134 UPSTREAM: kcov: simplify interrupt check
in_interrupt() semantics are confusing and wrong for most users as it
also returns true when bh is disabled.  Thus we open coded a proper
check for interrupts in __sanitizer_cov_trace_pc() with a lengthy
explanatory comment.

Use the new in_task() predicate instead.

Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: James Morse <james.morse@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from f61e869d51)
Change-Id: Ice260535314238c8f82ddc578ecaeea6177d28fc
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Alexander Popov
5fc77d000b UPSTREAM: kcov: make kcov work properly with KASLR enabled
Subtract KASLR offset from the kernel addresses reported by kcov.
Tested on x86_64 and AArch64 (Hikey LeMaker).

Link: http://lkml.kernel.org/r/1481417456-28826-3-git-send-email-alex.popov@linux.com
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 4983f0ab7f)
Change-Id: Ib19d7fb559f7db28314cd13a3e33e061d1dfdec9
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Kefeng Wang
7d9726f06e UPSTREAM: kcov: add more missing includes
It is fragile that some definitions acquired via transitive
dependencies, as shown in below:

atomic_*        (<linux/atomic.h>)
ENOMEM/EN*      (<linux/errno.h>)
EXPORT_SYMBOL   (<linux/export.h>)
device_initcall (<linux/init.h>)
preempt_*       (<linux/preempt.h>)

Include them to prevent possible issues.

Link: http://lkml.kernel.org/r/1481163221-40170-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from db862358a4)
Change-Id: Ia529631d2072cc795c46ae0276e51592318cd40f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 15:21:35 -08:00
Greg Kroah-Hartman
9542d2a012 Merge 4.9.70 into android-4.9
Changes in 4.9.70
	net: qmi_wwan: add Quectel BG96 2c7c:0296
	s390/qeth: fix early exit from error path
	tipc: fix memory leak in tipc_accept_from_sock()
	rds: Fix NULL pointer dereference in __rds_rdma_map
	sit: update frag_off info
	packet: fix crash in fanout_demux_rollover()
	net/packet: fix a race in packet_bind() and packet_notifier()
	usbnet: fix alignment for frames with no ethernet header
	net: remove hlist_nulls_add_tail_rcu()
	stmmac: reset last TSO segment size after device open
	tcp/dccp: block bh before arming time_wait timer
	s390/qeth: build max size GSO skbs on L2 devices
	s390/qeth: fix GSO throughput regression
	s390/qeth: fix thinko in IPv4 multicast address tracking
	tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
	Fix handling of verdicts after NF_QUEUE
	ipmi: Stop timers before cleaning up the module
	s390: always save and restore all registers on context switch
	usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
	fix kcm_clone()
	KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table
	powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold
	kbuild: do not call cc-option before KBUILD_CFLAGS initialization
	ipvlan: fix ipv6 outbound device
	audit: ensure that 'audit=1' actually enables audit for PID 1
	md: free unused memory after bitmap resize
	RDMA/cxgb4: Annotate r2 and stag as __be32
	Linux 4.9.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-18 10:58:11 +01:00
Paul Moore
93dedcf5a1 audit: ensure that 'audit=1' actually enables audit for PID 1
[ Upstream commit 173743dd99 ]

Prior to this patch we enabled audit in audit_init(), which is too
late for PID 1 as the standard initcalls are run after the PID 1 task
is forked.  This means that we never allocate an audit_context (see
audit_alloc()) for PID 1 and therefore miss a lot of audit events
generated by PID 1.

This patch enables audit as early as possible to help ensure that when
PID 1 is forked it can allocate an audit_context if required.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 16:25:47 +01:00
Greg Kroah-Hartman
3f1d77ca5f Merge 4.9.69 into android-4.9
Changes in 4.9.69
	usb: gadget: udc: renesas_usb3: fix number of the pipes
	can: ti_hecc: Fix napi poll return value for repoll
	can: kvaser_usb: free buf in error paths
	can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
	can: kvaser_usb: ratelimit errors if incomplete messages are received
	can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
	can: ems_usb: cancel urb on -EPIPE and -EPROTO
	can: esd_usb2: cancel urb on -EPIPE and -EPROTO
	can: usb_8dev: cancel urb on -EPIPE and -EPROTO
	virtio: release virtio index when fail to device_register
	hv: kvp: Avoid reading past allocated blocks from KVP file
	isa: Prevent NULL dereference in isa_bus driver callbacks
	scsi: dma-mapping: always provide dma_get_cache_alignment
	scsi: use dma_get_cache_alignment() as minimum DMA alignment
	scsi: libsas: align sata_device's rps_resp on a cacheline
	efi: Move some sysfs files to be read-only by root
	efi/esrt: Use memunmap() instead of kfree() to free the remapping
	ASN.1: fix out-of-bounds read when parsing indefinite length item
	ASN.1: check for error from ASN1_OP_END__ACT actions
	KEYS: add missing permission check for request_key() destination
	X.509: reject invalid BIT STRING for subjectPublicKey
	X.509: fix comparisons of ->pkey_algo
	x86/PCI: Make broadcom_postcore_init() check acpi_disabled
	KVM: x86: fix APIC page invalidation
	btrfs: fix missing error return in btrfs_drop_snapshot
	ALSA: pcm: prevent UAF in snd_pcm_info
	ALSA: seq: Remove spurious WARN_ON() at timer check
	ALSA: usb-audio: Fix out-of-bound error
	ALSA: usb-audio: Add check return value for usb_string()
	iommu/vt-d: Fix scatterlist offset handling
	smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
	s390: fix compat system call table
	KVM: s390: Fix skey emulation permission check
	powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
	brcmfmac: change driver unbind order of the sdio function devices
	kdb: Fix handling of kallsyms_symbol_next() return value
	drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
	media: dvb: i2c transfers over usb cannot be done from stack
	arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
	arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
	KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
	KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
	KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation
	KVM: arm/arm64: vgic-its: Check result of allocation before use
	arm64: fpsimd: Prevent registers leaking from dead tasks
	bus: arm-cci: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: Check memory allocation failure
	bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.
	crypto: talitos - fix AEAD test failures
	crypto: talitos - fix memory corruption on SEC2
	crypto: talitos - fix setkey to check key weakness
	crypto: talitos - fix AEAD for sha224 on non sha224 capable chips
	crypto: talitos - fix use of sg_link_tbl_len
	crypto: talitos - fix ctr-aes-talitos
	usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
	ARM: BUG if jumping to usermode address in kernel mode
	ARM: avoid faulting on qemu
	thp: reduce indentation level in change_huge_pmd()
	thp: fix MADV_DONTNEED vs. numa balancing race
	mm: drop unused pmdp_huge_get_and_clear_notify()
	Revert "drm/armada: Fix compile fail"
	Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
	ARM: 8657/1: uaccess: consistently check object sizes
	vti6: Don't report path MTU below IPV6_MIN_MTU.
	ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
	x86/selftests: Add clobbers for int80 on x86_64
	x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
	sched/fair: Make select_idle_cpu() more aggressive
	x86/hpet: Prevent might sleep splat on resume
	powerpc/64: Invalidate process table caching after setting process table
	selftest/powerpc: Fix false failures for skipped tests
	powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
	lirc: fix dead lock between open and wakeup_filter
	module: set __jump_table alignment to 8
	powerpc/64: Fix checksum folding in csum_add()
	ARM: OMAP2+: Fix device node reference counts
	ARM: OMAP2+: Release device node after it is no longer needed.
	ASoC: rcar: avoid SSI_MODEx settings for SSI8
	gpio: altera: Use handle_level_irq when configured as a level_high
	HID: chicony: Add support for another ASUS Zen AiO keyboard
	usb: gadget: configs: plug memory leak
	USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
	usb: dwc3: gadget: Fix system suspend/resume on TI platforms
	usb: gadget: pxa27x: Test for a valid argument pointer
	usb: gadget: udc: net2280: Fix tmp reusage in net2280 driver
	kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
	libata: drop WARN from protocol error in ata_sff_qc_issue()
	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
	scsi: qla2xxx: Fix ql_dump_buffer
	scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
	irqchip/crossbar: Fix incorrect type of register size
	KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
	arm: KVM: Survive unknown traps from guests
	arm64: KVM: Survive unknown traps from guests
	KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
	spi_ks8995: fix "BUG: key accdaa28 not in .data!"
	spi_ks8995: regs_size incorrect for some devices
	bnx2x: prevent crash when accessing PTP with interface down
	bnx2x: fix possible overrun of VFPF multicast addresses array
	bnx2x: fix detection of VLAN filtering feature for VF
	bnx2x: do not rollback VF MAC/VLAN filters we did not configure
	rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races
	ibmvnic: Fix overflowing firmware/hardware TX queue
	ibmvnic: Allocate number of rx/tx buffers agreed on by firmware
	ipv6: reorder icmpv6_init() and ip6_mr_init()
	crypto: s5p-sss - Fix completing crypto request in IRQ handler
	i2c: riic: fix restart condition
	blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
	zram: set physical queue limits to avoid array out of bounds accesses
	netfilter: don't track fragmented packets
	axonram: Fix gendisk handling
	drm/amd/amdgpu: fix console deadlock if late init failed
	powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
	EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
	EDAC, i5000, i5400: Fix definition of NRECMEMB register
	kbuild: pkg: use --transform option to prefix paths in tar
	coccinelle: fix parallel build with CHECK=scripts/coccicheck
	x86/mpx/selftests: Fix up weird arrays
	mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
	gre6: use log_ecn_error module parameter in ip6_tnl_rcv()
	route: also update fnhe_genid when updating a route cache
	route: update fnhe_expires for redirect when the fnhe exists
	drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'
	lib/genalloc.c: make the avail variable an atomic_long_t
	dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
	NFS: Fix a typo in nfs_rename()
	sunrpc: Fix rpc_task_begin trace point
	xfs: fix forgotten rcu read unlock when skipping inode reclaim
	dt-bindings: usb: fix reg-property port-number range
	block: wake up all tasks blocked in get_request()
	sparc64/mm: set fields in deferred pages
	zsmalloc: calling zs_map_object() from irq is a bug
	sctp: do not free asoc when it is already dead in sctp_sendmsg
	sctp: use the right sk after waking up from wait_buf sleep
	bpf: fix lockdep splat
	clk: uniphier: fix DAPLL2 clock rate of Pro5
	atm: horizon: Fix irq release error
	jump_label: Invoke jump_label_test() via early_initcall()
	xfrm: Copy policy family in clone_policy
	IB/mlx4: Increase maximal message size under UD QP
	IB/mlx5: Assign send CQ and recv CQ of UMR QP
	afs: Connect up the CB.ProbeUuid
	Linux 4.9.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-14 09:58:43 +01:00
Jason Baron
74b470ce47 jump_label: Invoke jump_label_test() via early_initcall()
[ Upstream commit 92ee46efeb ]

Fengguang Wu reported that running the rcuperf test during boot can cause
the jump_label_test() to hit a WARN_ON(). The issue is that the core jump
label code relies on kernel_text_address() to detect when it can no longer
update branches that may be contained in __init sections. The
kernel_text_address() in turn assumes that if the system_state variable is
greter than or equal to SYSTEM_RUNNING then __init sections are no longer
valid (since the assumption is that they have been freed). However, when
rcuperf is setup to run in early boot it can call kernel_power_off() which
sets the system_state to SYSTEM_POWER_OFF.

Since rcuperf initialization is invoked via a module_init(), we can make
the dependency of jump_label_test() needing to complete before rcuperf
explicit by calling it via early_initcall().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510609727-2238-1-git-send-email-jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:24 +01:00
Eric Dumazet
f45f4f8a7c bpf: fix lockdep splat
[ Upstream commit 89ad2fa3f0 ]

pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.

 [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]

 switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
  (&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300

 and this task is already holding:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
 which would create a new lock dependency:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
 ... which became SOFTIRQ-irq-safe at:
   [<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
   [<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
   [<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
   [<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
   [<ffffffff9e19886d>] ip_output+0x7d/0x260
   [<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
   [<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
   [<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
   [<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
   [<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
   [<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
   [<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
   [<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
   [<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
   [<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
   [<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
   [<ffffffff9e191e65>] ip_rcv+0x295/0x510
   [<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
   [<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
   [<ffffffff9e1306ff>] process_backlog+0x6f/0x230
   [<ffffffff9e132129>] net_rx_action+0x229/0x420
   [<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
   [<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
   [<ffffffff9dafc2f5>] do_softirq+0x55/0x60
   [<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
   [<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
   [<ffffffff9daab333>] start_secondary+0x113/0x140

 to a SOFTIRQ-irq-unsafe lock:
  (&head->lock){+.+...}
 ... which became SOFTIRQ-irq-unsafe at:
 ...  [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
   [<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
   [<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
   [<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17

 other info that might help us debug this:

 Chain exists of:
   dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&head->lock);
                                local_irq_disable();
                                lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                                lock(&htab->buckets[i].lock);
   <Interrupt>
     lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);

  *** DEADLOCK ***

Fixes: e19494edab ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:23 +01:00
Tejun Heo
757e1845d6 workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
[ Upstream commit 637fdbae60 ]

If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:19 +01:00
Peter Zijlstra
4e4a9ebe33 sched/fair: Make select_idle_cpu() more aggressive
[ Upstream commit 4c77b18cf8 ]

Kitsunyan reported desktop latency issues on his Celeron 887 because
of commit:

  1b568f0aab ("sched/core: Optimize SCHED_SMT")

... even though his CPU doesn't do SMT.

The effect of running the SMT code on a !SMT part is basically a more
aggressive select_idle_cpu(). Removing the avg condition fixed things
for him.

I also know FB likes this test gone, even though other workloads like
having it.

For now, take it out by default, until we get a better idea.

Reported-by: kitsunyan <kitsunyan@inbox.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:17 +01:00
Daniel Thompson
30b18ee253 kdb: Fix handling of kallsyms_symbol_next() return value
commit c07d353380 upstream.

kallsyms_symbol_next() returns a boolean (true on success). Currently
kdb_read() tests the return value with an inequality that
unconditionally evaluates to true.

This is fixed in the obvious way and, since the conditional branch is
supposed to be unreachable, we also add a WARN_ON().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:13 +01:00
Lai Jiangshan
ff3d4fd537 smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
commit 46febd37f9 upstream.

Commit 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
accidently put this step on the wrong place. The step should be at the
cpuhp_ap_states[] rather than the cpuhp_bp_states[].

grep smpcfd /sys/devices/system/cpu/hotplug/states
 40: smpcfd:prepare
129: smpcfd:dying

"smpcfd:dying" was missing before.
So was the invocation of the function smpcfd_dying_cpu().

Fixes: 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:13 +01:00
Greg Kroah-Hartman
fdeec8fdb7 Merge 4.9.68 into android-4.9
Changes in 4.9.68
	bcache: only permit to recovery read error when cache device is clean
	bcache: recover data from backing when data is clean
	drm/fsl-dcu: avoid disabling pixel clock twice on suspend
	drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
	Revert "crypto: caam - get rid of tasklet"
	mm, oom_reaper: gather each vma to prevent leaking TLB entry
	uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
	usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
	serial: 8250_pci: Add Amazon PCI serial device ID
	s390/runtime instrumentation: simplify task exit handling
	USB: serial: option: add Quectel BG96 id
	ima: fix hash algorithm initialization
	s390/pci: do not require AIS facility
	selftests/x86/ldt_get: Add a few additional tests for limits
	staging: greybus: loopback: Fix iteration count on async path
	m68k: fix ColdFire node shift size calculation
	serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
	staging: rtl8188eu: avoid a null dereference on pmlmepriv
	spi: sh-msiof: Fix DMA transfer size check
	spi: spi-axi: fix potential use-after-free after deregistration
	mmc: sdhci-msm: fix issue with power irq
	usb: phy: tahvo: fix error handling in tahvo_usb_probe()
	serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
	x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
	EDAC, sb_edac: Fix missing break in switch
	sysrq : fix Show Regs call trace on ARM
	usbip: tools: Install all headers needed for libusbip development
	perf test attr: Fix ignored test case result
	kprobes/x86: Disable preemption in ftrace-based jprobes
	tools include: Do not use poison with C++
	iio: adc: ti-ads1015: add 10% to conversion wait time
	dax: Avoid page invalidation races and unnecessary radix tree traversals
	net/mlx4_en: Fix type mismatch for 32-bit systems
	l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups
	dmaengine: stm32-dma: Set correct args number for DMA request from DT
	dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
	usb: gadget: f_fs: Fix ExtCompat descriptor validation
	libcxgb: fix error check for ip6_route_output()
	net: systemport: Utilize skb_put_padto()
	net: systemport: Pad packet before inserting TSB
	ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
	ARM: OMAP1: DMA: Correct the number of logical channels
	vti6: fix device register to report IFLA_INFO_KIND
	be2net: fix accesses to unicast list
	be2net: fix unicast list filling
	net/appletalk: Fix kernel memory disclosure
	libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
	net: qrtr: Mark 'buf' as little endian
	mm: fix remote numa hits statistics
	mac80211: calculate min channel width correctly
	ravb: Remove Rx overflow log messages
	nfs: Don't take a reference on fl->fl_file for LOCK operation
	drm/exynos/decon5433: update shadow registers iff there are active windows
	drm/exynos/decon5433: set STANDALONE_UPDATE_F also if planes are disabled
	KVM: arm/arm64: Fix occasional warning from the timer work function
	mac80211: prevent skb/txq mismatch
	NFSv4: Fix client recovery when server reboots multiple times
	perf/x86/intel: Account interrupts for PEBS errors
	powerpc/mm: Fix memory hotplug BUG() on radix
	qla2xxx: Fix wrong IOCB type assumption
	drm/amdgpu: fix bug set incorrect value to vce register
	drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
	net: sctp: fix array overrun read on sctp_timer_tbl
	x86/fpu: Set the xcomp_bv when we fake up a XSAVES area
	drm/amdgpu: fix unload driver issue for virtual display
	mac80211: don't try to sleep in rate_control_rate_init()
	RDMA/qedr: Return success when not changing QP state
	RDMA/qedr: Fix RDMA CM loopback
	tipc: fix nametbl_lock soft lockup at module exit
	tipc: fix cleanup at module unload
	dmaengine: pl330: fix double lock
	tcp: correct memory barrier usage in tcp_check_space()
	i2c: i2c-cadence: Initialize configuration before probing devices
	nvmet: cancel fatal error and flush async work before free controller
	gtp: clear DF bit on GTP packet tx
	gtp: fix cross netns recv on gtp socket
	net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
	net: thunderx: avoid dereferencing xcv when NULL
	be2net: fix initial MAC setting
	vfio/spapr: Fix missing mutex unlock when creating a window
	mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
	xen-netfront: Improve error handling during initialization
	cec: initiator should be the same as the destination for, poll
	xen-netback: vif counters from int/long to u64
	net: fec: fix multicast filtering hardware setup
	dma-buf/dma-fence: Extract __dma_fence_is_later()
	dma-buf/sw-sync: Fix the is-signaled test to handle u32 wraparound
	dma-buf/sw-sync: Prevent user overflow on timeline advance
	dma-buf/sw-sync: Reduce irqsave/irqrestore from known context
	dma-buf/sw-sync: sync_pt is private and of fixed size
	dma-buf/sw-sync: Fix locking around sync_timeline lists
	dma-buf/sw-sync: Use an rbtree to sort fences in the timeline
	dma-buf/sw_sync: move timeline_fence_ops around
	dma-buf/sw_sync: clean up list before signaling the fence
	dma-fence: Clear fence->status during dma_fence_init()
	dma-fence: Wrap querying the fence->status
	dma-fence: Introduce drm_fence_set_error() helper
	dma-buf/sw_sync: force signal all unsignaled fences on dying timeline
	dma-buf/sync_file: hold reference to fence when creating sync_file
	dma-buf: Update kerneldoc for sync_file_create
	usb: hub: Cycle HUB power when initialization fails
	usb: xhci: fix panic in xhci_free_virt_devices_depth_first
	USB: core: Add type-specific length check of BOS descriptors
	USB: Increase usbfs transfer limit
	USB: devio: Prevent integer overflow in proc_do_submiturb()
	USB: usbfs: Filter flags passed in from user space
	usb: host: fix incorrect updating of offset
	xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
	Linux 4.9.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-10 17:13:13 +01:00
Jiri Olsa
a88ff235e8 perf/x86/intel: Account interrupts for PEBS errors
[ Upstream commit 475113d937 ]

It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
  ...
  RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
  ...
  Call Trace:
   ? trace_hardirqs_on_caller+0xf5/0x1b0
   ? perf_cgroup_attach+0x70/0x70
   perf_install_in_context+0x199/0x1b0
   ? ctx_resched+0x90/0x90
   SYSC_perf_event_open+0x641/0xf90
   SyS_perf_event_open+0x9/0x10
   do_syscall_64+0x6c/0x1f0
   entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 22:01:52 +01:00
Viresh Kumar
87cdf4eda5 BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There are cases where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.

Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then decide not to update the frequency as
  sugov_up_down_rate_limit() return true.
- Here cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz.
- Now if the utilization doesn't change in next request, then the next
  target frequency will still be 780 MHz and it will match with
  cached_raw_freq and so we will directly return 1.2 GHz instead of 800
  MHz.

BACKPORT of upstream commit 07458f6a51 ("cpufreq: schedutil: Reset
cached_raw_freq when not in sync with next_freq").

This also updates sugov_update_commit() for handling up/down tunables,
which aren't present in mainline.

Change-Id: Ie86465231e7cb265e5b4c26f59d6faf8d9630b0a
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-12-04 16:23:03 +00:00
Victor Wan
1e547d2935 Merge branch 'android-4.9' into amlogic-4.9-dev 2017-12-02 16:52:23 +08:00
Ke Wang
b763480947 sched: EAS/WALT: Don't take into account of running task's util
For upmigrating misfit running task case, the currently running
task's util has been counted into cpu_util(). Thus currently
__cpu_overutilized() which add task's uitl twice is overestimated.

Change-Id: I5326f4c736a55679009d2e7293f8792311c04294
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2017-12-01 16:14:01 +00:00
Joonwoo Park
b41e1ca689 sched: EAS/WALT: take into account of waking task's load
WALT's function cpu_util(cpu) reports CPU's load without taking into
account of waking task's load.  Thus currently cpu_overutilized()
underestimates load on the previous CPU of waking task.

Take into account of task's load to determine whether previous CPU is
overutilzed to bail out early without running energy_diff() which is
expensive.

Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(minor rebase conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 94e5c96507)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:57 +00:00
Joonwoo Park
4f0693ad08 sched: EAS: upmigrate misfit current task
Upmigrate misfit current task upon scheduler tick with stopper.

We can kick an random (not necessarily big CPU) NOHZ idle CPU when a
CPU bound task is in need of upmigration.  But it's not efficient as that
way needs following unnecessary wakeups:

  1. Busy little CPU A to kick idle B
  2. B runs idle balancer and enqueue migration/A
  3. B goes idle
  4. A runs migration/A, enqueues busy task on B.
  5. B wakes up again.

This change makes active upmigration more efficiently by doing:

  1. Busy little CPU A find target CPU B upon tick.
  2. CPU A enqueues migration/A.

Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0
[joonwoop: The original version had logic to reserve CPU.  The logic is
 omitted in this version.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit 9e293db052)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:45 +00:00
Prasad Sodagudi
d345372a29 sched: avoid pushing tasks to an offline CPU
Currently active_load_balance_cpu_stop is run by cpu stopper and it
pushes running tasks off the busiest CPU onto idle target CPU. But
there is no check to see whether target cpu is offline or not before
pushing the tasks.  With the introduction of active migration in the
scheduler tick path (see check_for_migration()) there have been
instances of attempts to migrate tasks to offline CPUs.

Add a check as to whether the target cpu is online or not to prevent
scheduling on offline CPUs.

Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit dc626b28ee)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:35 +00:00
Srivatsa Vaddagiri
70e14af60c sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
(cherry picked from commit 2da014c0d8)
[ trivial cherry-pick issues, fix "may be used uninitialized" warning
for push_task ]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:25 +00:00