Commit Graph

23985 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
9542d2a012 Merge 4.9.70 into android-4.9
Changes in 4.9.70
	net: qmi_wwan: add Quectel BG96 2c7c:0296
	s390/qeth: fix early exit from error path
	tipc: fix memory leak in tipc_accept_from_sock()
	rds: Fix NULL pointer dereference in __rds_rdma_map
	sit: update frag_off info
	packet: fix crash in fanout_demux_rollover()
	net/packet: fix a race in packet_bind() and packet_notifier()
	usbnet: fix alignment for frames with no ethernet header
	net: remove hlist_nulls_add_tail_rcu()
	stmmac: reset last TSO segment size after device open
	tcp/dccp: block bh before arming time_wait timer
	s390/qeth: build max size GSO skbs on L2 devices
	s390/qeth: fix GSO throughput regression
	s390/qeth: fix thinko in IPv4 multicast address tracking
	tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
	Fix handling of verdicts after NF_QUEUE
	ipmi: Stop timers before cleaning up the module
	s390: always save and restore all registers on context switch
	usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
	fix kcm_clone()
	KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table
	powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold
	kbuild: do not call cc-option before KBUILD_CFLAGS initialization
	ipvlan: fix ipv6 outbound device
	audit: ensure that 'audit=1' actually enables audit for PID 1
	md: free unused memory after bitmap resize
	RDMA/cxgb4: Annotate r2 and stag as __be32
	Linux 4.9.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-18 10:58:11 +01:00
Paul Moore
93dedcf5a1 audit: ensure that 'audit=1' actually enables audit for PID 1
[ Upstream commit 173743dd99 ]

Prior to this patch we enabled audit in audit_init(), which is too
late for PID 1 as the standard initcalls are run after the PID 1 task
is forked.  This means that we never allocate an audit_context (see
audit_alloc()) for PID 1 and therefore miss a lot of audit events
generated by PID 1.

This patch enables audit as early as possible to help ensure that when
PID 1 is forked it can allocate an audit_context if required.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 16:25:47 +01:00
Greg Kroah-Hartman
3f1d77ca5f Merge 4.9.69 into android-4.9
Changes in 4.9.69
	usb: gadget: udc: renesas_usb3: fix number of the pipes
	can: ti_hecc: Fix napi poll return value for repoll
	can: kvaser_usb: free buf in error paths
	can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
	can: kvaser_usb: ratelimit errors if incomplete messages are received
	can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
	can: ems_usb: cancel urb on -EPIPE and -EPROTO
	can: esd_usb2: cancel urb on -EPIPE and -EPROTO
	can: usb_8dev: cancel urb on -EPIPE and -EPROTO
	virtio: release virtio index when fail to device_register
	hv: kvp: Avoid reading past allocated blocks from KVP file
	isa: Prevent NULL dereference in isa_bus driver callbacks
	scsi: dma-mapping: always provide dma_get_cache_alignment
	scsi: use dma_get_cache_alignment() as minimum DMA alignment
	scsi: libsas: align sata_device's rps_resp on a cacheline
	efi: Move some sysfs files to be read-only by root
	efi/esrt: Use memunmap() instead of kfree() to free the remapping
	ASN.1: fix out-of-bounds read when parsing indefinite length item
	ASN.1: check for error from ASN1_OP_END__ACT actions
	KEYS: add missing permission check for request_key() destination
	X.509: reject invalid BIT STRING for subjectPublicKey
	X.509: fix comparisons of ->pkey_algo
	x86/PCI: Make broadcom_postcore_init() check acpi_disabled
	KVM: x86: fix APIC page invalidation
	btrfs: fix missing error return in btrfs_drop_snapshot
	ALSA: pcm: prevent UAF in snd_pcm_info
	ALSA: seq: Remove spurious WARN_ON() at timer check
	ALSA: usb-audio: Fix out-of-bound error
	ALSA: usb-audio: Add check return value for usb_string()
	iommu/vt-d: Fix scatterlist offset handling
	smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
	s390: fix compat system call table
	KVM: s390: Fix skey emulation permission check
	powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
	brcmfmac: change driver unbind order of the sdio function devices
	kdb: Fix handling of kallsyms_symbol_next() return value
	drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
	media: dvb: i2c transfers over usb cannot be done from stack
	arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
	arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
	KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
	KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
	KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation
	KVM: arm/arm64: vgic-its: Check result of allocation before use
	arm64: fpsimd: Prevent registers leaking from dead tasks
	bus: arm-cci: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: Check memory allocation failure
	bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
	bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.
	crypto: talitos - fix AEAD test failures
	crypto: talitos - fix memory corruption on SEC2
	crypto: talitos - fix setkey to check key weakness
	crypto: talitos - fix AEAD for sha224 on non sha224 capable chips
	crypto: talitos - fix use of sg_link_tbl_len
	crypto: talitos - fix ctr-aes-talitos
	usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
	ARM: BUG if jumping to usermode address in kernel mode
	ARM: avoid faulting on qemu
	thp: reduce indentation level in change_huge_pmd()
	thp: fix MADV_DONTNEED vs. numa balancing race
	mm: drop unused pmdp_huge_get_and_clear_notify()
	Revert "drm/armada: Fix compile fail"
	Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
	ARM: 8657/1: uaccess: consistently check object sizes
	vti6: Don't report path MTU below IPV6_MIN_MTU.
	ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
	x86/selftests: Add clobbers for int80 on x86_64
	x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
	sched/fair: Make select_idle_cpu() more aggressive
	x86/hpet: Prevent might sleep splat on resume
	powerpc/64: Invalidate process table caching after setting process table
	selftest/powerpc: Fix false failures for skipped tests
	powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
	lirc: fix dead lock between open and wakeup_filter
	module: set __jump_table alignment to 8
	powerpc/64: Fix checksum folding in csum_add()
	ARM: OMAP2+: Fix device node reference counts
	ARM: OMAP2+: Release device node after it is no longer needed.
	ASoC: rcar: avoid SSI_MODEx settings for SSI8
	gpio: altera: Use handle_level_irq when configured as a level_high
	HID: chicony: Add support for another ASUS Zen AiO keyboard
	usb: gadget: configs: plug memory leak
	USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
	usb: dwc3: gadget: Fix system suspend/resume on TI platforms
	usb: gadget: pxa27x: Test for a valid argument pointer
	usb: gadget: udc: net2280: Fix tmp reusage in net2280 driver
	kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
	libata: drop WARN from protocol error in ata_sff_qc_issue()
	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
	scsi: qla2xxx: Fix ql_dump_buffer
	scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
	irqchip/crossbar: Fix incorrect type of register size
	KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
	arm: KVM: Survive unknown traps from guests
	arm64: KVM: Survive unknown traps from guests
	KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
	spi_ks8995: fix "BUG: key accdaa28 not in .data!"
	spi_ks8995: regs_size incorrect for some devices
	bnx2x: prevent crash when accessing PTP with interface down
	bnx2x: fix possible overrun of VFPF multicast addresses array
	bnx2x: fix detection of VLAN filtering feature for VF
	bnx2x: do not rollback VF MAC/VLAN filters we did not configure
	rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races
	ibmvnic: Fix overflowing firmware/hardware TX queue
	ibmvnic: Allocate number of rx/tx buffers agreed on by firmware
	ipv6: reorder icmpv6_init() and ip6_mr_init()
	crypto: s5p-sss - Fix completing crypto request in IRQ handler
	i2c: riic: fix restart condition
	blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
	zram: set physical queue limits to avoid array out of bounds accesses
	netfilter: don't track fragmented packets
	axonram: Fix gendisk handling
	drm/amd/amdgpu: fix console deadlock if late init failed
	powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
	EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
	EDAC, i5000, i5400: Fix definition of NRECMEMB register
	kbuild: pkg: use --transform option to prefix paths in tar
	coccinelle: fix parallel build with CHECK=scripts/coccicheck
	x86/mpx/selftests: Fix up weird arrays
	mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
	gre6: use log_ecn_error module parameter in ip6_tnl_rcv()
	route: also update fnhe_genid when updating a route cache
	route: update fnhe_expires for redirect when the fnhe exists
	drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'
	lib/genalloc.c: make the avail variable an atomic_long_t
	dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
	NFS: Fix a typo in nfs_rename()
	sunrpc: Fix rpc_task_begin trace point
	xfs: fix forgotten rcu read unlock when skipping inode reclaim
	dt-bindings: usb: fix reg-property port-number range
	block: wake up all tasks blocked in get_request()
	sparc64/mm: set fields in deferred pages
	zsmalloc: calling zs_map_object() from irq is a bug
	sctp: do not free asoc when it is already dead in sctp_sendmsg
	sctp: use the right sk after waking up from wait_buf sleep
	bpf: fix lockdep splat
	clk: uniphier: fix DAPLL2 clock rate of Pro5
	atm: horizon: Fix irq release error
	jump_label: Invoke jump_label_test() via early_initcall()
	xfrm: Copy policy family in clone_policy
	IB/mlx4: Increase maximal message size under UD QP
	IB/mlx5: Assign send CQ and recv CQ of UMR QP
	afs: Connect up the CB.ProbeUuid
	Linux 4.9.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-14 09:58:43 +01:00
Jason Baron
74b470ce47 jump_label: Invoke jump_label_test() via early_initcall()
[ Upstream commit 92ee46efeb ]

Fengguang Wu reported that running the rcuperf test during boot can cause
the jump_label_test() to hit a WARN_ON(). The issue is that the core jump
label code relies on kernel_text_address() to detect when it can no longer
update branches that may be contained in __init sections. The
kernel_text_address() in turn assumes that if the system_state variable is
greter than or equal to SYSTEM_RUNNING then __init sections are no longer
valid (since the assumption is that they have been freed). However, when
rcuperf is setup to run in early boot it can call kernel_power_off() which
sets the system_state to SYSTEM_POWER_OFF.

Since rcuperf initialization is invoked via a module_init(), we can make
the dependency of jump_label_test() needing to complete before rcuperf
explicit by calling it via early_initcall().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510609727-2238-1-git-send-email-jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:24 +01:00
Eric Dumazet
f45f4f8a7c bpf: fix lockdep splat
[ Upstream commit 89ad2fa3f0 ]

pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.

 [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]

 switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
  (&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300

 and this task is already holding:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
 which would create a new lock dependency:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}

 but this new dependency connects a SOFTIRQ-irq-safe lock:
  (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
 ... which became SOFTIRQ-irq-safe at:
   [<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
   [<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
   [<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
   [<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
   [<ffffffff9e19886d>] ip_output+0x7d/0x260
   [<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
   [<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
   [<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
   [<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
   [<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
   [<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
   [<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
   [<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
   [<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
   [<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
   [<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
   [<ffffffff9e191e65>] ip_rcv+0x295/0x510
   [<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
   [<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
   [<ffffffff9e1306ff>] process_backlog+0x6f/0x230
   [<ffffffff9e132129>] net_rx_action+0x229/0x420
   [<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
   [<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
   [<ffffffff9dafc2f5>] do_softirq+0x55/0x60
   [<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
   [<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
   [<ffffffff9daab333>] start_secondary+0x113/0x140

 to a SOFTIRQ-irq-unsafe lock:
  (&head->lock){+.+...}
 ... which became SOFTIRQ-irq-unsafe at:
 ...  [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
   [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
   [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
   [<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
   [<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
   [<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
   [<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17

 other info that might help us debug this:

 Chain exists of:
   dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&head->lock);
                                local_irq_disable();
                                lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                                lock(&htab->buckets[i].lock);
   <Interrupt>
     lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);

  *** DEADLOCK ***

Fixes: e19494edab ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:23 +01:00
Tejun Heo
757e1845d6 workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
[ Upstream commit 637fdbae60 ]

If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:19 +01:00
Peter Zijlstra
4e4a9ebe33 sched/fair: Make select_idle_cpu() more aggressive
[ Upstream commit 4c77b18cf8 ]

Kitsunyan reported desktop latency issues on his Celeron 887 because
of commit:

  1b568f0aab ("sched/core: Optimize SCHED_SMT")

... even though his CPU doesn't do SMT.

The effect of running the SMT code on a !SMT part is basically a more
aggressive select_idle_cpu(). Removing the avg condition fixed things
for him.

I also know FB likes this test gone, even though other workloads like
having it.

For now, take it out by default, until we get a better idea.

Reported-by: kitsunyan <kitsunyan@inbox.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:17 +01:00
Daniel Thompson
30b18ee253 kdb: Fix handling of kallsyms_symbol_next() return value
commit c07d353380 upstream.

kallsyms_symbol_next() returns a boolean (true on success). Currently
kdb_read() tests the return value with an inequality that
unconditionally evaluates to true.

This is fixed in the obvious way and, since the conditional branch is
supposed to be unreachable, we also add a WARN_ON().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:13 +01:00
Lai Jiangshan
ff3d4fd537 smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
commit 46febd37f9 upstream.

Commit 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
accidently put this step on the wrong place. The step should be at the
cpuhp_ap_states[] rather than the cpuhp_bp_states[].

grep smpcfd /sys/devices/system/cpu/hotplug/states
 40: smpcfd:prepare
129: smpcfd:dying

"smpcfd:dying" was missing before.
So was the invocation of the function smpcfd_dying_cpu().

Fixes: 31487f8328 ("smp/cfd: Convert core to hotplug state machine")
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14 09:28:13 +01:00
Greg Kroah-Hartman
fdeec8fdb7 Merge 4.9.68 into android-4.9
Changes in 4.9.68
	bcache: only permit to recovery read error when cache device is clean
	bcache: recover data from backing when data is clean
	drm/fsl-dcu: avoid disabling pixel clock twice on suspend
	drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
	Revert "crypto: caam - get rid of tasklet"
	mm, oom_reaper: gather each vma to prevent leaking TLB entry
	uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
	usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
	serial: 8250_pci: Add Amazon PCI serial device ID
	s390/runtime instrumentation: simplify task exit handling
	USB: serial: option: add Quectel BG96 id
	ima: fix hash algorithm initialization
	s390/pci: do not require AIS facility
	selftests/x86/ldt_get: Add a few additional tests for limits
	staging: greybus: loopback: Fix iteration count on async path
	m68k: fix ColdFire node shift size calculation
	serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
	staging: rtl8188eu: avoid a null dereference on pmlmepriv
	spi: sh-msiof: Fix DMA transfer size check
	spi: spi-axi: fix potential use-after-free after deregistration
	mmc: sdhci-msm: fix issue with power irq
	usb: phy: tahvo: fix error handling in tahvo_usb_probe()
	serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
	x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
	EDAC, sb_edac: Fix missing break in switch
	sysrq : fix Show Regs call trace on ARM
	usbip: tools: Install all headers needed for libusbip development
	perf test attr: Fix ignored test case result
	kprobes/x86: Disable preemption in ftrace-based jprobes
	tools include: Do not use poison with C++
	iio: adc: ti-ads1015: add 10% to conversion wait time
	dax: Avoid page invalidation races and unnecessary radix tree traversals
	net/mlx4_en: Fix type mismatch for 32-bit systems
	l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups
	dmaengine: stm32-dma: Set correct args number for DMA request from DT
	dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
	usb: gadget: f_fs: Fix ExtCompat descriptor validation
	libcxgb: fix error check for ip6_route_output()
	net: systemport: Utilize skb_put_padto()
	net: systemport: Pad packet before inserting TSB
	ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
	ARM: OMAP1: DMA: Correct the number of logical channels
	vti6: fix device register to report IFLA_INFO_KIND
	be2net: fix accesses to unicast list
	be2net: fix unicast list filling
	net/appletalk: Fix kernel memory disclosure
	libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
	net: qrtr: Mark 'buf' as little endian
	mm: fix remote numa hits statistics
	mac80211: calculate min channel width correctly
	ravb: Remove Rx overflow log messages
	nfs: Don't take a reference on fl->fl_file for LOCK operation
	drm/exynos/decon5433: update shadow registers iff there are active windows
	drm/exynos/decon5433: set STANDALONE_UPDATE_F also if planes are disabled
	KVM: arm/arm64: Fix occasional warning from the timer work function
	mac80211: prevent skb/txq mismatch
	NFSv4: Fix client recovery when server reboots multiple times
	perf/x86/intel: Account interrupts for PEBS errors
	powerpc/mm: Fix memory hotplug BUG() on radix
	qla2xxx: Fix wrong IOCB type assumption
	drm/amdgpu: fix bug set incorrect value to vce register
	drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
	net: sctp: fix array overrun read on sctp_timer_tbl
	x86/fpu: Set the xcomp_bv when we fake up a XSAVES area
	drm/amdgpu: fix unload driver issue for virtual display
	mac80211: don't try to sleep in rate_control_rate_init()
	RDMA/qedr: Return success when not changing QP state
	RDMA/qedr: Fix RDMA CM loopback
	tipc: fix nametbl_lock soft lockup at module exit
	tipc: fix cleanup at module unload
	dmaengine: pl330: fix double lock
	tcp: correct memory barrier usage in tcp_check_space()
	i2c: i2c-cadence: Initialize configuration before probing devices
	nvmet: cancel fatal error and flush async work before free controller
	gtp: clear DF bit on GTP packet tx
	gtp: fix cross netns recv on gtp socket
	net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
	net: thunderx: avoid dereferencing xcv when NULL
	be2net: fix initial MAC setting
	vfio/spapr: Fix missing mutex unlock when creating a window
	mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
	xen-netfront: Improve error handling during initialization
	cec: initiator should be the same as the destination for, poll
	xen-netback: vif counters from int/long to u64
	net: fec: fix multicast filtering hardware setup
	dma-buf/dma-fence: Extract __dma_fence_is_later()
	dma-buf/sw-sync: Fix the is-signaled test to handle u32 wraparound
	dma-buf/sw-sync: Prevent user overflow on timeline advance
	dma-buf/sw-sync: Reduce irqsave/irqrestore from known context
	dma-buf/sw-sync: sync_pt is private and of fixed size
	dma-buf/sw-sync: Fix locking around sync_timeline lists
	dma-buf/sw-sync: Use an rbtree to sort fences in the timeline
	dma-buf/sw_sync: move timeline_fence_ops around
	dma-buf/sw_sync: clean up list before signaling the fence
	dma-fence: Clear fence->status during dma_fence_init()
	dma-fence: Wrap querying the fence->status
	dma-fence: Introduce drm_fence_set_error() helper
	dma-buf/sw_sync: force signal all unsignaled fences on dying timeline
	dma-buf/sync_file: hold reference to fence when creating sync_file
	dma-buf: Update kerneldoc for sync_file_create
	usb: hub: Cycle HUB power when initialization fails
	usb: xhci: fix panic in xhci_free_virt_devices_depth_first
	USB: core: Add type-specific length check of BOS descriptors
	USB: Increase usbfs transfer limit
	USB: devio: Prevent integer overflow in proc_do_submiturb()
	USB: usbfs: Filter flags passed in from user space
	usb: host: fix incorrect updating of offset
	xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
	Linux 4.9.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-10 17:13:13 +01:00
Jiri Olsa
a88ff235e8 perf/x86/intel: Account interrupts for PEBS errors
[ Upstream commit 475113d937 ]

It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
  ...
  RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
  ...
  Call Trace:
   ? trace_hardirqs_on_caller+0xf5/0x1b0
   ? perf_cgroup_attach+0x70/0x70
   perf_install_in_context+0x199/0x1b0
   ? ctx_resched+0x90/0x90
   SYSC_perf_event_open+0x641/0xf90
   SyS_perf_event_open+0x9/0x10
   do_syscall_64+0x6c/0x1f0
   entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09 22:01:52 +01:00
Viresh Kumar
87cdf4eda5 BACKPORT: schedutil: Reset cached freq if it is not in sync with next_freq
'cached_raw_freq' is used to get the next frequency quickly but should
always be in sync with sg_policy->next_freq. There are cases where it is
not and in such cases it should be reset to avoid switching to incorrect
frequencies.

Consider this case for example:
- policy->cur is 1.2 GHz (Max)
- New request comes for 780 MHz and we store that in cached_raw_freq.
- Based on 780 MHz, we calculate the effective frequency as 800 MHz.
- We then decide not to update the frequency as
  sugov_up_down_rate_limit() return true.
- Here cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz.
- Now if the utilization doesn't change in next request, then the next
  target frequency will still be 780 MHz and it will match with
  cached_raw_freq and so we will directly return 1.2 GHz instead of 800
  MHz.

BACKPORT of upstream commit 07458f6a51 ("cpufreq: schedutil: Reset
cached_raw_freq when not in sync with next_freq").

This also updates sugov_update_commit() for handling up/down tunables,
which aren't present in mainline.

Change-Id: Ie86465231e7cb265e5b4c26f59d6faf8d9630b0a
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-12-04 16:23:03 +00:00
Ke Wang
b763480947 sched: EAS/WALT: Don't take into account of running task's util
For upmigrating misfit running task case, the currently running
task's util has been counted into cpu_util(). Thus currently
__cpu_overutilized() which add task's uitl twice is overestimated.

Change-Id: I5326f4c736a55679009d2e7293f8792311c04294
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
2017-12-01 16:14:01 +00:00
Joonwoo Park
b41e1ca689 sched: EAS/WALT: take into account of waking task's load
WALT's function cpu_util(cpu) reports CPU's load without taking into
account of waking task's load.  Thus currently cpu_overutilized()
underestimates load on the previous CPU of waking task.

Take into account of task's load to determine whether previous CPU is
overutilzed to bail out early without running energy_diff() which is
expensive.

Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(minor rebase conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 94e5c96507)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:57 +00:00
Joonwoo Park
4f0693ad08 sched: EAS: upmigrate misfit current task
Upmigrate misfit current task upon scheduler tick with stopper.

We can kick an random (not necessarily big CPU) NOHZ idle CPU when a
CPU bound task is in need of upmigration.  But it's not efficient as that
way needs following unnecessary wakeups:

  1. Busy little CPU A to kick idle B
  2. B runs idle balancer and enqueue migration/A
  3. B goes idle
  4. A runs migration/A, enqueues busy task on B.
  5. B wakes up again.

This change makes active upmigration more efficiently by doing:

  1. Busy little CPU A find target CPU B upon tick.
  2. CPU A enqueues migration/A.

Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0
[joonwoop: The original version had logic to reserve CPU.  The logic is
 omitted in this version.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit 9e293db052)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:45 +00:00
Prasad Sodagudi
d345372a29 sched: avoid pushing tasks to an offline CPU
Currently active_load_balance_cpu_stop is run by cpu stopper and it
pushes running tasks off the busiest CPU onto idle target CPU. But
there is no check to see whether target cpu is offline or not before
pushing the tasks.  With the introduction of active migration in the
scheduler tick path (see check_for_migration()) there have been
instances of attempts to migrate tasks to offline CPUs.

Add a check as to whether the target cpu is online or not to prevent
scheduling on offline CPUs.

Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit dc626b28ee)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:35 +00:00
Srivatsa Vaddagiri
70e14af60c sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
(cherry picked from commit 2da014c0d8)
[ trivial cherry-pick issues, fix "may be used uninitialized" warning
for push_task ]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:25 +00:00
Vikram Mulukutla
44310bf8ab sched: walt: Correct WALT window size initialization
It is preferable that WALT window rollover occurs just
before a tick, since the tick is an opportune moment
to record a complete window's statistics, as well as report
those stats to the cpu frequency governor. When CONFIG_HZ
results in a TICK_NSEC that isn't a integral number, this
requirement may be violated. Account for this by reducing
the WALT window size to the nearest multiple of TICK_NSEC.

Commit d368c6faa1 ("sched: walt: fix window misalignment
when HZ=300") attempted to do this but WALT isn't using
MIN_SCHED_RAVG_WINDOW as the window size and the patch was
doing nothing.

Also, change the type of 'walt_disabled' to bool and warn
if an invalid window size causes WALT to be disabled.

Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit e79f447a97)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:15 +00:00
Joonwoo Park
7f17fff119 sched: WALT: account cumulative window demand
Energy cost estimation has been a long lasting challenge for WALT
because WALT guides CPU frequency based on the CPU utilization of
previous window.  Consequently it's not possible to know newly
waking-up task's energy cost until WALT's end of the current window.

The WALT already tracks 'Previous Runnable Sum' (prev_runnable_sum)
and 'Cumulative Runnable Average' (cr_avg).  They are designed for
CPU frequency guidance and task placement but unfortunately both
are not suitable for the energy cost estimation.

It's because using prev_runnable_sum for energy cost calculation would
make us to account CPU and task's energy solely based on activity in the
previous window so for example, any task didn't have an activity in the
previous window will be accounted as a 'zero energy cost' task.
Energy estimation with cr_avg is what energy_diff() relies on at present.
However cr_avg can only represent instantaneous picture of energy cost
thus for example, if a CPU was fully occupied for an entire WALT window
and became idle just before window boundary, and if there is a wake-up,
energy_diff() accounts that CPU is a 'zero energy cost' CPU.

As a result, introduce a new accounting unit 'Cumulative Window Demand'.
The cumulative window demand tracks all the tasks' demands have seen in
current window which is neither instantaneous nor actual execution time.
Because task demand represents estimated scaled execution time when the
task runs a full window, accumulation of all the demands represents
predicted CPU load at the end of window.

Thus we can estimate CPU's frequency at the end of current WALT window
with the cumulative window demand.

The use of prev_runnable_sum for the CPU frequency guidance and cr_avg
for the task placement have not changed and these are going to be used
for both purpose while this patch aims to add an additional statistics.

Change-Id: I9908c77ead9973a26dea2b36c001c2baf944d4f5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 43bd960dfe)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:11:06 +00:00
Joonwoo Park
6cb3bed2d0 sched: EAS/WALT: finish accounting prior to task_tick
In order to set rq->misfit_task in time, call update_task_ravg() prior
to task_tick.  This reduces upmigration delay by 1 scheduler window.

Change-Id: I7cc80badd423f2e7684125fbfd853b0a3610f0e8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit ed9e749668)
[Trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:10:56 +00:00
Joonwoo Park
a8611936de sched/fair: prevent meaningless active migration
At present need_active_balance() determines whether an active
upmigration is needed by using capacity_of(). A CPU's capacity
may be reduced by RT pressure, and therefore distinguishing
capability differences with capacity_of() may lead to suboptimal
active migrations to less capable CPUs. Use capacity_orig_of
to distinguish differently capable CPUs in addition to
capacity_of(), thus avoiding placing tasks on less capable CPUs
due to instantaneous RT pressure.

Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[markivx: Reworked the commit text a bit]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit 7ab48e4c8d)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:10:47 +00:00
Vikram Mulukutla
b28cab9cd2 sched: walt: Leverage existing helper APIs to apply invariance
There's no need for a separate hierarchy of notifiers, APIs
and variables in walt.c for the purpose of applying frequency
and IPC invariance. Let's just use capacity_curr_of and get
rid of a lot of the infrastructure relating to capacity,
load_scale_factor etc.

Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(cherry picked from commit be832f69a9)
[Trivial cherry pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-12-01 16:10:25 +00:00
Greg Kroah-Hartman
c1a286429a Merge 4.9.66 into android-4.9
Changes in 4.9.66
	s390: fix transactional execution control register handling
	s390/runtime instrumention: fix possible memory corruption
	s390/disassembler: add missing end marker for e7 table
	s390/disassembler: increase show_code buffer size
	ACPI / EC: Fix regression related to triggering source of EC event handling
	x86/mm: fix use-after-free of vma during userfaultfd fault
	ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
	vsock: use new wait API for vsock_stream_sendmsg()
	sched: Make resched_cpu() unconditional
	lib/mpi: call cond_resched() from mpi_powm() loop
	x86/decoder: Add new TEST instruction pattern
	x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
	arm64: Implement arch-specific pte_access_permitted()
	ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
	ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
	MIPS: ralink: Fix MT7628 pinmux
	MIPS: ralink: Fix typo in mt7628 pinmux function
	PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
	ALSA: hda: Add Raven PCI ID
	dm bufio: fix integer overflow when limiting maximum cache size
	dm: allocate struct mapped_device with kvzalloc
	MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
	dm: fix race between dm_get_from_kobject() and __dm_destroy()
	MIPS: Fix odd fp register warnings with MIPS64r2
	MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
	MIPS: Fix an n32 core file generation regset support regression
	MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
	rt2x00usb: mark device removed when get ENOENT usb error
	autofs: don't fail mount for transient error
	nilfs2: fix race condition that causes file system corruption
	eCryptfs: use after free in ecryptfs_release_messaging()
	libceph: don't WARN() if user tries to add invalid key
	bcache: check ca->alloc_thread initialized before wake up it
	isofs: fix timestamps beyond 2027
	NFS: Fix typo in nomigration mount option
	nfs: Fix ugly referral attributes
	NFS: Avoid RCU usage in tracepoints
	nfsd: deal with revoked delegations appropriately
	rtlwifi: rtl8192ee: Fix memory leak when loading firmware
	rtlwifi: fix uninitialized rtlhal->last_suspend_sec time
	ata: fixes kernel crash while tracing ata_eh_link_autopsy event
	ext4: fix interaction between i_size, fallocate, and delalloc after a crash
	ALSA: pcm: update tstamp only if audio_tstamp changed
	ALSA: usb-audio: Add sanity checks to FE parser
	ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
	ALSA: usb-audio: Add sanity checks in v2 clock parsers
	ALSA: timer: Remove kernel warning at compat ioctl error paths
	ALSA: hda: Fix too short HDMI/DP chmap reporting
	ALSA: hda/realtek - Fix ALC700 family no sound issue
	fix a page leak in vhost_scsi_iov_to_sgl() error recovery
	fs/9p: Compare qid.path in v9fs_test_inode
	iscsi-target: Fix non-immediate TMR reference leak
	target: Fix QUEUE_FULL + SCSI task attribute handling
	mtd: nand: omap2: Fix subpage write
	mtd: nand: Fix writing mtdoops to nand flash.
	mtd: nand: mtk: fix infinite ECC decode IRQ issue
	p54: don't unregister leds when they are not initialized
	block: Fix a race between blk_cleanup_queue() and timeout handling
	irqchip/gic-v3: Fix ppi-partitions lookup
	lockd: double unregister of inetaddr notifiers
	KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
	KVM: SVM: obey guest PAT
	SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
	clk: ti: dra7-atl-clock: fix child-node lookups
	libnvdimm, pfn: make 'resource' attribute only readable by root
	libnvdimm, namespace: fix label initialization to use valid seq numbers
	libnvdimm, namespace: make 'resource' attribute only readable by root
	IB/srpt: Do not accept invalid initiator port names
	IB/srp: Avoid that a cable pull can trigger a kernel crash
	NFC: fix device-allocation error return
	i40e: Use smp_rmb rather than read_barrier_depends
	igb: Use smp_rmb rather than read_barrier_depends
	igbvf: Use smp_rmb rather than read_barrier_depends
	ixgbevf: Use smp_rmb rather than read_barrier_depends
	i40evf: Use smp_rmb rather than read_barrier_depends
	fm10k: Use smp_rmb rather than read_barrier_depends
	ixgbe: Fix skb list corruption on Power systems
	parisc: Fix validity check of pointer size argument in new CAS implementation
	powerpc/signal: Properly handle return value from uprobe_deny_signal()
	media: Don't do DMA on stack for firmware upload in the AS102 driver
	media: rc: check for integer overflow
	cx231xx-cards: fix NULL-deref on missing association descriptor
	media: v4l2-ctrl: Fix flags field on Control events
	sched/rt: Simplify the IPI based RT balancing logic
	fscrypt: lock mutex before checking for bounce page pool
	net/9p: Switch to wait_event_killable()
	PM / OPP: Add missing of_node_put(np)
	Revert "drm/i915: Do not rely on wm preservation for ILK watermarks"
	e1000e: Fix error path in link detection
	e1000e: Fix return value test
	e1000e: Separate signaling for link check/link up
	e1000e: Avoid receiver overrun interrupt bursts
	RDS: make message size limit compliant with spec
	RDS: RDMA: return appropriate error on rdma map failures
	RDS: RDMA: fix the ib_map_mr_sg_zbva() argument
	PCI: Apply _HPX settings only to relevant devices
	drm/sun4i: Fix a return value in case of error
	clk: sunxi-ng: A31: Fix spdif clock register
	clk: sunxi-ng: fix PLL_CPUX adjusting on A33
	dmaengine: zx: set DMA_CYCLIC cap_mask bit
	fscrypt: use ENOKEY when file cannot be created w/o key
	fscrypt: use ENOTDIR when setting encryption policy on nondirectory
	net: Allow IP_MULTICAST_IF to set index to L3 slave
	net: 3com: typhoon: typhoon_init_one: make return values more specific
	net: 3com: typhoon: typhoon_init_one: fix incorrect return values
	drm/armada: Fix compile fail
	rt2800: set minimum MPDU and PSDU lengths to sane values
	adm80211: return an error if adm8211_alloc_rings() fails
	mwifiex: sdio: fix use after free issue for save_adapter
	ath10k: fix incorrect txpower set by P2P_DEVICE interface
	ath10k: ignore configuring the incorrect board_id
	ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats()
	pinctrl: sirf: atlas7: Add missing 'of_node_put()'
	bnxt_en: Set default completion ring for async events.
	ath10k: set CTS protection VDEV param only if VDEV is up
	ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE
	gpio: mockup: dynamically allocate memory for chip name
	drm: Apply range restriction after color adjustment when allocation
	clk: qcom: ipq4019: Add all the frequencies for apss cpu
	drm/mediatek: don't use drm_put_dev
	mac80211: Remove invalid flag operations in mesh TSF synchronization
	mac80211: Suppress NEW_PEER_CANDIDATE event if no room
	adm80211: add checks for dma mapping errors
	iio: light: fix improper return value
	staging: iio: cdc: fix improper return value
	spi: SPI_FSL_DSPI should depend on HAS_DMA
	netfilter: nft_queue: use raw_smp_processor_id()
	netfilter: nf_tables: fix oob access
	ASoC: rsnd: don't double free kctrl
	crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
	btrfs: return the actual error value from from btrfs_uuid_tree_iterate
	ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
	s390/kbuild: enable modversions for symbols exported from asm
	cec: when canceling a message, don't overwrite old status info
	cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
	cec: update log_addr[] before finishing configuration
	nvmet: fix KATO offset in Set Features
	xen: xenbus driver must not accept invalid transaction ids
	Linux 4.9.66

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-30 16:24:14 +00:00
Steven Rostedt (Red Hat)
1c37ff7829 sched/rt: Simplify the IPI based RT balancing logic
commit 4bdced5c9a upstream.

When a CPU lowers its priority (schedules out a high priority task for a
lower priority one), a check is made to see if any other CPU has overloaded
RT tasks (more than one). It checks the rto_mask to determine this and if so
it will request to pull one of those tasks to itself if the non running RT
task is of higher priority than the new priority of the next task to run on
the current CPU.

When we deal with large number of CPUs, the original pull logic suffered
from large lock contention on a single CPU run queue, which caused a huge
latency across all CPUs. This was caused by only having one CPU having
overloaded RT tasks and a bunch of other CPUs lowering their priority. To
solve this issue, commit:

  b6366f048e ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")

changed the way to request a pull. Instead of grabbing the lock of the
overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.

Although the IPI logic worked very well in removing the large latency build
up, it still could suffer from a large number of IPIs being sent to a single
CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet,
when I tested this on a 120 CPU box, with a stress test that had lots of
RT tasks scheduling on all CPUs, it actually triggered the hard lockup
detector! One CPU had so many IPIs sent to it, and due to the restart
mechanism that is triggered when the source run queue has a priority status
change, the CPU spent minutes! processing the IPIs.

Thinking about this further, I realized there's no reason for each run queue
to send its own IPI. As all CPUs with overloaded tasks must be scanned
regardless if there's one or many CPUs lowering their priority, because
there's no current way to find the CPU with the highest priority task that
can schedule to one of these CPUs, there really only needs to be one IPI
being sent around at a time.

This greatly simplifies the code!

The new approach is to have each root domain have its own irq work, as the
rto_mask is per root domain. The root domain has the following fields
attached to it:

  rto_push_work	 - the irq work to process each CPU set in rto_mask
  rto_lock	 - the lock to protect some of the other rto fields
  rto_loop_start - an atomic that keeps contention down on rto_lock
		    the first CPU scheduling in a lower priority task
		    is the one to kick off the process.
  rto_loop_next	 - an atomic that gets incremented for each CPU that
		    schedules in a lower priority task.
  rto_loop	 - a variable protected by rto_lock that is used to
		    compare against rto_loop_next
  rto_cpu	 - The cpu to send the next IPI to, also protected by
		    the rto_lock.

When a CPU schedules in a lower priority task and wants to make sure
overloaded CPUs know about it. It increments the rto_loop_next. Then it
atomically sets rto_loop_start with a cmpxchg. If the old value is not "0",
then it is done, as another CPU is kicking off the IPI loop. If the old
value is "0", then it will take the rto_lock to synchronize with a possible
IPI being sent around to the overloaded CPUs.

If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
IPI being sent around, or one is about to finish. Then rto_cpu is set to the
first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
in rto_mask, then there's nothing to be done.

When the CPU receives the IPI, it will first try to push any RT tasks that is
queued on the CPU but can't run because a higher priority RT task is
currently running on that CPU.

Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
finds one, it simply sends an IPI to that CPU and the process continues.

If there's no more CPUs in the rto_mask, then rto_loop is compared with
rto_loop_next. If they match, everything is done and the process is over. If
they do not match, then a CPU scheduled in a lower priority task as the IPI
was being passed around, and the process needs to start again. The first CPU
in rto_mask is sent the IPI.

This change removes this duplication of work in the IPI logic, and greatly
lowers the latency caused by the IPIs. This removed the lockup happening on
the 120 CPU machine. It also simplifies the code tremendously. What else
could anyone ask for?

Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
supplying me with the rto_start_trylock() and rto_start_unlock() helper
functions.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:39:09 +00:00
Paul E. McKenney
fb8bd56e35 sched: Make resched_cpu() unconditional
commit 7c2102e56a upstream.

The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not.  This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):

o	CPU1 is waiting for expedited wait to complete:

	sync_rcu_exp_select_cpus
	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
	     IPI sent to CPU5

	synchronize_sched_expedited_wait
		 ret = swait_event_timeout(rsp->expedited_wq,
					   sync_rcu_preempt_exp_done(rnp_root),
					   jiffies_stall);

	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())

o	CPU5 handles IPI and fails to acquire rq lock.

	Handles IPI
	     sync_sched_exp_handler
		 resched_cpu
		     returns while failing to try lock acquire rq->lock
		 need_resched is not set

o	CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
	idle (schedule() is not called).

o	CPU 1 reports RCU stall.

Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.

Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:39:01 +00:00
John Stultz
5311c740c0 UPSTREAM: time: Clean up CLOCK_MONOTONIC_RAW time handling
(cherry pick from commit fc6eead7c1)

Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Tested-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 20045882
Bug: 63737556
Change-Id: I243827d21b08703a09d2d2fe738a9258be224582
2017-11-29 13:44:16 -08:00
Greg Kroah-Hartman
a6d71ba679 Merge 4.9.62 into android-4.9
Changes in 4.9.62
	adv7604: Initialize drive strength to default when using DT
	video: fbdev: pmag-ba-fb: Remove bad `__init' annotation
	PCI: mvebu: Handle changes to the bridge windows while enabled
	sched/core: Add missing update_rq_clock() call in sched_move_task()
	xen/netback: set default upper limit of tx/rx queues to 8
	ARM: dts: imx53-qsb-common: fix FEC pinmux config
	dt-bindings: clockgen: Add compatible string for LS1012A
	EDAC, amd64: Add x86cpuid sanity check during init
	PM / OPP: Error out on failing to add static OPPs for v1 bindings
	clk: samsung: exynos5433: Add IDs for PHYCLK_MIPIDPHY0_* clocks
	drm: drm_minor_register(): Clean up debugfs on failure
	KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
	iommu/arm-smmu-v3: Clear prior settings when updating STEs
	pinctrl: baytrail: Fix debugfs offset output
	powerpc/corenet: explicitly disable the SDHC controller on kmcoge4
	cxl: Force psl data-cache flush during device shutdown
	ARM: omap2plus_defconfig: Fix probe errors on UARTs 5 and 6
	arm64: dma-mapping: Only swizzle DMA ops for IOMMU_DOMAIN_DMA
	crypto: vmx - disable preemption to enable vsx in aes_ctr.c
	drm: mali-dp: fix Lx_CONTROL register fields clobber
	iio: trigger: free trigger resource correctly
	iio: pressure: ms5611: claim direct mode during oversampling changes
	iio: magnetometer: mag3110: claim direct mode during raw writes
	iio: proximity: sx9500: claim direct mode during raw proximity reads
	dt-bindings: Add LEGO MINDSTORMS EV3 compatible specification
	dt-bindings: Add vendor prefix for LEGO
	phy: increase size of MII_BUS_ID_SIZE and bus_id
	serial: sh-sci: Fix register offsets for the IRDA serial port
	libertas: fix improper return value
	usb: hcd: initialize hcd->flags to 0 when rm hcd
	netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev family
	brcmfmac: setup wiphy bands after registering it first
	rt2800usb: mark tx failure on timeout
	apparmor: fix undefined reference to `aa_g_hash_policy'
	IPsec: do not ignore crypto err in ah4 input
	EDAC, amd64: Save and return err code from probe_one_instance()
	s390/topology: make "topology=off" parameter work
	Input: mpr121 - handle multiple bits change of status register
	Input: mpr121 - set missing event capability
	sched/cputime, powerpc32: Fix stale scaled stime on context switch
	IB/ipoib: Change list_del to list_del_init in the tx object
	ARM: dts: STiH410-family: fix wrong parent clock frequency
	s390/qeth: fix retrieval of vipa and proxy-arp addresses
	s390/qeth: issue STARTLAN as first IPA command
	wcn36xx: Don't use the destroyed hal_mutex
	IB/rxe: Fix reference leaks in memory key invalidation code
	clk: mvebu: adjust AP806 CPU clock frequencies to production chip
	net: dsa: select NET_SWITCHDEV
	platform/x86: hp-wmi: Fix detection for dock and tablet mode
	cdc_ncm: Set NTB format again after altsetting switch for Huawei devices
	KEYS: trusted: sanitize all key material
	KEYS: trusted: fix writing past end of buffer in trusted_read()
	platform/x86: hp-wmi: Fix error value for hp_wmi_tablet_state
	platform/x86: hp-wmi: Do not shadow error values
	x86/uaccess, sched/preempt: Verify access_ok() context
	workqueue: Fix NULL pointer dereference
	crypto: ccm - preserve the IV buffer
	crypto: x86/sha1-mb - fix panic due to unaligned access
	crypto: x86/sha256-mb - fix panic due to unaligned access
	KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]
	ARM: 8720/1: ensure dump_instr() checks addr_limit
	ALSA: seq: Fix OSS sysex delivery in OSS emulation
	ALSA: seq: Avoid invalid lockdep class warning
	drm/i915: Do not rely on wm preservation for ILK watermarks
	MIPS: microMIPS: Fix incorrect mask in insn_table_MM
	MIPS: Fix CM region target definitions
	MIPS: SMP: Use a completion event to signal CPU up
	MIPS: Fix race on setting and getting cpu_online_mask
	MIPS: SMP: Fix deadlock & online race
	selftests: firmware: send expected errors to /dev/null
	tools: firmware: check for distro fallback udev cancel rule
	ASoC: sun4i-spdif: remove legacy dapm components
	MIPS: BMIPS: Fix missing cbr address
	MIPS: AR7: Defer registration of GPIO
	MIPS: AR7: Ensure that serial ports are properly set up
	Input: elan_i2c - add ELAN060C to the ACPI table
	rbd: use GFP_NOIO for parent stat and data requests
	drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue
	drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally
	drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID
	drm/bridge: adv7511: Re-write the i2c address before EDID probing
	can: sun4i: handle overrun in RX FIFO
	can: ifi: Fix transmitter delay calculation
	can: c_can: don't indicate triple sampling support for D_CAN
	x86/smpboot: Make optimization of delay calibration work correctly
	x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context
	Linux 4.9.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-15 16:13:49 +01:00
Li Bin
46f15501c5 workqueue: Fix NULL pointer dereference
commit cef572ad9b upstream.

When queue_work() is used in irq (not in task context), there is
a potential case that trigger NULL pointer dereference.
----------------------------------------------------------------
worker_thread()
|-spin_lock_irq()
|-process_one_work()
	|-worker->current_pwq = pwq
	|-spin_unlock_irq()
	|-worker->current_func(work)
	|-spin_lock_irq()
 	|-worker->current_pwq = NULL
|-spin_unlock_irq()

				//interrupt here
				|-irq_handler
					|-__queue_work()
						//assuming that the wq is draining
						|-is_chained_work(wq)
							|-current_wq_worker()
							//Here, 'current' is the interrupted worker!
								|-current->current_pwq is NULL here!
|-schedule()
----------------------------------------------------------------

Avoid it by checking for task context in current_wq_worker(), and
if not in task context, we shouldn't use the 'current' to check the
condition.

Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8d03ecfe47 ("workqueue: reimplement is_chained_work() using current_wq_worker()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 15:53:17 +01:00
Peter Zijlstra
6da1c989cc sched/core: Add missing update_rq_clock() call in sched_move_task()
[ Upstream commit 1b1d62254d ]

Bug was noticed via this warning:

  WARNING: CPU: 6 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0x8e8/0xb80
  rq->clock_update_flags < RQCF_ACT_SKIP
  Modules linked in:
  CPU: 6 PID: 1 Comm: systemd Not tainted 4.10.0-rc5-00140-g0874170baf55-dirty #1
  Hardware name: Supermicro SYS-4048B-TRFT/X10QBi, BIOS 1.0 04/11/2014
  Call Trace:
   dump_stack+0x4d/0x65
   __warn+0xcb/0xf0
   warn_slowpath_fmt+0x5f/0x80
   detach_task_cfs_rq+0x8e8/0xb80
   ? allocate_cgrp_cset_links+0x59/0x80
   task_change_group_fair+0x27/0x150
   sched_change_group+0x48/0xf0
   sched_move_task+0x53/0x150
   cpu_cgroup_attach+0x36/0x70
   cgroup_taskset_migrate+0x175/0x300
   cgroup_migrate+0xab/0xd0
   cgroup_attach_task+0xf0/0x190
   __cgroup_procs_write+0x1ed/0x2f0
   cgroup_procs_write+0x14/0x20
   cgroup_file_write+0x3f/0x100
   kernfs_fop_write+0x104/0x180
   __vfs_write+0x37/0x140
   vfs_write+0xb8/0x1b0
   SyS_write+0x55/0xc0
   do_syscall_64+0x61/0x170
   entry_SYSCALL64_slow_path+0x25/0x25

Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 15:53:11 +01:00
Joonwoo Park
cc1034265e sched: WALT: fix potential overflow
Task demand and CPU util are in u64.

Change-Id: If7ec1623e723026d3346201122aab0303a6d2ba2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit c8b8c92bbc)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:41:51 +00:00
Olav Haugan
77ba2b9f0b sched: Update task->on_rq when tasks are moving between runqueues
Task->on_rq has three states:
	0 - Task is not on runqueue (rq)
	1 (TASK_ON_RQ_QUEUED) - Task is on rq
	2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the
	process of being migrated to another rq

When a task is moving between rqs task->on_rq state should be
TASK_ON_RQ_MIGRATING in order for WALT to account rq's cumulative
runnable average correctly.  Without such state marking for all the
classes, WALT's update_history() would try to fixup task's demand
which was never contributed to any of CPUs during migration.

Change-Id: I65e74a8f176c3ed4b8577577f6da8897ecda7bb8
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop: Reinforced changelog to explain why this is needed by WALT.
           Fixed conflicts in deadline.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 0f8791c90a99f718c43ab8214076c0c671a36667)
[fixed cherry-pick issue due to missing fab5cc59bf]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:39:00 +00:00
Joonwoo Park
98a5fa3b95 sched: WALT: fix window mis-alignment
The initial window start needs to be close to ktime ns = 0 to be
aligned with scheduler tick.

Change-Id: Ia91f74efce2f910106622a054a6fcd507e763ca5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 0caf1df0c5)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:38:50 +00:00
Joonwoo Park
cd76b21535 sched: EAS: kill incorrect nohz idle cpu kick
EAS won't allow NOHZ idle balancer until CPU's over utilized.  However
nohz_kick_needed() can return true.  This causes idle CPU wake up for
nothing.

Change-Id: I6e548442e29e4f85cda695e4c7101dd591b12fe6
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 3989a247e2)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:38:38 +00:00
Joonwoo Park
d7a6b8be91 sched: EAS: fix incorrect energy delta calculation due to rounding error
In order to calculate energy difference we currently iterates CPUs under
the same sched doamin to accumulate total energy cost and compare before
and after :

  for_each_domain(cpu)
          total_energy_before += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;

  for_each_domain(cpu)
          total_energy_after += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;

Doing such can incorrectly calculate and report abs(delta) > 0 when
there is actually no energy delta between before and after because the
same total accumulated cpu_util of all the CPUs can be distributed
differently before and after and it causes different amount of rounding
error.

Fix such incorrectness by shifting just once with accumulated
total_energy.

Change-Id: I82f1e2e358367058960938b4ef81714f57e921cf
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(moved part to another commit)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 11b618a0b2)
[trivial cherry-pick issues]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:38:23 +00:00
Joonwoo Park
8b34bba642 sched: EAS/WALT: use cr_avg instead of prev_runnable_sum
WALT accounts two major statistics; CPU load and cumulative tasks
demand.

The CPU load which is account of accumulated each CPU's absolute
execution time is for CPU frequency guidance.  Whereas cumulative
tasks demand which is each CPU's instantaneous load to reflect
CPU's load at given time is for task placement decision.

Use cumulative tasks demand for cpu_util() for task placement and
introduce cpu_util_freq() for frequency guidance.

Change-Id: Id928f01dbc8cb2a617cdadc584c1f658022565c5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit ee4cebd75e)
[removed schedfreq dependency]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:35:34 +00:00
Joonwoo Park
8b1a1ce14f sched: WALT: fix broken cumulative runnable average accounting
When running tasks's ravg.demand is changed update_history() adjusts
rq->cumulative_runnable_avg to reflect change of CPU load.  Currently
this fixup is broken by accumulating task's new demand without
subtracting the task's old demand.

Fix the fixup logic to subtract the task's old demand.

Change-Id: I61beb32a4850879ccb39b733f5564251e465bfeb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 48f67ea85d)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:34:26 +00:00
Joonwoo Park
241a319ae7 sched: deadline: WALT: account cumulative runnable avg
Account cumulative runnable average for WALT CPU utilization accounting.

Change-Id: I56934894e626dec183740eeaf89a57d2ef638143
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(cherry picked from commit 26b37261ea)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-14 16:33:12 +00:00
Quentin Perret
904c79c425 sched: compute task utilisation with WALT consistently
Using WALT, the utilisation of a task is computed with a resolution
scaling factor that has been used inconsistently in the code with either
hardcoded values or macros (NICE_0_LOAD_SHIFT in this case). Changes in
these macros (as the 32 to 64 bits resolution shift of 2159197d66)
happened to break the utilisation calculation wherever they have been
used whilst results remained correct in other places. This commit fixes
this issue by using SCHED_CAPACITY_SCALE as resolution scaling factor
consistently.

Change-Id: Ic5418f8a5dfc455a22bafbebb4142b4665b61c6f
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-10 14:16:08 +00:00
Chenbo Feng
0521e0b3fc UPSTREAM: selinux: bpf: Add addtional check for bpf object file receive
Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

(cherry-pick from net-next: f66e448cfd)
Bug: 30950746

Change-Id: I5b2cf4ccb4eab7eda91ddd7091d6aa3e7ed9f2cd
2017-11-07 12:59:54 -08:00
Chenbo Feng
f3ad3766a9 BACKPORT: security: bpf: Add LSM hooks for bpf object related syscall
Introduce several LSM hooks for the syscalls that will allow the
userspace to access to eBPF object such as eBPF programs and eBPF maps.
The security check is aimed to enforce a per object security protection
for eBPF object so only processes with the right priviliges can
read/write to a specific map or use a specific eBPF program. Besides
that, a general security hook is added before the multiplexer of bpf
syscall to check the cmd and the attribute used for the command. The
actual security module can decide which command need to be checked and
how the cmd should be checked.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Added the LIST_HEAD_INIT call for security hooks, it nolonger exist in
uptream code.
(cherry-pick from net-next: afdb09c720)
Bug: 30950746

Change-Id: Ieb3ac74392f531735fc7c949b83346a5f587a77b
2017-11-07 12:59:20 -08:00
Chenbo Feng
4672ded3ec BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Deleted the file mode configuration code in unsupported map type and
removed the file mode check in non-existing helper functions.
(cherry-pick from net-next: 6e71b04a82)
Bug: 30950746

Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b
2017-11-07 12:47:56 -08:00
Viresh Kumar
e0907557ef cpufreq: Drop schedfreq governor
We all should be using (and improving) the schedutil governor now. Get
rid of the non-upstream governor.

Tested on Hikey.

Change-Id: I2104558b03118b0a9c5f099c23c42cd9a6c2a963
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-11-07 10:31:04 +05:30
Chris Redpath
dfe0a9bcfc Merge branch 'ack/android-4.9-eas-dev' into ack/android_4.9/merge_eas_dev_r1.4
Merge in the EAS r1.4 patches from eas-dev to 4.9 common branch.

There is one patch in android-4.9-eas-dev which is not part of the 1.4
patches
  ANDROID: sched/fair: Select correct capacity state for energy_diff
but we have already merged it into android-4.4 so in the interests
of keeping aligned, let's include that in the merge.

Merge Log:
* ack/android-4.9-eas-dev:
  sched: EAS: Fix the condition to distinguish energy before/after
  sched: EAS: update trg_cpu to backup_cpu if no energy saving for target_cpu
  sched/fair: consider task utilization in group_max_util()
  sched/fair: consider task utilization in group_norm_util()
  sched/fair: enforce EAS mode
  sched/fair: ignore backup CPU when not valid
  sched/fair: trace energy_diff for non boosted tasks
  UPSTREAM: sched/fair: Sync task util before slow-path wakeup
  UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
  UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
  UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
  UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
  UPSTREAM: sched/fair: Fix task group initialization
  cpufreq/sched: Consider max cpu capacity when choosing frequencies
  cpufreq/sched: Use cpu max freq rather than policy max
  sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
  ANDROID: sched/fair: Select correct capacity state for energy_diff
  UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
  UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
  UPSTREAM: sched/fair: Fix find_idlest_group() when local group is not allowed
  UPSTREAM: sched/fair: Remove unnecessary comparison with -1
  UPSTREAM: sched/fair: Move select_task_rq_fair() slow-path into its own function
  UPSTREAM: sched/fair: Force balancing on NOHZ balance if local group has capacity
  UPSTREAM: sched: use load_avg for selecting idlest group
  UPSTREAM: sched: fix find_idlest_group for fork

Change-Id: I57bc516f9c804bfc7144a6a5bcf70572d82f7321
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-11-03 13:51:48 +00:00
Ke Wang
c409b20240 sched: EAS: Fix the condition to distinguish energy before/after
Before commit 5f8b3a757d ("sched/fair: consider task utilization in
group_norm_util()"), eenv->util_delta is used to distinguish energy
before and energy after in sched_group_energy(). After that commit,
eenv->util_delta can not do that any more.

In this commit, use trg_cpu to distinguish energy before/after in
sched_group_energy().

Before apply this commit, cap_before/cap_delta is not correct:
<idle>-0 [001] 147504.608920: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=3 usage_delta=7 nrg_before=250 nrg_after=250 nrg_diff=0
cap_before=0 cap_after=528 cap_delta=1056 nrg_delta=0 nrg_payoff=0

After apply this commit, cap_before/cap_delta retrun to normal:
<idle>-0 [001] 220.494011: sched_energy_diff:    pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=2 usage_delta=3 nrg_before=248 nrg_after=248 nrg_diff=0
cap_before=528 cap_after=528 cap_delta=0 nrg_delta=0 nrg_payoff=0

Change-Id: I7b5f7ccce56e93af7ea4e87d8e0ea6e2405f9c27
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit 0da783a605cd20d5a37c2a840e8a1fa641c09768)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:24:24 +00:00
Ke Wang
ece6d3b76e sched: EAS: update trg_cpu to backup_cpu if no energy saving for target_cpu
If no energy saving for target_cpu in the calculation of energy_diff(),
backup_cpu will be set as the new dst_cpu for the next calculation. At this
point, we also need update the new trg_cpu as backup_cpu to make sure the
subsequent calculation of energy_diff() is correct.

Change-Id: If3b35b6dc54865f1cb4b1603134102d4422227d5
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
(cherry picked from commit b1923e22f4eca3e537e015d6ea3dce1187edea37)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:24:04 +00:00
Patrick Bellasi
06d637c9f9 sched/fair: consider task utilization in group_max_util()
The group_max_util() function is used to compute the maximum utilization
across the CPUs of a certain energy_env configuration.
Its main client is the energy_diff function when it needs to compute the
SG capacity for one of the before/after scheduling candidates.

Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs in the previous CPU. This implies that, for the task waking up
in the previous CPU we consider only its blocked load tracked by the CPU
RQ. However, in case of a medium-big task which is waking up on a long
time idle CPU, this blocked load can be already completely decayed.

More in general, the current approach is biased towards under-estimating
the capacity requirements for the "before" scheduling candidate.

This patch fixes this by:
- always use the cpu_util_wake() to properly get the utilization of a CPU
  without any (partially decayed) contribution of the waking up task
- adding the task utilization to the cpu_util_wake just for the target
  cpu

The "target CPU" is defined by the energy_env to be either the src_cpu or
the dst_cpu, depending on which scheduling candidate we are considering.

Finally, since this update removes the last usage of calc_util_delta()
this function is now safely removed.

Change-Id: I20ee1bcf40cee6bf6e265fb2d32ef79061ad6ced
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 52d70152fade678e304683bd1a5842af95e83558)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:23:46 +00:00
Chris Redpath
4530ed9a46 sched/fair: consider task utilization in group_norm_util()
The group_norm_util() function is used to compute the normalized
utilization of a SG given a certain energy_env configuration.
The main client of this function is the energy_diff function when it
comes to compute the SG energy for one of the before/after scheduling
candidates.

Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs in the previous CPU. This implies that, for the task waking up
in the previous CPU we consider only its blocked load tracked by the CPU
RQ. However, in case of a medium-big task which is waking up on a long
time idle CPU, this blocked load can be already completely decayed.

More in general, the current approach is biased towards under-estimating
the energy consumption for the "before" scheduling candidate.

This patch fixes this by:
- always use the cpu_util_wake() to properly get the utilization of a CPU
  without any (partially decayed) contribution of the waking up task
- adding the task utilization to the cpu_util_wake just for the
  target cpu

The "target CPU" is defined by the energy_env to be either the src_cpu
or the dst_cpu, depending on which scheduling candidate we are
considering.

This patch update also the definition of __cpu_norm_util(), which is
currently called just by the group_norm_util() function. This allows to
simplify the code by using this function just to normalize a specified
utilization with respect to a given capacity.

This update allows to completely remove any dependency of
group_norm_util() from calc_util_delta().

Change-Id: I3b6ec50ce8decb1521faae660e326ab3319d3c82
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit ef34ea830347ca175ba8a1baf05357ed98d5728c)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:23:26 +00:00
Patrick Bellasi
1e58674375 sched/fair: enforce EAS mode
For non latency sensitive tasks the goal is to optimize for energy efficiency.
Thus, we should try our best to avoid moving a task on a CPU which is then
going to be marked as overutilized.

Let's use the capacity_margin metric to verify if a candidate target CPU
should be considered without risking to bail out of EAS mode.

Change-Id: Ib3697106f4073aedf4a6c6ce42bd5d000fa8c007
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit f95753da4ba7e1c1eee0500ffe41b3e5fa68b347)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:23:06 +00:00
Patrick Bellasi
78ff98b3aa sched/fair: ignore backup CPU when not valid
The find_best_target can sometimes not return a valid backup CPU, either
because it cannot find one or just becasue it returns prev_cpu as a backup.
In these cases we should skip the energy_diff evaluation for the backup CPU.

Change-Id: I3787dbdfe74122348dd7a7485b88c4679051bd32
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit d9bcf5b88594a0225b51878236e49305f272eadc)
[trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:22:44 +00:00
Patrick Bellasi
6abf18bda9 sched/fair: trace energy_diff for non boosted tasks
In systems where SchedTune is enabled, we do not report energy diff for non
boosted tasks. Let's fix this by always genereting an energy_diff event where
however:
  nrg.delta = 0, since we skip energy normalization
  payoff = nrg.diff, since the payoff is defined just by the energy difference

Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 13e2e3c7f7d09f619444631a0962fd8020660b8d)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:22:26 +00:00