Commit Graph

23929 Commits

Author SHA1 Message Date
Peter Zijlstra
4b9300bf83 UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
Address this rq-clock update bug:

  WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    __warn()
    post_init_entity_util_avg()
    wake_up_new_task()
    _do_fork()
    kernel_thread()
    rest_init()
    start_kernel()

Change-Id: I0d8b0c83b19447dc53ebd397bb14f756cb97d8a3
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4126bad671)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:20:00 +00:00
Vincent Guittot
4604331553 UPSTREAM: sched/fair: Fix task group initialization
The moves of tasks are now propagated down to root and the utilization
of cfs_rq reflects reality so it doesn't need to be estimated at init.

Change-Id: I496f26d22cbbe38d78d683f7eeb389b997eb9035
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: kernellwp@gmail.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1478598827-32372-7-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit d03266910a)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 18:19:24 +00:00
Chris Redpath
d870d26fb3 cpufreq/sched: Consider max cpu capacity when choosing frequencies
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.

Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.

Also comments updated to be clearer about what is needed.

Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit bc3b0db024f3d0b323e7d93994655e9e3d9f8d68)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 11:15:14 -07:00
Chris Redpath
22f1127850 cpufreq/sched: Use cpu max freq rather than policy max
When we convert capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.

Change over to using cpuinfo.max which is the max freq supported by
that cpu rather than the currently-chosen max. Frequency granted still
honours the max policy.

Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.

freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200

Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 114c6ceb2e5b0e30442e2ba5f78cfaedf76bf1f9)
[trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 11:15:14 -07:00
Dietmar Eggemann
dad887ffaf sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
Fixes: https://bugs.linaro.org/show_bug.cgi?id=3075

Change-Id: I62d714fc4b9366a9b2535649aa92d1edc840cf94
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 2617b1391f69dfbd6ffc340c235059107745aac2)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-02 11:15:14 -07:00
Quentin Perret
5783838362 Merge branch 'android-4.9' into android-4.9-eas-dev
Change-Id: Ic2303d3a53ca8dba9306fdc4f26ee77662e7534f
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
2017-11-01 10:34:19 +00:00
Abhilash Kesavan
3861f0b0f1 ANDROID: sched/walt: Fix divide by zero error in cpufreq notifier
On systems using ACPI cpufreq, at boot-time only CPU0 max_freq has been
populated when load_scale_factor/capacity recomputation is triggered on
all possible cpus. This leads to the following divide by zero panic when
rq->max_freq for any cpu other than CPU0 is accessed:

[    4.827597]  divide error: 0000 [#1] PREEMPT SMP
[    4.827604]  [<ffffffff862ad88e>] ? notifier_call_chain+0x37/0x57
[    4.827607]  [<ffffffff862adaf3>] ? __blocking_notifier_call_chain+0x46/0x5c
[    4.827611]  [<ffffffff8684b9f4>] ? cpufreq_set_policy+0xa6/0x2dc
[    4.827614]  [<ffffffff869e4d55>] ? cpufreq_init_policy+0xba/0xde
[    4.827617]  [<ffffffff8684bd4c>] ? cpufreq_update_policy+0x122/0x122
[    4.827620]  [<ffffffff8684c51c>] ? cpufreq_online+0x5a7/0x64e
[    4.827623]  [<ffffffff8684c63b>] ? cpufreq_add_dev+0x78/0xed
[    4.827627]  [<ffffffff866ce412>] ? subsys_interface_register+0xd6/0x117
[    4.827630]  [<ffffffff8684b410>] ? cpufreq_register_driver+0x168/0x2b2
[    4.827632]  [<ffffffff8684b410>] ? cpufreq_register_driver+0x168/0x2b2
[    4.827636]  [<ffffffff86f83b2e>] ? cpufreq_gov_dbs_init+0xc/0xc
[    4.827639]  [<ffffffff86f83d43>] ? acpi_cpufreq_init+0x215/0x282
[    4.827641]  [<ffffffff86f83b2e>] ? cpufreq_gov_dbs_init+0xc/0xc
[    4.827645]  [<ffffffff8620046e>] ? do_one_initcall+0x9c/0x12e
[    4.827649]  [<ffffffff86f3d058>] ? kernel_init_freeable+0x190/0x22b
[    4.827651]  [<ffffffff869e59d9>] ? rest_init+0x7c/0x7c
[    4.827653]  [<ffffffff869e59e3>] ? kernel_init+0xa/0xf3
[    4.827656]  [<ffffffff869ed482>] ? ret_from_fork+0x22/0x30

On ACPI based systems, for CPUs with uninitialized rq->max_freq skip the
re-calculation.

Reference discussion on a similar issue here:
https://www.spinics.net/lists/arm-kernel/msg557612.html

Change-Id: I5025e3f6db671e9e72369f85708d3b0482d5dad2
Signed-off-by: Abhilash Kesavan <abhilash.kesavan@intel.com>
2017-10-27 16:04:52 +00:00
Chris Redpath
185507e31e ANDROID: sched/fair: Select correct capacity state for energy_diff
The util returned from group_max_util is not capped at the max util
present in the group, so it can be larger than the capacity stored in
the array. Ensure that when this happens, we always use the last entry
in the array to fetch energy from.

Tested with synthetics on Juno board.

Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-26 15:17:49 +01:00
Rafael J. Wysocki
53d56d4130 BACKPORT: cpufreq: schedutil: Use policy-dependent transition delays
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.

BACKPORT: Modified to support the separate up_rate_limit_us and
down_rate_limit_us (upstream just has a single rate_limit_us). Also
dropped the changes for intel_pstate as there's a merge conflict.

Change-Id: I62a8543879a4d8582cdcb31ebd55607705d1c8b1
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 1b72e7fd30)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-26 02:26:34 +00:00
Greg Kroah-Hartman
f108c7d9b5 Merge 4.9.58 into android-4.9
Changes in 4.9.58
	MIPS: Fix minimum alignment requirement of IRQ stack
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	xen-netback: Use GFP_ATOMIC to allocate hash
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	initramfs: finish fput() before accessing any binary from initramfs
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	ALSA: hda: Add Geminilake HDMI codec ID
	qed: Don't use attention PTT for configuring BW
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	staging: vchiq_2835_arm: Make cache-line-size a required DT property
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	f2fs: do SSR for data when there is enough free space
	sched/fair: Update rq clock before changing a task's CPU affinity
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	ASoC: mediatek: add I2C dependency for CS42XX8
	drm/amdgpu: refuse to reserve io mem for split VRAM buffers
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	qede: Prevent index problems in loopback test
	qed: Reserve doorbell BAR space for present CPUs
	qed: Read queue state before releasing buffer
	i2c: at91: ensure state is restored after suspending
	ceph: don't update_dentry_lease unless we actually got one
	ceph: fix bogus endianness change in ceph_ioctl_set_layout
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	IB/hfi1: Use static CTLE with Preset 6 for integrated HFIs
	IB/hfi1: Allocate context data on memory node
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	hrtimer: Catch invalid clockids again
	nfsd/callback: Cleanup callback cred on shutdown
	powerpc/perf: Add restrictions to PMC5 in power9 DD1
	drm/nouveau/gr/gf100-: fix ccache error logging
	regulator: core: Resolve supplies before disabling unused regulators
	btmrvl: avoid double-disable_irq() race
	EDAC, mce_amd: Print IPID and Syndrome on a separate line
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets
	Linux 4.9.58

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-23 09:35:27 +02:00
Marc Zyngier
0c92e73293 hrtimer: Catch invalid clockids again
[ Upstream commit 336a9cde10 ]

commit 82e88ff1ea ("hrtimer: Revert CLOCK_MONOTONIC_RAW support") removed
unfortunately a sanity check in the hrtimer code which was part of that
MONOTONIC_RAW patch series.

It would have caught the bogus usage of CLOCK_MONOTONIC_RAW in the wireless
code. So bring it back.

It is way too easy to take any random clockid and feed it to the hrtimer
subsystem. At best, it gets mapped to a monotonic base, but it would be
better to just catch illegal values as early as possible.

Detect invalid clockids, map them to CLOCK_MONOTONIC and emit a warning.

[ tglx: Replaced the BUG by a WARN and gracefully map to CLOCK_MONOTONIC ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Link: http://lkml.kernel.org/r/1452879670-16133-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:38 +02:00
Wanpeng Li
ab3d531745 sched/fair: Update rq clock before changing a task's CPU affinity
[ Upstream commit a499c3ead8 ]

This is triggered during boot when CONFIG_SCHED_DEBUG is enabled:

 ------------[ cut here ]------------
 WARNING: CPU: 6 PID: 81 at kernel/sched/sched.h:812 set_next_entity+0x11d/0x380
 rq->clock_update_flags < RQCF_ACT_SKIP
 CPU: 6 PID: 81 Comm: torture_shuffle Not tainted 4.10.0+ #1
 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
 Call Trace:
  dump_stack+0x85/0xc2
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  set_next_entity+0x11d/0x380
  set_curr_task_fair+0x2b/0x60
  do_set_cpus_allowed+0x139/0x180
  __set_cpus_allowed_ptr+0x113/0x260
  set_cpus_allowed_ptr+0x10/0x20
  torture_shuffle+0xfd/0x180
  kthread+0x10f/0x150
  ? torture_shutdown_init+0x60/0x60
  ? kthread_create_on_node+0x60/0x60
  ret_from_fork+0x31/0x40
 ---[ end trace dd94d92344cea9c6 ]---

The task is running && !queued, so there is no rq clock update before calling
set_curr_task().

This patch fixes it by updating rq clock after holding rq->lock/pi_lock
just as what other dequeue + put_prev + enqueue + set_curr story does.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1487749975-5994-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:35 +02:00
Peter Zijlstra
8b0be545de locking/lockdep: Add nest_lock integrity test
[ Upstream commit 7fb4a2cea6 ]

Boqun reported that hlock->references can overflow. Add a debug test
for that to generate a clear error when this happens.

Without this, lockdep is likely to report a mysterious failure on
unlock.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:21:33 +02:00
Greg Kroah-Hartman
b86d2b1467 Merge 4.9.57 into android-4.9
Changes in 4.9.57
	ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
	CIFS: Reconnect expired SMB sessions
	nl80211: Define policy for packet pattern attributes
	rcu: Allow for page faults in NMI handlers
	USB: dummy-hcd: Fix deadlock caused by disconnect detection
	MIPS: math-emu: Remove pr_err() calls from fpu_emu()
	dmaengine: edma: Align the memcpy acnt array size with the transfer
	dmaengine: ti-dma-crossbar: Fix possible race condition with dma_inuse
	HID: usbhid: fix out-of-bounds bug
	crypto: shash - Fix zero-length shash ahash digest crash
	KVM: MMU: always terminate page walks at level 1
	KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
	usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
	pinctrl/amd: Fix build dependency on pinmux code
	iommu/amd: Finish TLB flush in amd_iommu_unmap()
	device property: Track owner device of device property
	fs/mpage.c: fix mpage_writepage() for pages with buffers
	ALSA: usb-audio: Kill stray URB at exiting
	ALSA: seq: Fix use-after-free at creating a port
	ALSA: seq: Fix copy_from_user() call inside lock
	ALSA: caiaq: Fix stray URB at probe error path
	ALSA: line6: Fix missing initialization before error path
	ALSA: line6: Fix leftover URB at error-path during probe
	drm/i915/edp: Get the Panel Power Off timestamp after panel is off
	drm/i915: Read timings from the correct transcoder in intel_crtc_mode_get()
	drm/i915/bios: parse DDI ports also for CHV for HDMI DDC pin and DP AUX channel
	usb: gadget: configfs: Fix memory leak of interface directory data
	usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
	direct-io: Prevent NULL pointer access in submit_page_section
	fix unbalanced page refcounting in bio_map_user_iov
	more bio_map_user_iov() leak fixes
	bio_copy_user_iov(): don't ignore ->iov_offset
	USB: serial: ftdi_sio: add id for Cypress WICED dev board
	USB: serial: cp210x: add support for ELV TFD500
	USB: serial: option: add support for TP-Link LTE module
	USB: serial: qcserial: add Dell DW5818, DW5819
	USB: serial: console: fix use-after-free after failed setup
	x86/alternatives: Fix alt_max_short macro to really be a max()
	KVM: nVMX: update last_nonleaf_level when initializing nested EPT
	Linux 4.9.57

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-18 09:45:37 +02:00
Paul E. McKenney
97535791d8 rcu: Allow for page faults in NMI handlers
commit 28585a8326 upstream.

A number of architecture invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18 09:35:38 +02:00
Brendan Jackman
55f81c1c68 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
find_idlest_group() returns NULL when the local group is idlest. The
caller then continues the find_idlest_group() search at a lower level
of the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is
not consulted and, crucially, @new_cpu is not updated. This means the
search is pointless and we return @prev_cpu from select_task_rq_fair().

This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.

Change-Id: I8644a35179b00d4833befe237aa443ec840046ec
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 93f50f9024)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:44 +01:00
Brendan Jackman
a8c3c11c0e UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
When 'p' is not allowed on any of the CPUs in the sched_domain, we
currently return NULL from find_idlest_group(), and pointlessly
continue the search on lower sched_domain levels (where 'p' is also not
allowed) before returning prev_cpu regardless (as we have not updated
new_cpu).

Add an explicit check for this case, and add a comment to
find_idlest_group(). Now when find_idlest_group() returns NULL, it always
means that the local group is allowed and idlest.

Change-Id: I900538644e62dab68f1cfddb59d41846d991ce09
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6fee85ccbc)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:43 +01:00
Brendan Jackman
13c4d4fdca UPSTREAM: sched/fair: Fix find_idlest_group() when local group is not allowed
When the local group is not allowed we do not modify this_*_load from
their initial value of 0. That means that the load checks at the end
of find_idlest_group cause us to incorrectly return NULL. Fixing the
initial values to ULONG_MAX means we will instead return the idlest
remote group in that case.

Change-Id: I4cb6269a653205b1d1ba6dce1ca42359faf85362
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0d10ab952e)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:42 +01:00
Brendan Jackman
da6485cee9 UPSTREAM: sched/fair: Remove unnecessary comparison with -1
Since commit:

  83a0a96a5f ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")

find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
so we can simplify the checking of the return value in find_idlest_cpu().

Change-Id: I43828722863cf286971c92fea1f3019f00b8e627
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e90381eaec)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:42 +01:00
Brendan Jackman
6cc75c9bd2 UPSTREAM: sched/fair: Move select_task_rq_fair() slow-path into its own function
In preparation for changes that would otherwise require adding a new
level of indentation to the while(sd) loop, create a new function
find_idlest_cpu() which contains this loop, and rename the existing
find_idlest_cpu() to find_idlest_group_cpu().

Code inside the while(sd) loop is unchanged. @new_cpu is added as a
variable in the new function, with the same initial value as the
@new_cpu in select_task_rq_fair().

Change-Id: Ic08cc71eb4d0408b8987b2b15fcb7326956d7e81
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-2-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 18bd1b4bd5)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:41 +01:00
Brendan Jackman
e019d8f5b5 UPSTREAM: sched/fair: Force balancing on NOHZ balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.

Change-Id: I4cbc8ced11e6a56f98ec67633e67e09cbb515268
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 583ffd99d7)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-13 13:31:28 +01:00
Vincent Guittot
08c53a651c UPSTREAM: sched: use load_avg for selecting idlest group
find_idlest_group() only compares the runnable_load_avg when looking for
the least loaded group. But on fork intensive use case like hackbench
where tasks blocked quickly after the fork, this can lead to selecting the
same CPU instead of other CPUs, which have similar runnable load but a
lower load_avg.

When the runnable_load_avg of 2 CPUs are close, we now take into account
the amount of blocked load as a 2nd selection factor. There is now 3 zones
for the runnable_load of the rq:
-[0 .. (runnable_load - imbalance)] : Select the new rq which has
significantly less runnable_load
-](runnable_load - imbalance) .. (runnable_load + imbalance)[ : The
runnable loads are close so we use load_avg to chose between the 2 rq
-[(runnable_load + imbalance) .. ULONG_MAX] : Keep the current rq which
has significantly less runnable_load

The scale factor that is currently used for comparing runnable_load,
doesn't work well with small value. As an example, the use of a scaling
factor fails as soon as this_runnable_load == 0 because we always select
local rq even if min_runnable_load is only 1, which doesn't really make
sense because they are just the same. So instead of scaling factor, we use
an absolute margin for runnable_load to detect CPUs with similar
runnable_load and we keep using scaling factor for blocked load.

For use case like hackbench, this enable the scheduler to select different
CPUs during the fork sequence and to spread tasks across the system.

Tests have been done on a Hikey board (ARM based octo cores) for several
kernel. The result below gives min, max, avg and stdev values of 18 runs
with each configuration.

The v4.8+patches configuration also includes the changes below which is
part of the proposal made by Peter to ensure that the clock will be up to
date when the fork task will be attached to the rq.

@@ -2568,6 +2568,7 @@ void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);

 	activate_task(rq, p, 0);

hackbench -P -g 1

       ea86cb4b76  7dc603c902  v4.8        v4.8+patches
min    0.049         0.050         0.051       0,048
avg    0.057         0.057(0%)     0.057(0%)   0,055(+5%)
max    0.066         0.068         0.070       0,063
stdev  +/-9%         +/-9%         +/-8%       +/-9%

Change-Id: I8fe40bbd106454282b1963db825705f3cf377b9e
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
2017-10-12 16:41:43 -07:00
Vincent Guittot
ff634002ac UPSTREAM: sched: fix find_idlest_group for fork
During fork, the utilization of a task is init once the rq has been
selected because the current utilization level of the rq is used to set
the utilization of the fork task. As the task's utilization is still
null at this step of the fork sequence, it doesn't make sense to look for
some spare capacity that can fit the task's utilization.
Furthermore, I can see perf regressions for the test "hackbench -P -g 1"
because the least loaded policy is always bypassed and tasks are not
spread during fork.

With this patch and the fix below, we are back to same performances as
for v4.8. The fix below is only a temporary one used for the test until a
smarter solution is found because we can't simply remove the test which is
useful for others benchmarks

@@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t

 	avg_cost = this_sd->avg_scan_cost;

-	/*
-	 * Due to large variance we need a large fuzz factor; hackbench in
-	 * particularly is sensitive here.
-	 */
-	if ((avg_idle / 512) < avg_cost)
-		return -1;
-
 	time = local_clock();

 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {

Change-Id: If7dd1c2a81fd74419733b8ecff519ae8cb8eb32d
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
2017-10-12 16:41:35 -07:00
Greg Kroah-Hartman
cdbe07ad26 Merge 4.9.55 into android-4.9
Changes in 4.9.55
	USB: gadgetfs: Fix crash caused by inadequate synchronization
	USB: gadgetfs: fix copy_to_user while holding spinlock
	usb: gadget: udc: atmel: set vbus irqflags explicitly
	usb: gadget: udc: renesas_usb3: fix for no-data control transfer
	usb: gadget: udc: renesas_usb3: fix Pn_RAMMAP.Pn_MPKT value
	usb: gadget: udc: renesas_usb3: Fix return value of usb3_write_pipe()
	usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
	usb-storage: fix bogus hardware error messages for ATA pass-thru devices
	usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
	usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
	ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
	usb: pci-quirks.c: Corrected timeout values used in handshake
	USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
	USB: dummy-hcd: fix connection failures (wrong speed)
	USB: dummy-hcd: fix infinite-loop resubmission bug
	USB: dummy-hcd: Fix erroneous synchronization change
	USB: devio: Don't corrupt user memory
	usb: gadget: mass_storage: set msg_registered after msg registered
	USB: g_mass_storage: Fix deadlock when driver is unbound
	USB: uas: fix bug in handling of alternate settings
	USB: core: harden cdc_parse_cdc_header
	usb: Increase quirk delay for USB devices
	USB: fix out-of-bounds in usb_set_configuration
	xhci: fix finding correct bus_state structure for USB 3.1 hosts
	xhci: Fix sleeping with spin_lock_irq() held in ASmedia 1042A workaround
	xhci: set missing SuperSpeedPlus Link Protocol bit in roothub descriptor
	Revert "xhci: Limit USB2 port wake support for AMD Promontory hosts"
	iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
	iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
	iio: ad_sigma_delta: Implement a dedicated reset function
	staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
	iio: core: Return error for failed read_reg
	IIO: BME280: Updates to Humidity readings need ctrl_reg write!
	iio: ad7793: Fix the serial interface reset
	iio: adc: mcp320x: Fix readout of negative voltages
	iio: adc: mcp320x: Fix oops on module unload
	uwb: properly check kthread_run return value
	uwb: ensure that endpoint is interrupt
	staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist
	mm, oom_reaper: skip mm structs with mmu notifiers
	lib/ratelimit.c: use deferred printk() version
	lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
	ALSA: compress: Remove unused variable
	Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
	ALSA: usx2y: Suppress kernel warning at page allocation failures
	mlxsw: spectrum: Prevent mirred-related crash on removal
	net: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	tcp: update skb->skb_mstamp more carefully
	bpf/verifier: reject BPF_ALU64|BPF_END
	tcp: fix data delivery rate
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: phy: Fix mask value write on gmii2rgmii converter speed register
	ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline
	net/sched: cls_matchall: fix crash when used with classful qdisc
	tcp: fastopen: fix on syn-data transmit failure
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	net_sched: always reset qdisc backlog in qdisc_reset()
	net: qcom/emac: specify the correct size when mapping a DMA buffer
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	net: dsa: Fix network device registration order
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	netlink: do not proceed if dump's start() errs
	ip6_gre: ip6gre_tap device should keep dst
	ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path
	tipc: use only positive error codes in messages
	net: rtnetlink: fix info leak in RTM_GETSTATS call
	socket, bpf: fix possible use after free
	powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks
	powerpc/tm: Fix illegal TM state in signal handler
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	driver core: platform: Don't read past the end of "driver_override" buffer
	Drivers: hv: fcopy: restore correct transfer length
	stm class: Fix a use-after-free
	ftrace: Fix kmemleak in unregister_ftrace_graph
	HID: i2c-hid: allocate hid buffers for real worst case
	HID: wacom: leds: Don't try to control the EKR's read-only LEDs
	HID: wacom: Always increment hdev refcount within wacom_get_hdev_data
	HID: wacom: bits shifted too much for 9th and 10th buttons
	rocker: fix rocker_tlv_put_* functions for KASAN
	netlink: fix nla_put_{u8,u16,u32} for KASAN
	iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
	iwlwifi: add workaround to disable wide channels in 5GHz
	scsi: sd: Do not override max_sectors_kb sysfs setting
	brcmfmac: add length check in brcmf_cfg80211_escan_handler()
	brcmfmac: setup passive scan if requested by user-space
	drm/i915/bios: ignore HDMI on port A
	nvme-pci: Use PCI bus address for data/queues in CMB
	mmc: core: add driver strength selection when selecting hs400es
	sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
	vfs: deny copy_file_range() for non regular files
	ext4: fix data corruption for mmap writes
	ext4: Don't clear SGID when inheriting ACLs
	ext4: don't allow encrypted operations without keys
	f2fs: don't allow encrypted operations without keys
	KVM: x86: fix singlestepping over syscall
	Linux 4.9.55

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-12 22:31:24 +02:00
Peter Zijlstra
ba15518c26 sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
commit 50e7663233 upstream.

Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.

On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.

But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.

So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.

An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.

Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308e ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:51:25 +02:00
Shu Wang
a3ec104976 ftrace: Fix kmemleak in unregister_ftrace_graph
commit 2b0b8499ae upstream.

The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.

kmmeleak backtrace:
    kmemleak_vmalloc+0x77/0xc0
    __vmalloc_node_range+0x1b5/0x2c0
    module_alloc+0x7c/0xd0
    arch_ftrace_update_trampoline+0xb5/0x290
    ftrace_startup+0x78/0x210
    register_ftrace_function+0x8b/0xd0
    function_trace_init+0x4f/0x80
    tracing_set_tracer+0xe6/0x170
    tracing_set_trace_write+0x90/0xd0
    __vfs_write+0x37/0x170
    vfs_write+0xb2/0x1b0
    SyS_write+0x55/0xc0
    do_syscall_64+0x67/0x180
    return_from_SYSCALL_64+0x0/0x6a

[
  Looking further into this, I found that this was left over from when the
  function and function graph tracers shared the same ftrace_ops. But in
  commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
  together"), the two were separated, and the save_global_trampoline no
  longer was necessary (and it may have been broken back then too).
  -- Steven Rostedt
]

Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com

Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:51:24 +02:00
Yonghong Song
0dee549f79 bpf: one perf event close won't free bpf program attached by another perf event
[ Upstream commit ec9dd352d5 ]

This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as tracepoint
     no output any more.

The issue happens at step 4. Multiple perf_event_open can be called
successfully, but only one bpf prog pointer in the tp_event. In the
current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:51:21 +02:00
Edward Cree
e159492b3c bpf/verifier: reject BPF_ALU64|BPF_END
[ Upstream commit e67b8a685c ]

Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:51:20 +02:00
Joel Fernandes
2b3a26c86b FROMLIST: tracing: Add support for preempt and irq enable/disable events
Preempt and irq trace events can be used for tracing the start and
end of an atomic section which can be used by a trace viewer like
systrace to graphically view the start and end of an atomic section and
correlate them with latencies and scheduling issues.

This also serves as a prelude to using synthetic events or probes to
rewrite the preempt and irqsoff tracers, along with numerous benefits of
using trace events features for these events.

Change-Id: I4f992714c0161a415b17a03aa14cce823d8ca3bb
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988157/
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-06 16:20:25 -07:00
Joel Fernandes
a7e410c135 FROMLIST: tracing: Prepare to add preempt and irq trace events
In preparation of adding irqsoff and preemptsoff enable and disable trace
events, move required functions and code to make it easier to add these events
in a later patch. This patch is just code movement and no functional change.

Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988159/
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-06 16:19:52 -07:00
Greg Kroah-Hartman
379e3b2a6d Merge 4.9.53 into android-4.9
Changes in 4.9.53
	cifs: release cifs root_cred after exit_cifs
	cifs: release auth_key.response for reconnect.
	fs/proc: Report eip/esp in /prod/PID/stat for coredumping
	mac80211: fix VLAN handling with TXQs
	mac80211_hwsim: Use proper TX power
	mac80211: flush hw_roc_start work before cancelling the ROC
	genirq: Make sparse_irq_lock protect what it should protect
	KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
	KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
	tracing: Fix trace_pipe behavior for instance traces
	tracing: Erase irqsoff trace with empty write
	md/raid5: fix a race condition in stripe batch
	md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
	scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
	drm/radeon: disable hard reset in hibernate for APUs
	crypto: drbg - fix freeing of resources
	crypto: talitos - Don't provide setkey for non hmac hashing algs.
	crypto: talitos - fix sha224
	crypto: talitos - fix hashing
	security/keys: properly zero out sensitive key material in big_key
	security/keys: rewrite all of big_key crypto
	KEYS: fix writing past end of user-supplied buffer in keyring_read()
	KEYS: prevent creating a different user's keyrings
	KEYS: prevent KEYCTL_READ on negative key
	powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
	powerpc/tm: Flush TM only if CPU has TM feature
	powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS
	s390/mm: fix write access check in gup_huge_pmd()
	PM: core: Fix device_pm_check_callbacks()
	Fix SMB3.1.1 guest authentication to Samba
	SMB3: Warn user if trying to sign connection that authenticated as guest
	SMB: Validate negotiate (to protect against downgrade) even if signing off
	SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
	vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
	nl80211: check for the required netlink attributes presence
	bsg-lib: don't free job in bsg_prepare_job
	iw_cxgb4: remove the stid on listen create failure
	iw_cxgb4: put ep reference in pass_accept_req()
	selftests/seccomp: Support glibc 2.26 siginfo_t.h
	seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
	arm64: Make sure SPsel is always set
	arm64: fault: Route pte translation faults via do_translation_fault
	KVM: VMX: extract __pi_post_block
	KVM: VMX: avoid double list add with VT-d posted interrupts
	KVM: VMX: simplify and fix vmx_vcpu_pi_load
	kvm/x86: Handle async PF in RCU read-side critical sections
	KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
	kvm: nVMX: Don't allow L2 to access the hardware CR8
	xfs: validate bdev support for DAX inode flag
	etnaviv: fix gem object list corruption
	PCI: Fix race condition with driver_override
	btrfs: fix NULL pointer dereference from free_reloc_roots()
	btrfs: propagate error to btrfs_cmp_data_prepare caller
	btrfs: prevent to set invalid default subvolid
	x86/mm: Fix fault error path using unsafe vma pointer
	x86/fpu: Don't let userspace set bogus xcomp_bv
	gfs2: Fix debugfs glocks dump
	timer/sysclt: Restrict timer migration sysctl values to 0 and 1
	KVM: VMX: do not change SN bit in vmx_update_pi_irte()
	KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
	cxl: Fix driver use count
	KVM: VMX: use cmpxchg64
	video: fbdev: aty: do not leak uninitialized padding in clk to userspace
	swiotlb-xen: implement xen_swiotlb_dma_mmap callback
	Linux 4.9.53

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-05 10:37:37 +02:00
Myungho Jung
4c00015385 timer/sysclt: Restrict timer migration sysctl values to 0 and 1
commit b94bf594cf upstream.

timer_migration sysctl acts as a boolean switch, so the allowed values
should be restricted to 0 and 1.

Add the necessary extra fields to the sysctl table entry to enforce that.

[ tglx: Rewrote changelog ]

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:44:04 +02:00
Oleg Nesterov
be69c4c00a seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
commit 66a733ea6b upstream.

As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end
up using different filters. Once we drop ->siglock it is possible for
task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC.

Fixes: f8e529ed94 ("seccomp, ptrace: add support for dumping seccomp filters")
Reported-by: Chris Salls <chrissalls5@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
Signed-off-by: Tycho Andersen <tycho@docker.com>
[kees: tweak commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:44:02 +02:00
Bo Yan
5fb4be27da tracing: Erase irqsoff trace with empty write
commit 8dd33bcb70 upstream.

One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com

Cc: <mingo@redhat.com>
Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Bo Yan <byan@nvidia.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:43:59 +02:00
Tahsin Erdogan
97d402e6ee tracing: Fix trace_pipe behavior for instance traces
commit 75df6e688c upstream.

When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at global trace state, but it
should be checking the trace instance where trace_pipe is located at.

Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe

Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com

Fixes: 10246fa35d ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:43:59 +02:00
Thomas Gleixner
3d5960c8c6 genirq: Make sparse_irq_lock protect what it should protect
commit 12ac1d0f6c upstream.

for_each_active_irq() iterates the sparse irq allocation bitmap. The caller
must hold sparse_irq_lock. Several code pathes expect that an active bit in
the sparse bitmap also has a valid interrupt descriptor.

Unfortunately that's not true. The (de)allocation is a two step process,
which holds the sparse_irq_lock only across the queue/remove from the radix
tree and the set/clear in the allocation bitmap.

If a iteration locks sparse_irq_lock between the two steps, then it might
see an active bit but the corresponding irq descriptor is NULL. If that is
dereferenced unconditionally, then the kernel oopses. Of course, all
iterator sites could be audited and fixed, but....

There is no reason why the sparse_irq_lock needs to be dropped between the
two steps, in fact the code becomes simpler when the mutex is held across
both and the semantics become more straight forward, so future problems of
missing NULL pointer checks in the iteration are avoided and all existing
sites are fixed in one go.

Expand the lock held sections so both operations are covered and the bitmap
and the radixtree are in sync.

Fixes: a05a900a51 ("genirq: Make sparse_lock a mutex")
Reported-and-tested-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:43:58 +02:00
Steve Muckle
96a28fcc7c ANDROID: add script to fetch android kernel config fragments
The Android kernel config fragments now live in a separate repository.
To prevent others from having to search for this location, add a script
to fetch and unpack the fragments.

Update .gitignore to include these fragments.

Change-Id: If2d4a59b86e4573b0a9b3190025dfe4191870b46
Signed-off-by: Steve Muckle <smuckle@google.com>
2017-10-03 17:19:26 +00:00
Greg Kroah-Hartman
c30c69c76c Merge 4.9.52 into android-4.9
Changes in 4.9.52
	SUNRPC: Refactor svc_set_num_threads()
	NFSv4: Fix callback server shutdown
	mm: prevent double decrease of nr_reserved_highatomic
	orangefs: Don't clear SGID when inheriting ACLs
	IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation
	drm/sun4i: Implement drm_driver lastclose to restore fbdev console
	IB/addr: Fix setting source address in addr6_resolve()
	tty: improve tty_insert_flip_char() fast path
	tty: improve tty_insert_flip_char() slow path
	tty: fix __tty_insert_flip_char regression
	pinctrl/amd: save pin registers over suspend/resume
	Input: i8042 - add Gigabyte P57 to the keyboard reset table
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
	MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
	MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
	MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
	MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix NaN propagation
	MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of infinite inputs
	MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of zero inputs
	MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Clean up "maddf_flags" enumeration
	MIPS: math-emu: <MADDF|MSUBF>.S: Fix accuracy (32-bit case)
	MIPS: math-emu: <MADDF|MSUBF>.D: Fix accuracy (64-bit case)
	crypto: ccp - Fix XTS-AES-128 support on v5 CCPs
	crypto: AF_ALG - remove SGL terminator indicator when chaining
	ext4: fix incorrect quotaoff if the quota feature is enabled
	ext4: fix quota inconsistency during orphan cleanup for read-only mounts
	powerpc: Fix DAR reporting when alignment handler faults
	block: Relax a check in blk_start_queue()
	md/bitmap: disable bitmap_resize for file-backed bitmaps.
	skd: Avoid that module unloading triggers a use-after-free
	skd: Submit requests to firmware before triggering the doorbell
	scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
	scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
	scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
	scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
	scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
	scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
	scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
	scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
	scsi: megaraid_sas: set minimum value of resetwaittime to be 1 secs
	scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
	scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
	scsi: storvsc: fix memory leak on ring buffer busy
	scsi: sg: remove 'save_scat_len'
	scsi: sg: use standard lists for sg_requests
	scsi: sg: off by one in sg_ioctl()
	scsi: sg: factor out sg_fill_request_table()
	scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
	scsi: qla2xxx: Correction to vha->vref_count timeout
	scsi: qla2xxx: Fix an integer overflow in sysfs code
	ftrace: Fix selftest goto location on error
	ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
	tracing: Add barrier to trace_printk() buffer nesting modification
	tracing: Apply trace_clock changes to instance max buffer
	ARC: Re-enable MMU upon Machine Check exception
	PCI: shpchp: Enable bridge bus mastering if MSI is enabled
	PCI: pciehp: Report power fault only once until we clear it
	net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()
	s390/mm: fix local TLB flushing vs. detach of an mm address space
	s390/mm: fix race on mm->context.flush_mm
	media: v4l2-compat-ioctl32: Fix timespec conversion
	media: uvcvideo: Prevent heap overflow when accessing mapped controls
	PM / devfreq: Fix memory leak when fail to register device
	bcache: initialize dirty stripes in flash_dev_run()
	bcache: Fix leak of bdev reference
	bcache: do not subtract sectors_to_gc for bypassed IO
	bcache: correct cache_dirty_target in __update_writeback_rate()
	bcache: Correct return value for sysfs attach errors
	bcache: fix for gc and write-back race
	bcache: fix bch_hprint crash and improve output
	Linux 4.9.52

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-27 14:56:06 +02:00
Baohong Liu
cf052336d0 tracing: Apply trace_clock changes to instance max buffer
commit 170b3b1050 upstream.

Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.

Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com

Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)
96cf918df4 tracing: Add barrier to trace_printk() buffer nesting modification
commit 3d9622c12c upstream.

trace_printk() uses 4 buffers, one for each context (normal, softirq, irq
and NMI), such that it does not need to worry about one context preempting
the other. There's a nesting counter that gets incremented to figure out
which buffer to use. If the context gets preempted by another context which
calls trace_printk() it will increment the counter and use the next buffer,
and restore the counter when it is finished.

The problem is that gcc may optimize the modification of the buffer nesting
counter and it may not be incremented in memory before the buffer is used.
If this happens, and the context gets interrupted by another context, it
could pick the same buffer and corrupt the one that is being used.

Compiler barriers need to be added after the nesting variable is incremented
and before it is decremented to prevent usage of the context buffers by more
than one context at the same time.

Cc: Andy Lutomirski <luto@kernel.org>
Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Hat-tip-to: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)
100553e197 ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
commit edb096e007 upstream.

If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.

 # cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
  comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
  hex dump (first 32 bytes):
    55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89  U.t$.UH...t$.UH.
    e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c  .H......H.D$PH.L
  backtrace:
    [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
    [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
    [<ffffffff8109697f>] module_alloc+0x4f/0x90
    [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
    [<ffffffff81249947>] ftrace_startup+0xe7/0x300
    [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
    [<ffffffff81263786>] trace_selftest_ops+0x204/0x397
    [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
    [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
    [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
    [<ffffffff81002230>] do_one_initcall+0x90/0x1e2
    [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
    [<ffffffff81d61ec3>] kernel_init+0x13/0x122
    [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 14:39:23 +02:00
Steven Rostedt (VMware)
df865f86b0 ftrace: Fix selftest goto location on error
commit 46320a6acc upstream.

In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.

Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 14:39:23 +02:00
Vincent Guittot
0b4a2f1d16 UPSTREAM: sched/fair: Fix FTQ noise bench regression
commit bc4278987e upstream.

A regression of the FTQ noise has been reported by Ying Huang,
on the following hardware:

  8 threads Intel(R) Core(TM)i7-4770 CPU @ 3.40GHz with 8G memory

... which was caused by this commit:

  commit 4e5160766f ("sched/fair: Propagate asynchrous detach")

The only part of the patch that can increase the noise is the update
of blocked load of group entity in update_blocked_averages().

We can optimize this call and skip the update of group entity if its load
and utilization are already null and there is no pending propagation of load
in the task group.

This optimization partly restores the noise score. A more agressive
optimization has been tried but has shown worse score.

Reported-by: ying.huang@linux.intel.com
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: ying.huang@intel.com
Fixes: 4e5160766f ("sched/fair: Propagate asynchrous detach")
Link: http://lkml.kernel.org/r/1489758442-2877-1-git-send-email-vincent.guittot@linaro.org
[ Fixed typos, improved layout. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Change-Id: Iba93bbda71125ce14519aa30b1beac2bf3c52e1e
Fixes: Change-Id: I5d1a9f273111c0bb7503ca3c9ad2406d12867cb5
       ("UPSTREAM: sched/fair: Propagate asynchrous detach")
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-09-23 01:23:27 +00:00
Steve Muckle
faf1269d4b ANDROID: configs: remove config fragments
The kernel config fragments for Android have moved into
their own repository located at

https://android.googlesource.com/kernel/configs/

Bug: 63994171
Change-Id: I837bac54cb5c90e6a6eb0f6f0ad5c90588c1a46a
Signed-off-by: Steve Muckle <smuckle@google.com>
2017-09-15 16:17:26 -07:00
Greg Kroah-Hartman
f7d2974f34 Merge 4.9.50 into android-4.9
Changes in 4.9.50
	mtd: nand: mxc: Fix mxc_v1 ooblayout
	mtd: nand: qcom: fix read failure without complete bootchain
	mtd: nand: qcom: fix config error for BCH
	nvme-fabrics: generate spec-compliant UUID NQNs
	btrfs: resume qgroup rescan on rw remount
	selftests/x86/fsgsbase: Test selectors 1, 2, and 3
	mm/memory.c: fix mem_cgroup_oom_disable() call missing
	locktorture: Fix potential memory leak with rw lock test
	ALSA: msnd: Optimize / harden DSP and MIDI loops
	Bluetooth: Properly check L2CAP config option output buffer length
	ARM64: dts: marvell: armada-37xx: Fix GIC maintenance interrupt
	ARM: 8692/1: mm: abort uaccess retries upon fatal signal
	NFS: Fix 2 use after free issues in the I/O code
	NFS: Sync the correct byte range during synchronous writes
	xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present
	Linux 4.9.50

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-14 09:14:13 -07:00
Yang Shi
d21f3eaa09 locktorture: Fix potential memory leak with rw lock test
commit f4dbba5919 upstream.

When running locktorture module with the below commands with kmemleak enabled:

$ modprobe locktorture torture_type=rw_lock_irq
$ rmmod locktorture

The below kmemleak got caught:

root@10:~# echo scan > /sys/kernel/debug/kmemleak
[  323.197029] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@10:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffffc07592d500 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c3 7b 02 00 00 00 00 00  .........{......
    00 00 00 00 00 00 00 00 d7 9b 02 00 00 00 00 00  ................
  backtrace:
    [<ffffff80081e5a88>] create_object+0x110/0x288
    [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
    [<ffffff80081d5acc>] __kmalloc+0x234/0x318
    [<ffffff80006fa130>] 0xffffff80006fa130
    [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
    [<ffffff800817e28c>] do_init_module+0x68/0x1cc
    [<ffffff800811c848>] load_module+0x1a68/0x22e0
    [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
    [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffffffc07592d480 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 3b 6f 01 00 00 00 00 00  ........;o......
    00 00 00 00 00 00 00 00 23 6a 01 00 00 00 00 00  ........#j......
  backtrace:
    [<ffffff80081e5a88>] create_object+0x110/0x288
    [<ffffff80086c6078>] kmemleak_alloc+0x58/0xa0
    [<ffffff80081d5acc>] __kmalloc+0x234/0x318
    [<ffffff80006fa22c>] 0xffffff80006fa22c
    [<ffffff8008083ae4>] do_one_initcall+0x44/0x138
    [<ffffff800817e28c>] do_init_module+0x68/0x1cc
    [<ffffff800811c848>] load_module+0x1a68/0x22e0
    [<ffffff800811d340>] SyS_finit_module+0xe0/0xf0
    [<ffffff80080836f0>] el0_svc_naked+0x24/0x28
    [<ffffffffffffffff>] 0xffffffffffffffff

It is because cxt.lwsa and cxt.lrsa don't get freed in module_exit, so free
them in lock_torture_cleanup() and free writer_tasks if reader_tasks is
failed at memory allocation.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: 石洋 <yang.s@alibaba-inc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13 14:13:36 -07:00
Steve Muckle
9983305173 ANDROID: configs: require SYNC_FILE
CONFIG_SYNC_FILE is required in the absence of CONFIG_SYNC.

Change-Id: Ia52488a273c9e7b3080defe8cea19afc4ad21aaa
Signed-off-by: Steve Muckle <smuckle@google.com>
2017-09-07 08:47:51 +00:00
Greg Kroah-Hartman
85e1c0178a Merge 4.9.48 into android-4.9
Changes in 4.9.48
	irqchip: mips-gic: SYNC after enabling GIC region
	i2c: ismt: Don't duplicate the receive length for block reads
	i2c: ismt: Return EMSGSIZE for block reads with bogus length
	crypto: algif_skcipher - only call put_page on referenced and used pages
	mm, uprobes: fix multiple free of ->uprobes_state.xol_area
	mm, madvise: ensure poisoned pages are removed from per-cpu lists
	ceph: fix readpage from fscache
	cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
	cpuset: Fix incorrect memory_pressure control file mapping
	alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
	CIFS: Fix maximum SMB2 header size
	CIFS: remove endian related sparse warning
	wl1251: add a missing spin_lock_init()
	lib/mpi: kunmap after finishing accessing buffer
	xfrm: policy: check policy direction value
	drm/ttm: Fix accounting error when fail to get pages for pool
	kvm: arm/arm64: Force reading uncached stage2 PGD
	epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
	Linux 4.9.48

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-07 10:32:23 +02:00
Waiman Long
309e4dbfaf cpuset: Fix incorrect memory_pressure control file mapping
commit 1c08c22c87 upstream.

The memory_pressure control file was incorrectly set up without
a private value (0, by default). As a result, this control
file was treated like memory_migrate on read. By adding back the
FILE_MEMORY_PRESSURE private value, the correct memory pressure value
will be returned.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7dbdb199d3 ("cgroup: replace cftype->mode with CFTYPE_WORLD_WRITABLE")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07 08:35:40 +02:00
Eric Biggers
17c564f629 mm, uprobes: fix multiple free of ->uprobes_state.xol_area
commit 355627f518 upstream.

Commit 7c05126793 ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced an new error path before
the new mm_struct's ->uprobes_state.xol_area has been set to NULL after
being copied from the old mm_struct by the memcpy in dup_mm().  For a
task that has previously hit a uprobe tracepoint, this resulted in the
'struct xol_area' being freed multiple times if the task was killed at
just the right time while forking.

Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather
than in uprobe_dup_mmap().

With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
program given by commit 2b7e8665b4 ("fork: fix incorrect fput of
->exe_file causing use-after-free"), provided that a uprobe tracepoint
has been set on the fork_thread() function.  For example:

    $ gcc reproducer.c -o reproducer -lpthread
    $ nm reproducer | grep fork_thread
    0000000000400719 t fork_thread
    $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events
    $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
    $ ./reproducer

Here is the use-after-free reported by KASAN:

    BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
    Read of size 8 at addr ffff8800320a8b88 by task reproducer/198

    CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
    Call Trace:
     dump_stack+0xdb/0x185
     print_address_description+0x7e/0x290
     kasan_report+0x23b/0x350
     __asan_report_load8_noabort+0x19/0x20
     uprobe_clear_state+0x1c4/0x200
     mmput+0xd6/0x360
     do_exit+0x740/0x1670
     do_group_exit+0x13f/0x380
     get_signal+0x597/0x17d0
     do_signal+0x99/0x1df0
     exit_to_usermode_loop+0x166/0x1e0
     syscall_return_slowpath+0x258/0x2c0
     entry_SYSCALL_64_fastpath+0xbc/0xbe

    ...

    Allocated by task 199:
     save_stack_trace+0x1b/0x20
     kasan_kmalloc+0xfc/0x180
     kmem_cache_alloc_trace+0xf3/0x330
     __create_xol_area+0x10f/0x780
     uprobe_notify_resume+0x1674/0x2210
     exit_to_usermode_loop+0x150/0x1e0
     prepare_exit_to_usermode+0x14b/0x180
     retint_user+0x8/0x20

    Freed by task 199:
     save_stack_trace+0x1b/0x20
     kasan_slab_free+0xa8/0x1a0
     kfree+0xba/0x210
     uprobe_clear_state+0x151/0x200
     mmput+0xd6/0x360
     copy_process.part.8+0x605f/0x65d0
     _do_fork+0x1a5/0xbd0
     SyS_clone+0x19/0x20
     do_syscall_64+0x22f/0x660
     return_from_SYSCALL_64+0x0/0x7a

Note: without KASAN, you may instead see a "Bad page state" message, or
simply a general protection fault.

Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
Fixes: 7c05126793 ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07 08:35:39 +02:00