This is the 3.10.88 stable release
* tag 'v3.10.88' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (48 commits)
Linux 3.10.88
arm64/mm: Remove hack in mmap randomize layout
crypto: caam - fix memory corruption in ahash_final_ctx
libfc: Fix fc_fcp_cleanup_each_cmd()
drm/radeon: add new OLAND pci id
EDAC, ppc4xx: Access mci->csrows array elements properly
localmodconfig: Use Kbuild files too
dm thin metadata: delete btrees when releasing metadata snapshot
perf: Fix fasync handling on inherited events
mm/hwpoison: fix page refcount of unknown non LRU page
ipc/sem.c: update/correct memory barriers
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Linux 3.10.87
mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
md/bitmap: return an error when bitmap superblock is corrupt.
kvm: x86: fix kvm_apic_has_events to check for NULL pointer
signal: fix information leak in copy_siginfo_from_user32
signal: fix information leak in copy_siginfo_to_user
signalfd: fix information leak in signalfd_copyinfo
ARM: 7819/1: fiq: Cast the first argument of flush_icache_range()
...
commit fed66e2cdd upstream.
Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.
Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.
Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c00cb5e68 upstream.
This function can leak kernel stack data when the user siginfo_t has a
positive si_code value. The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.
copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.
This fixes the following information leaks:
x86: 8 bytes leaked when sending a signal from a 32-bit process to
itself. This leak grows to 16 bytes if the process uses x32.
(si_code = __SI_CHLD)
x86: 100 bytes leaked when sending a signal from a 32-bit process to
a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
64-bit process. (si_code = any)
parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process. These bugs are also fixed for consistency.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26135022f8 upstream.
This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.
Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
fs/exec.c
Resolution summary:
Conflict between upstream/LTS commit 9eae8ac6ab (fs: take
i_mutex during prepare_binprm for set[ug]id executables) and
android commit 9d0ff694bc (sched: move no_new_privs into new
atomic flags). Resolution: move task_no_new_privs() usage into
new function created by upstream/LTS comit.
This is the 3.10.86 stable release
* tag 'v3.10.86': (132 commits)
Linux 3.10.86
efi: fix 32bit kernel boot failed problem using efi
iscsi-target: Fix iser explicit logout TX kthread leak
iscsi-target: Fix use-after-free during TPG session shutdown
vhost: actually track log eventfd file
rds: rds_ib_device.refcount overflow
xhci: prevent bus_suspend if SS port resuming in phase 1
xhci: report U3 when link is in resume state
xhci: Calculate old endpoints correctly on device reset
usb-storage: ignore ZTE MF 823 card reader in mode 0x1225
ata: pmp: add quirk for Marvell 4140 SATA PMP
blkcg: fix gendisk reference leak in blkg_conf_prep()
Input: usbtouchscreen - avoid unresponsive TSC-30 touch screen
tile: use free_bootmem_late() for initrd
md/raid1: fix test for 'was read error from last working device'.
mmc: sdhci-pxav3: fix platform_data is not initialized
mmc: sdhci-esdhc: Make 8BIT bus work
mac80211: clear subdir_stations when removing debugfs
st: null pointer dereference panic caused by use after kref_put by st_open
ALSA: hda - Fix MacBook Pro 5,2 quirk
...
commit 75a06189fc upstream.
The resend mechanism happily calls the interrupt handler of interrupts
which are marked IRQ_NESTED_THREAD from softirq context. This can
result in crashes because the interrupt handler is not the proper way
to invoke the device handlers. They must be invoked via
handle_nested_irq.
Prevent the resend even if the interrupt has no valid parent irq
set. Its better to have a lost interrupt than a crashing machine.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6224beb12e upstream.
Fengguang Wu's tests triggered a bug in the branch tracer's start up
test when CONFIG_DEBUG_PREEMPT set. This was because that config
adds some debug logic in the per cpu field, which calls back into
the branch tracer.
The branch tracer has its own recursive checks, but uses a per cpu
variable to implement it. If retrieving the per cpu variable calls
back into the branch tracer, you can see how things will break.
Instead of using a per cpu variable, use the trace_recursion field
of the current task struct. Simply set a bit when entering the
branch tracing and clear it when leaving. If the bit is set on
entry, just don't do the tracing.
There's also the case with lockdep, as the local_irq_save() called
before the recursion can also trigger code that can call back into
the function. Changing that to a raw_local_irq_save() will protect
that as well.
This prevents the recursion and the inevitable crash that follows.
Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b88f44e16 upstream.
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
ps->infix.cnt--;
return ps->infix.string[ps->infix.tail++];
The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4875bbe7e upstream.
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the 3.10.83 stable release
* tag 'v3.10.83' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (29 commits)
Linux 3.10.83
bus: mvebu: pass the coherency availability information at init time
KVM: nSVM: Check for NRIPS support before updating control field
ARM: clk-imx6q: refine sata's parent
d_walk() might skip too much
ipv6: update ip6_rt_last_gc every time GC is run
ipv6: prevent fib6_run_gc() contention
xfrm: Increase the garbage collector threshold
Btrfs: make xattr replace operations atomic
x86/microcode/intel: Guard against stack overflow in the loader
fs: take i_mutex during prepare_binprm for set[ug]id executables
hpsa: add missing pci_set_master in kdump path
hpsa: refine the pci enable/disable handling
sb_edac: Fix erroneous bytes->gigabytes conversion
ACPICA: Utilities: Cleanup to remove useless ACPI_PRINTF/FORMAT_xxx helpers.
ACPICA: Utilities: Cleanup to convert physical address printing formats.
__ptrace_may_access() should not deny sub-threads
include/linux/sched.h: don't use task->pid/tgid in same_thread_group/has_group_leader_pid
netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings
...
commit 73af963f9f upstream.
__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current, this can can lead to surprising results.
For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable. setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable). After that get_dumpable() fails.
(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)
Change __ptrace_may_access() to use same_thread_group() instead of "task
== current". Any security check is pointless when the tasks share the
same ->mm.
Signed-off-by: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cf30dc180 upstream.
When the following filter is used it causes a warning to trigger:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
Modules linked in: bnep lockd grace bluetooth ...
CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
Call Trace:
[<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
[<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
[<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
[<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81159065>] replace_preds+0x3c5/0x990
[<ffffffff811596b2>] create_filter+0x82/0xb0
[<ffffffff81159944>] apply_event_filter+0xd4/0x180
[<ffffffff81152bbf>] event_filter_write+0x8f/0x120
[<ffffffff811db2a8>] __vfs_write+0x28/0xe0
[<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
[<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
[<ffffffff811dc408>] vfs_write+0xb8/0x1b0
[<ffffffff811dc72f>] SyS_write+0x4f/0xb0
[<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
---[ end trace e11028bd95818dcd ]---
Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.
This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression
And give no kernel warning.
Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[ luis: backported to 3.16:
- unconditionally decrement cnt as the OP_NOT logic was introduced only
by e12c09cf30 ("tracing: Add NOT to filtering logic") ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* 'linux-3.10.y' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (30 commits)
Linux 3.10.81
btrfs: cleanup orphans while looking up default subvolume
btrfs: incorrect handling for fiemap_fill_next_extent return
cfg80211: wext: clear sinfo struct before calling driver
mm/memory_hotplug.c: set zone->wait_table to null after freeing it
drm/i915: Fix DDC probe for passive adapters
pata_octeon_cf: fix broken build
ozwpan: unchecked signed subtraction leads to DoS
ozwpan: divide-by-zero leading to panic
ozwpan: Use proper check to prevent heap overflow
MIPS: Fix enabling of DEBUG_STACKOVERFLOW
ring-buffer-benchmark: Fix the wrong sched_priority of producer
USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board
USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
block: fix ext_dev_lock lockdep report
Input: elantech - fix detection of touchpads where the revision matches a known rate
ALSA: usb-audio: add MAYA44 USB+ mixer control names
ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420
iio: adis16400: Compute the scan mask from channel indices
...
* 'linaro-android-3.10-lsk' of git://android.git.linaro.org/kernel/linaro-android:
sched: cpufreq: update power usage only if cpufreq_stat is enabled
uid_cputime: Extends the cputime functionality to report power per uid
sched: cpufreq: Adds a field cpu_power in the task_struct
cpufreq_stats: Adds the fucntionality to load current values for each frequency for all the cores.
android: drivers: workaround debugfs race in binder
neigh: Better handling of transition to NUD_PROBE state
New Build Breakage in branch: kernel-m-dev-tegra-flounder-3.10 @ 1960706
net/unix: sk_socket can disappear when state is unlocked
selinux: enable genfscon labeling for sysfs and pstore files
selinux: enable per-file labeling for debugfs files.
ext4: don't save the error information if the block device is read-only
suspend: Return error when pending wakeup source is found.
proc: uid_cputime: fix show_uid_stat permission
power: increment wakeup_count when save_wakeup_count failed.
power: validate wakeup source before activating it.
Call acct_update_power() to track power usage of task only if
CONFIG_CPU_FREQ_STAT is enabled, otherwise we run into
following build failure:
---------------
kernel/built-in.o: In function `account_user_time':
kernel/sched/cputime.c:155: undefined reference to `acct_update_power'
kernel/built-in.o: In function `__account_system_time':
kernel/sched/cputime.c:208: undefined reference to `acct_update_power'
make: *** [vmlinux] Error 1
---------------
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
* android-3.10:
uid_cputime: Extends the cputime functionality to report power per uid
sched: cpufreq: Adds a field cpu_power in the task_struct
cpufreq_stats: Adds the fucntionality to load current values for each frequency for all the cores.
android: drivers: workaround debugfs race in binder
neigh: Better handling of transition to NUD_PROBE state
New Build Breakage in branch: kernel-m-dev-tegra-flounder-3.10 @ 1960706
net/unix: sk_socket can disappear when state is unlocked
selinux: enable genfscon labeling for sysfs and pstore files
selinux: enable per-file labeling for debugfs files.
ext4: don't save the error information if the block device is read-only
suspend: Return error when pending wakeup source is found.
proc: uid_cputime: fix show_uid_stat permission
power: increment wakeup_count when save_wakeup_count failed.
power: validate wakeup source before activating it.
cpu_power has been added to keep track of amount of power each task is
consuming. cpu_power is updated whenever stime and utime are updated for
a task. power is computed by taking into account the frequency at which
the current core was running and the current for cpu actively
running at hat frequency.
Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
If a wakeup source is found to be pending in the last stage of suspend
after syscore suspend then the device doesn't suspend but the error is
not propogated which causes an error in the accounting for the number
of suspend aborts and successful suspends.
Change-Id: Ib63b4ead755127eaf03e3b303aab3c782ad02ed1
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Doing a Exponential moving average per nr_running++/-- does not
guarantee a fixed sample rate which induces errors if there are lots of
threads being enqueued/dequeued from the rq (Linpack mt). Instead of
keeping track of the avg, the scheduler now keeps track of the integral
of nr_running and allows the readers to perform filtering on top.
Implemented a proper exponential moving average for the runnables
governor and a straight 100ms average for the balanced governor. Tweaked
the thresholds for the runnables governor to minimize latency. Also,
decreased sample_rate for the runnables governor to the absolute minimum
of 10msecs.
Signed-off-by: Huang, Tao <huangtao@rock-chips.com>
commit fc21c0cff2 upstream.
This commit was incomplete in that code to remove items from the per-cpu
lists was missing and never acquired a user in the 5 years it has been in
the tree. We're going to implement what it seems to try to archive in a
simpler way, and this code is in the way of doing so.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhuix.pan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
The deadlock is shown below:
CPU 0 CPU 1
----- -----
Acquire cpu_hotplug.lock
[via get_online_cpus()]
CPU online/offline operation
takes cpu_add_remove_lock
[via cpu_maps_update_begin()]
Try to acquire
cpu_add_remove_lock
[via register_cpu_notifier()]
CPU online/offline operation
tries to acquire cpu_hotplug.lock
[via cpu_hotplug_begin()]
*** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.
This can be achieved by writing the callback registration code as follows:
cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
for_each_online_cpu(cpu)
init_cpu(cpu);
/* This doesn't take the cpu_add_remove_lock */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.
Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.
Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 93ae4f978c)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
commit 28423ad283 upstream.
While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae ("softirq: Use hotplug thread
infrastructure"):
[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
...but despite this note, the patch still calls RCU with IRQs disabled.
This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.
Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b72c186999 upstream.
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <pthread.h>
#include <assert.h>
int pid;
void *waiter(void *arg)
{
int stat;
for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;
assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}
int main(void)
{
pthread_t thread;
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}
assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);
return 0;
}
Note for stable: the bug is very old, but without 9899d11f65 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80a9b64e2c upstream.
It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.
101.356868: preempt_count_add <-ring_buffer_lock_reserve
101.356870: preempt_count_sub <-ring_buffer_lock_reserve
The ring_buffer_lock_reserve has recursion protection that requires
accessing a per cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is suppose to prevent
recursion like this.
The generic version of this_cpu_read() and write() are:
#define this_cpu_generic_read(pcp) \
({ typeof(pcp) ret__; \
preempt_disable(); \
ret__ = *this_cpu_ptr(&(pcp)); \
preempt_enable(); \
ret__; \
})
#define this_cpu_generic_to_op(pcp, val, op) \
do { \
unsigned long flags; \
raw_local_irq_save(flags); \
*__this_cpu_ptr(&(pcp)) op val; \
raw_local_irq_restore(flags); \
} while (0)
Which is unacceptable for locations that know they are within preempt
disabled or interrupt disabled locations.
Paul McKenney stated that __this_cpu_() versions produce much better code on
other architectures than this_cpu_() does, if we know that the call is done in
a preempt disabled location.
I also changed the recursive_unlock() to use two local variables instead
of accessing the per_cpu variable twice.
Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home
Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 946e51f2bf upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Hutchings <ben@decadent.org.uk>
[hujianyang: Backported to 3.10 refer to the work of Ben Hutchings in 3.2:
- Apply name changes in all the different places we use d_alias and d_child
- Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30a22c215a upstream.
commit 6ae9200f2c ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit cd5c2cc93d (hmp: Remove potential for task_struct access
race) introduced a put_task_struct() to prevent races, but in doing so
introduced potential spinlock recursion. (This change was further
consolidated in commit 0baa5811ba -- sched: hmp: unify active
migration code.)
Unfortunately, the put_task_struct() is done while the runqueue
spinlock is held, but put_task_struct() can also cause a reschedule
causing the runqueue lock to be acquired recursively.
To fix, move the put_task_struct() outside the runqueue spinlock.
Reported-by: Victor Lixin <victor.lixin@hisilicon.com>
Cc: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Jon Medhurst <tixy@linaro.org>
Reviewed-by: Alex Shi <alex.shi@linaro.org>
Reviewed-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
commit d525211f9d upstream.
Vince reported a watchdog lockup like:
[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
[<ffffffff810b7f10>] lock_release+0x130/0x260
[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
[<ffffffff811f71ce>] send_sigio+0xae/0x100
[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
[<ffffffff8115d103>] perf_pending_event+0x33/0x60
[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
[<ffffffff8114e448>] irq_work_run+0x18/0x40
[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80
Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.
This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.
Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>