commit 5ea80f76a5 upstream.
This reverts commit df54d6fa54.
The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.
In particular, the MATLAB Java runtime seems to be unhappy. See
https://bugzilla.kernel.org/show_bug.cgi?id=60774
So we'll want to apply the random offset only once, and Radu has a patch
for that. Revert this older commit in order to apply the other one.
Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d79ff14262 upstream.
This patch adds wait_event_interruptible_lock_irq_timeout(), which is a
straight-forward descendant of wait_event_interruptible_timeout() and
wait_event_interruptible_lock_irq().
The zfcp driver used to call wait_event_interruptible_timeout()
in combination with some intricate and error-prone locking. Using
wait_event_interruptible_lock_irq_timeout() as a replacement
nicely cleans up that locking.
This rework removes a situation that resulted in a locking imbalance
in zfcp_qdio_sbal_get():
BUG: workqueue leaked lock or atomic: events/1/0xffffff00/10
last function: zfcp_fc_wka_port_offline+0x0/0xa0 [zfcp]
It was introduced by commit c2af7545aa
"[SCSI] zfcp: Do not wait for SBALs on stopped queue", which had a new
code path related to ZFCP_STATUS_ADAPTER_QDIOUP that took an early exit
without a required lock being held. The problem occured when a
special, non-SCSI I/O request was being submitted in process context,
when the adapter's queues had been torn down. In this case the bug
surfaced when the Fibre Channel port connection for a well-known address
was closed during a concurrent adapter shut-down procedure, which is a
rare constellation.
This patch also fixes these warnings from the sparse tool (make C=1):
drivers/s390/scsi/zfcp_qdio.c:224:12: warning: context imbalance in
'zfcp_qdio_sbal_check' - wrong count at exit
drivers/s390/scsi/zfcp_qdio.c:244:5: warning: context imbalance in
'zfcp_qdio_sbal_get' - unexpected unlock
Last but not least, we get rid of that crappy lock-unlock-lock
sequence at the beginning of the critical section.
It is okay to call zfcp_erp_adapter_reopen() with req_q_lock held.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2816c551c7 upstream.
Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.
The caller is going to free this call/file so we must ensure that
nobody can use them after trace_remove_event_call() succeeds.
debugfs should be fine after the previous changes and event_remove()
does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need
the additional checks:
- There could be a perf_event(s) attached to this tp_event, so the
patch checks ->perf_refcount.
- TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE,
so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex.
Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.com
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60f75b8e97 upstream.
In theory, under a given ACPI namespace node there should be only
one child device object with _ADR whose value matches a given bus
address exactly. In practice, however, there are systems in which
multiple child device objects under a given parent have _ADR matching
exactly the same address. In those cases we use _STA to determine
which of the multiple matching devices is enabled, since some systems
are known to indicate which ACPI device object to associate with the
given physical (usually PCI) device this way.
Unfortunately, as it turns out, there are systems in which many
device objects under the same parent have _ADR matching exactly the
same bus address and none of them has _STA, in which case they all
should be regarded as enabled according to the spec. Still, if
those device objects are supposed to represent bridges (e.g. this
is the case for device objects corresponding to PCIe ports), we can
try harder and skip the ones that have no child device objects in the
ACPI namespace. With luck, we can avoid using device objects that we
are not expected to use this way.
Although this only works for bridges whose children also have ACPI
namespace representation, it is sufficient to address graphics
adapter detection issues on some systems, so rework the code finding
a matching device ACPI handle for a given bus address to implement
this idea.
Introduce a new function, acpi_find_child(), taking three arguments:
the ACPI handle of the device's parent, a bus address suitable for
the device's bus type and a bool indicating if the device is a
bridge and make it work as outlined above. Reimplement the function
currently used for this purpose, acpi_get_child(), as a call to
acpi_find_child() with the last argument set to 'false' and make
the PCI subsystem use acpi_find_child() with the bridge information
passed as the last argument to it. [Lan Tianyu notices that it is
not sufficient to use pci_is_bridge() for that, because the device's
subordinate pointer hasn't been set yet at this point, so use
hdr_type instead.]
This change fixes a regression introduced inadvertently by commit
33f767d (ACPI: Rework acpi_get_child() to be more efficient) which
overlooked the fact that for acpi_walk_namespace() "post-order" means
"after all children have been visited" rather than "on the way back",
so for device objects without children and for namespace walks of
depth 1, as in the acpi_get_child() case, the "post-order" callbacks
ordering is actually the same as the ordering of "pre-order" ones.
Since that commit changed the namespace walk in acpi_get_child() to
terminate after finding the first matching object instead of going
through all of them and returning the last one, it effectively
changed the result returned by that function in some rare cases and
that led to problems (the switch from a "pre-order" to a "post-order"
callback was supposed to prevent that from happening, but it was
ineffective).
As it turns out, the systems where the change made by commit
33f767d actually matters are those where there are multiple ACPI
device objects representing the same PCIe port (which effectively
is a bridge). Moreover, only one of them, and the one we are
expected to use, has child device objects in the ACPI namespace,
so the regression can be addressed as described above.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60561
Reported-by: Peter Wu <lekensteyn@gmail.com>
Tested-by: Vladimir Lalov <mail@vlalov.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b047252d0 upstream.
Ben Tebulin reported:
"Since v3.7.2 on two independent machines a very specific Git
repository fails in 9/10 cases on git-fsck due to an SHA1/memory
failures. This only occurs on a very specific repository and can be
reproduced stably on two independent laptops. Git mailing list ran
out of ideas and for me this looks like some very exotic kernel issue"
and bisected the failure to the backport of commit 53a59fc67f ("mm:
limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
That commit itself is not actually buggy, but what it does is to make it
much more likely to hit the partial TLB invalidation case, since it
introduces a new case in tlb_next_batch() that previously only ever
happened when running out of memory.
The real bug is that the TLB gather virtual memory range setup is subtly
buggered. It was introduced in commit 597e1c3580 ("mm/mmu_gather:
enable tlb flush range in generic mmu_gather"), and the range handling
was already fixed at least once in commit e6c495a96c ("mm: fix the TLB
range flushed when __tlb_remove_page() runs out of slots"), but that fix
was not complete.
The problem with the TLB gather virtual address range is that it isn't
set up by the initial tlb_gather_mmu() initialization (which didn't get
the TLB range information), but it is set up ad-hoc later by the
functions that actually flush the TLB. And so any such case that forgot
to update the TLB range entries would potentially miss TLB invalidates.
Rather than try to figure out exactly which particular ad-hoc range
setup was missing (I personally suspect it's the hugetlb case in
zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
did), this patch just gets rid of the problem at the source: make the
TLB range information available to tlb_gather_mmu(), and initialize it
when initializing all the other tlb gather fields.
This makes the patch larger, but conceptually much simpler. And the end
result is much more understandable; even if you want to play games with
partial ranges when invalidating the TLB contents in chunks, now the
range information is always there, and anybody who doesn't want to
bother with it won't introduce subtle bugs.
Ben verified that this fixes his problem.
Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d50235b7bc upstream.
There's a race between elevator switching and normal io operation.
Because the allocation of struct elevator_queue and struct elevator_data
don't in a atomic operation.So there are have chance to use NULL
->elevator_data.
For example:
Thread A: Thread B
blk_queu_bio elevator_switch
spin_lock_irq(q->queue_block) elevator_alloc
elv_merge elevator_init_fn
Because call elevator_alloc, it can't hold queue_lock and the
->elevator_data is NULL.So at the same time, threadA call elv_merge and
nedd some info of elevator_data.So the crash happened.
Move the elevator_alloc into func elevator_init_fn, it make the
operations in a atomic operation.
Using the follow method can easy reproduce this bug
1:dd if=/dev/sdb of=/dev/null
2:while true;do echo noop > scheduler;echo deadline > scheduler;done
The test method also use this method.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfa9771a7c upstream.
Fix inadvertent breakage in the clone syscall ABI for Microblaze that
was introduced in commit f3268edbe6 ("microblaze: switch to generic
fork/vfork/clone").
The Microblaze syscall ABI for clone takes the parent tid address in the
4th argument; the third argument slot is used for the stack size. The
incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
slot.
This commit restores the original ABI so that existing userspace libc
code will work correctly.
All kernel versions from v3.8-rc1 were affected.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 786615bc1c upstream.
If rpcbind causes our connection to the AF_LOCAL socket to close after
we've registered a service, then we want to be careful about reconnecting
since the mount namespace may have changed.
By simply refusing to reconnect the AF_LOCAL socket in the case of
unregister, we avoid the need to somehow save the mount namespace. While
this may lead to some services not unregistering properly, it should
be safe.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed5467da0e upstream.
tracing_read_pipe zeros all fields bellow "seq". The declaration contains
a comment about that, but it doesn't help.
The first field is "snapshot", it's true when current open file is
snapshot. Looks obvious, that it should not be zeroed.
The second field is "started". It was converted from cpumask_t to
cpumask_var_t (v2.6.28-4983-g4462344), in other words it was
converted from cpumask to pointer on cpumask.
Currently the reference on "started" memory is lost after the first read
from tracing_read_pipe and a proper object will never be freed.
The "started" is never dereferenced for trace_pipe, because trace_pipe
can't have the TRACE_FILE_ANNOTATE options.
Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49ccc142f9 upstream.
regmap.h requires linux/err.h if CONFIG_REGMAP is not defined. Without it I get
error.
CC drivers/media/platform/exynos4-is/fimc-reg.o
In file included from drivers/media/platform/exynos4-is/fimc-reg.c:14:0:
include/linux/regmap.h: In function ‘regmap_write’:
include/linux/regmap.h:525:10: error: ‘EINVAL’ undeclared (first use in this function)
include/linux/regmap.h:525:10: note: each undeclared identifier is reported only once for each function it appears in
Signed-off-by: Mateusz Krawczuk <m.krawczuk@partner.samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 561bf15892 upstream
This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
in order to avoid external iscsit_reject_cmd() reject usage for all
PDU types.
It also updates PDU specific handlers for traditional iscsi-target
code to not reset the session after posting a ISCSI_OP_REJECT during
setup.
(v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
target_put_sess_cmd() after iscsit_sequence_cmd() failure)
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba15991408 upstream
This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
usage to not sleep on iscsi_cmd->reject_comp to address a free-after-use
usage bug in v3.10 with iser-target code.
It saves ->reject_reason for use within iscsit_build_reject() so the
correct value for both transport cases. It also drops the legacy
fail_conn parameter usage throughput iscsi-target code and adds
two iscsit_add_reject_cmd() and iscsit_reject_cmd helper functions,
along with various small cleanups.
(v2: Re-enable target_put_sess_cmd() to be called from
iscsit_add_reject_from_cmd() for rejects invoked after
target_get_sess_cmd() has been called)
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0699a73af3 upstream.
Commit 18d627113b (firewire: prevent dropping of completed iso packet
header data) was intended to be an obvious bug fix, but libdc1394 and
FlyCap2 depend on the old behaviour by ignoring all returned information
and thus not noticing that not all packets have been received yet. The
result was that the video frame buffers would be saved before they
contained the correct data.
Reintroduce the old behaviour for old clients.
Tested-by: Stepan Salenikovich <stepan.salenikovich@gmail.com>
Tested-by: Josep Bosch <jep250@gmail.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2cb96494d upstream.
This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur
before the connection shutdown has been completed by rx/tx threads,
that causes isert_free_conn() to wait indefinately on ->conn_wait.
This patch allows isert_disconnect_work code to invoke rdma_disconnect
when isert_disconnect_work() process context is started by client
session reset before isert_free_conn() code has been reached.
It also adds isert_conn->conn_mutex protection for ->state within
isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn()
code, along with isert_check_state() for wait_event usage.
(v2: Add explicit iscsit_cause_connection_reinstatement call
during isert_disconnect_work() to force conn reset)
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d4b812dea4 ]
In commit 48cc32d38a
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.
But we also have a problem if prio bits are set.
Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.
Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :
16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
It seems we are not really ready to properly cope with this right now.
We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.
For stable kernels, lets clear vlan_tci to fix the bugs.
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc229884d3 ]
This adds a way to check ring empty state after enable_cb outside any
locks. Will be used by virtio_net.
Note: there's room for more optimization: caller is likely to have a
memory barrier already, which means we might be able to get rid of a
barrier here. Deferring this optimization until we do some
benchmarking.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b1a5a34bd0 ]
Ver and type in pppoe_hdr should be swapped as defined by RFC2516
section-4.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8d39240d6 upstream.
The function stub for cpufreq_cooling_get_level introduced
in 57df81069 "Thermal: exynos: fix cooling state translation"
is not syntactically correct C and needs to be fixed to avoid
this error:
In file included from drivers/thermal/db8500_thermal.c:20:0:
include/linux/cpu_cooling.h: In function 'cpufreq_cooling_get_level':
include/linux/cpu_cooling.h:57:1:
error: parameter name omitted unsigned long cpufreq_cooling_get_level(unsigned int, unsigned int) ^
include/linux/cpu_cooling.h:57:1: error: parameter name omitted
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eduardo Valentin <eduardo.valentin@ti.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Amit Daniel kachhap <amit.daniel@samsung.com>
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c297a6665 upstream.
Since the info_mask split, iio_channel_has_info() is not working correctly.
info_mask_separate and info_mask_shared_by_type, it is not possible to compare
them directly with the iio_chan_info_enum enum. Correct that bit using the BIT()
macro.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c378f70adb upstream.
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.
This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14611e51a5 upstream.
task->cgroups is a RCU pointer pointing to struct css_set. A task
switches to a different css_set on cgroup migration but a css_set
doesn't change once created and its pointers to cgroup_subsys_states
aren't RCU protected.
task_subsys_state[_check]() is the macro to acquire css given a task
and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not
task->cgroups, so the RCU pointer task->cgroups ends up being
dereferenced without read_barrier_depends() after it. It's broken.
Fix it by introducing task_css_set[_check]() which does
RCU-dereference on task->cgroups. task_subsys_state[_check]() is
reimplemented to directly dereference ->subsys[] of the css_set
returned from task_css_set[_check]().
This removes some of sparse RCU warnings in cgroup.
v2: Fixed unbalanced parenthsis and there's no need to use
rcu_dereference_raw() when !CONFIG_PROVE_RCU. Both spotted by Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13d60f4b6a upstream.
The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.
Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.
Steps to reproduce the bug:
1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
mapping.
The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
their keys solely depend on the user space address.
2. Lock mutex1 and mutex2
3. Create thread1 and in the thread function lock mutex1, which
results in thread1 blocking on the locked mutex1.
4. Create thread2 and in the thread function lock mutex2, which
results in thread2 blocking on the locked mutex2.
5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
still blocks on mutex2 because the futex_key points to mutex1.
To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.
Mappings which are not based on hugetlbfs are not affected and still
use page->index.
Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.
[ tglx: Massaged changelog ]
Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull networking fixes from David Miller:
1) Found via trinity:
If you connect up an ipv6 socket to an ipv4 mapped address then an
ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the
route cached in the socket is an ipv6 one. In this case there is an
ipv4 route attached, so it gets stomped on.
Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric
Dumazet.
2) AF_KEY notifications leak some kernel memory to userspace, fix from
Mathias Krause.
3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del
doesn't validate that the device being deleted is actually a DLCI
one. Fixes from Li Zefan.
4) Length check on bluetooth l2cap information responses is wrong, each
response type has a different lenth, so we should make sure it's in
a given range rather than enforce one single valid length. From
Jaganath Kanakkassery.
5) Receive FIFO overflow is really easy to trigger in stress scenerios
in the sh_eth driver, but the event isn't being handled properly at
all. Specifically, the mask of error interrupts doesn't include the
event so we never clear it, resulting in the driver becomming wedged
processing an interrupt that never gets cleared.
Fix from Sergei Shtylyov.
6) qlcnic sleeps while holding a spinlock, use mdelay() instead of
msleep(). From Shahed Shaikh.
7) Missing curly braces causes SIP netfilter NAT module to always drop
packets. Fix from Balazs Peter Odor.
8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing
the timer to dereference crap when it fires. Fix from Gao Feng.
9) Missing RCU protection around txq->axq_acq traversal in
ath_txq_schedule(). Fix from Felix Fietkau.
10) Idle state transition test in ath9k_htc_config() is reversed, fix
from Sujith Manoharan.
11) IPV6 forwarding handles unicast Router Alert packets incorrectly.
It tests the wrong option state. Previously opt->ra being non-zero
indicated a router alert marking in the SKB, but now it's indicated
by a bit in opt->flags. Fix from YOSHIFUJI Hideaki.
12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet.
13) get_user_pages_fast() error handling in TUN and MACVTAP use the same
local variable for the base index and the loop iterator for page
traversal, oops! Fix from Michael S Tsirkin.
14) ipv6_get_lladdr() can fail, and we must therefore check it's return
value in inet6_set_iftoken(). For from Hannes Frederic Sowa.
15) If you change an interface name and meanwhile can sneak in something
that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can
deadlock with CONFIG_PREEMPT=n. Fix this by providing a helper
function that properly uses raw_seqcount_begin(). From Nicolas
Schichan.
16) Chain noise calibration test is inverted in iwlwifi, fix from
Nikolay Martynov.
17) Properly set TX iwlwifi descriptor flags for back requests. Fix
from Emmanuel Grumbach.
18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRAP
module, fix from Pablo Neira Ayuso.
19) Some crummy APs don't provide the proper High Throughput info in
association response frames. Add a workaround by assume we'll use
whatever is in the beacon/probe. Fix from Johannes Berg.
20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and
channel width). Fix from Simon Wunderlich.
21) xt_TCPMSS (like xt_TCPOPTSTRAP) must not try to handle fragmented
frames. Fix from Phil Oester.
22) Fix rate control regression causing iwlwifi/iwlegacy chips to use
1Mbit/s on pre-11n networks. From Moshe Benji and Stanslaw Gruszka.
23) Disable brcmsmac power-save functions, they cause regressions. From
Arend van Spriel.
24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can
easily crash. Fix from Anderson Lizardo.
25) If a learning packet arrives during vxlan_stop() we crash, easily
fixed by checking netif_running(). From Stephen Hemminger.
26) Static vxlan FDB entries should not be migrated, also from Stephen.
27) skb_clone() failures not handled in vxlan_xmit(), oops. Also from
Stephen.
28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes
Berg.
29) Fix regression in userspace VLAN acceleration control, added by the
802.1ad support changes. Fix from Fernando Luis Vazquez Cao.
30) Interval selection for MLD queries in the bridging code was
reversed. Fix from Linus Lüssing.
31) ipv6's ndisc_send_redirect() erroneously writes to the packet we
received not the packet we are building to send out. Fix from
Matthias Schiffer.
32) Don't free netdev before unregistering it, in usb_8dev can driver.
From Marc Kleine-Budde.
33) Fix nl80211 attribute buffer races, from Johannes Berg.
34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild.
From Stephen Hemminger.
35) Wrong address and family passed to MD5 key lookups in TCP, from
Aydin Arik.
36) phy_type attribute created by SFC driver should not be writable.
From Ben Hutchings.
37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth
should use kzalloc(). Otherwise if setup fails half-way, we'll
dereference garbage when trying to teardown the rings. From Lubomir
Rintel.
38) Fix double-allocation of dst (resulting in unfreeable net device) in
ipv6's init_loopback(). From Gao Feng.
39) Fix fragmentation handling SKB leak in netfilter conntrack, we were
freeing the wrong skb pointer. From Phil Oester.
40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from
Nikolay Aleksandrov.
41) davinci_cpdma doesn't check for DMA mapping errors, letting the
device scribble to random addresses. From Sebastian Siewior.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
dlci: validate the net device in dlci_del()
dlci: acquire rtnl_lock before calling __dev_get_by_name()
af_key: fix info leaks in notify messages
ipv6: ip6_sk_dst_check() must not assume ipv6 dst
net: fix kernel deadlock with interface rename and netdev name retrieval.
net/tg3: Avoid delay during MMIO access
ipv6: check return value of ipv6_get_lladdr
macvtap: fix recovery from gup errors
tun: fix recovery from gup errors
gre: fix a possible skb leak
ipv6: Process unicast packet with Router Alert by checking flag in skb.
ath9k_htc: Handle IDLE state transition properly
ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
netfilter: ipt_ULOG: fix incorrect setting of ulog timer
netfilter: ctnetlink: send event when conntrack label was modified
netfilter: nf_nat_sip: fix mangling
qlcnic: Do not sleep while holding spinlock
drivers: net: cpsw: fix compilation error with cpsw driver
tcp: doc : fix the syncookies default value
sh_eth: fix misreporting of transmit abort
...
When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
rename of a network interface, it can end up waiting for a workqueue
to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_secklock_begin() will spin forever waiting for the
writer process (the one doing the interface rename) to update the
devnet_rename_seq sequence.
This patch fixes the problem by adding a helper (netdev_get_name())
and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE setsockopt.
The netdev_get_name() helper uses raw_seqcount_begin() to avoid
spinning forever, waiting for devnet_rename_seq->sequence to become
even. cond_resched() is used in the contended case, before retrying
the access to give the writer process a chance to finish.
The use of raw_seqcount_begin() will incur some unneeded work in the
reader process in the contended case, but this is better than
deadlocking the system.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68c3316311 ("v4 GRE: Add TCP segmentation offload for GRE")
added a possible skb leak, because it frees only the head of segment
list, in case a skb_linearize() call fails.
This patch adds a kfree_skb_list() helper to fix the bug.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interactions between the ACPI dock driver and the ACPI-based PCI
hotplug (acpiphp) are currently problematic because of ordering
issues during hot-remove operations.
First of all, the current ACPI glue code expects that physical
devices will always be deleted before deleting the companion ACPI
device objects. Otherwise, acpi_unbind_one() will fail with a
warning message printed to the kernel log, for example:
[ 185.026073] usb usb5: Oops, 'acpi_handle' corrupt
[ 185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
[ 185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
[ 180.013656] port1: Oops, 'acpi_handle' corrupt
This means, in particular, that struct pci_dev objects have to
be deleted before the struct acpi_device objects they are "glued"
with.
Now, the following happens the during the undocking of an ACPI-based
dock station:
1) hotplug_dock_devices() invokes registered hotplug callbacks to
destroy physical devices associated with the ACPI device objects
depending on the dock station. It calls dd->ops->handler() for
each of those device objects.
2) For PCI devices dd->ops->handler() points to
handle_hotplug_event_func() that queues up a separate work item
to execute _handle_hotplug_event_func() for the given device and
returns immediately. That work item will be executed later.
3) hotplug_dock_devices() calls dock_remove_acpi_device() for each
device depending on the dock station. This runs acpi_bus_trim()
for each of them, which causes the underlying ACPI device object
to be destroyed, but the work items queued up by
handle_hotplug_event_func() haven't been started yet.
4) _handle_hotplug_event_func() queued up in step 2) are executed
and cause the above failure to happen, because the PCI devices
they handle do not have the companion ACPI device objects any
more (those objects have been deleted in step 3).
The possible breakage doesn't end here, though, because
hotplug_dock_devices() may return before at least some of the
_handle_hotplug_event_func() work items spawned by it have a
chance to complete and then undock() will cause _DCK to be
evaluated and that will cause the devices handled by the
_handle_hotplug_event_func() to go away possibly while they are
being accessed.
This means that dd->ops->handler() for PCI devices should not point
to handle_hotplug_event_func(). Instead, it should point to a
function that will do the work of _handle_hotplug_event_func()
synchronously. For this reason, introduce such a function,
hotplug_event_func(), and modity acpiphp_dock_ops to point to
it as the handler.
Unfortunately, however, this is not sufficient, because if the dock
code were not changed further, hotplug_event_func() would now
deadlock with hotplug_dock_devices() that called it, since it would
run unregister_hotplug_dock_device() which in turn would attempt to
acquire the dock station's hp_lock mutex already acquired by
hotplug_dock_devices().
To resolve that deadlock use the observation that
unregister_hotplug_dock_device() won't need to acquire hp_lock
if PCI bridges the devices on the dock station depend on are
prevented from being removed prematurely while the first loop in
hotplug_dock_devices() is in progress.
To make that possible, introduce a mechanism by which the callers of
register_hotplug_dock_device() can provide "init" and "release"
routines that will be executed, respectively, during the addition
and removal of the physical device object associated with the
given ACPI device handle. Make acpiphp use two new functions,
acpiphp_dock_init() and acpiphp_dock_release(), that call
get_bridge() and put_bridge(), respectively, on the acpiphp bridge
holding the given device, for this purpose.
In addition to that, remove the dock station's list of
"hotplug devices" and make the dock code always walk the whole list
of "dependent devices" instead in such a way that the loops in
hotplug_dock_devices() and dock_event() (replacing the loops over
"hotplug devices") will take references to the list entries that
register_hotplug_dock_device() has been called for. That prevents
the "release" routines associated with those entries from being
called while the given entry is being processed and for PCI
devices this means that their bridges won't be removed (by a
concurrent thread) while hotplug_event_func() handling them is
being executed.
This change is based on two earlier patches from Jiang Liu.
References: https://bugzilla.kernel.org/show_bug.cgi?id=59501
Reported-and-tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Tracked-down-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Illya Klymov <xanf@xanf.me>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org>
Pull vfs fixes from Al Viro:
"Several fixes for bugs caught while looking through f_pos (ab)users"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aout32 coredump compat fix
splice: don't pass the address of ->f_pos to methods
mconsole: we'd better initialize pos before passing it to vfs_read()...
Pull ACPI fixes from Rafael Wysocki:
- Fix for a regression causing a failure to turn on some devices on
some systems during initialization introduced by a recent revert of
an ACPI PM change that broke something else. Fortunately, we know
exactly what devices are affected, so we can add a fix just for them
leaving everyone else alone.
- ACPI power resources initialization fix preventing a NULL pointer
from being dereferenced in the acpi_add_power_resource() error code
path.
- ACPI dock station driver fix that adds missing locking to
write_undock().
- ACPI resources allocation fix changing the scope of an old workaround
so that it doesn't affect systems that aren't actually buggy. This
was reported a couple of days ago to fix DMA problems on some new
platforms so we need it in -stable. From Mika Westerberg.
* tag 'acpi-3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Power up LPSS devices during enumeration
ACPI / PM: Fix error code path for power resources initialization
ACPI / dock: Take ACPI scan lock in write_undock()
ACPI / resources: call acpi_get_override_irq() only for legacy IRQ resources
Pull scheduler fixes from Ingo Molnar:
"Two smaller fixes - plus a context tracking tracing fix that is a bit
bigger"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/context-tracking: Add preempt_schedule_context() for tracing
sched: Fix clear NOHZ_BALANCE_KICK
sched/x86: Construct all sibling maps if smt
Pull perf fixes from Ingo Molnar:
"Four fixes. The mmap ones are unfortunately larger than desired -
fuzzing uncovered bugs that needed perf context life time management
changes to fix properly"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
perf: Fix mmap() accounting hole
perf: Fix perf mmap bugs
kprobes: Fix to free gone and unused optprobes
Pull timer fixes from Thomas Gleixner:
- Fix inconstinant clock usage in virtual time accounting
- Fix a build error in KVM caused by the NOHZ work
- Remove a pointless timekeeping duty assignment which breaks NOHZ
- Use a proper notifier return value to avoid random behaviour
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Remove useless timekeeping duty attribution to broadcast source
nohz: Fix notifier return val that enforce timekeeping
kvm: Move guest entry/exit APIs to context_tracking
vtime: Use consistent clocks among nohz accounting
After addition of 8021AD h_vlan_proto can be either ETH_P_8021Q or
ETH_P_8021AD.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netlink_diag.h is in include/uapi/linux but not in the Kbuild necessary
to cause it to be exported by make headers_install.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7cd8407 (ACPI / PM: Do not execute _PS0 for devices without
_PSC during initialization) introduced a regression on some systems
with Intel Lynxpoint Low-Power Subsystem (LPSS) where some devices
need to be powered up during initialization, but their device objects
in the ACPI namespace have _PS0 and _PS3 only (without _PSC or power
resources).
To work around this problem, make the ACPI LPSS driver power up
devices it knows about by using a new helper function
acpi_device_fix_up_power() that does all of the necessary
sanity checks and calls acpi_dev_pm_explicit_set() to put the
device into D0.
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dave Jones hit the following bug report:
===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
Call Trace:
[<ffffffff816ae383>] dump_stack+0x19/0x1b
[<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
[<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
[<ffffffff8108dffc>] update_curr+0xec/0x240
[<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
[<ffffffff816b3a71>] __schedule+0x161/0x9b0
[<ffffffff816b4721>] preempt_schedule+0x51/0x80
[<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
[<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
[<ffffffff816be280>] ftrace_call+0x5/0x2f
[<ffffffff816b681d>] ? retint_careful+0xb/0x2e
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b4805>] ? schedule_user+0x5/0x70
[<ffffffff816b6824>] ? retint_careful+0x12/0x2e
------------[ cut here ]------------
What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.
Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.
The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.
The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull media fixes from Mauro Carvalho Chehab:
"Series of fixes for 3.10. There are some usual driver fixes (mostly
on s5p/exynos playform drivers), plus some fixes at V4L2 core"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (40 commits)
[media] soc_camera: error dev remove and v4l2 call
[media] sh_veu: fix the buffer size calculation
[media] sh_veu: keep power supply until the m2m context is released
[media] sh_veu: invoke v4l2_m2m_job_finish() even if a job has been aborted
[media] v4l2-ioctl: don't print the clips list
[media] v4l2-ctrls: V4L2_CTRL_CLASS_FM_RX controls are also valid radio controls
[media] cx88: fix NULL pointer dereference
[media] DocBook/media/v4l: update version number
[media] exynos4-is: Remove "sysreg" clock handling
[media] exynos4-is: Fix reported colorspace at FIMC-IS-ISP subdev
[media] exynos4-is: Ensure fimc-is clocks are not enabled until properly configured
[media] exynos4-is: Prevent NULL pointer dereference when firmware isn't loaded
[media] s5p-mfc: Add NULL check for allocated buffer
[media] s5p-mfc: added missing end-of-lines in debug messages
[media] s5p-mfc: v4l2 controls setup routine moved to initialization code
[media] s5p-mfc: separate encoder parameters for h264 and mpeg4
[media] s5p-mfc: Remove special clock usage in driver
[media] s5p-mfc: Remove unused s5p_mfc_get_decoded_status_v6() function
[media] v4l2: mem2mem: save irq flags correctly
[media] coda: v4l2-compliance fix: add VIDIOC_CREATE_BUFS support
...
Pull networking fixes from David Miller:
1) Fix RTNL locking in batman-adv, from Matthias Schiffer.
2) Don't allow non-passthrough macvlan devices to set NOPROMISC via
netlink, otherwise we can end up with corrupted promisc counter
values on the device. From Michael S Tsirkin.
3) Fix stmmac driver build with debugging defines enabled, from Dinh
Nguyen.
4) Make sure name string we give in socket address in AF_PACKET is NULL
terminated, from Daniel Borkmann.
5) Fix leaking of two uninitialized bytes of memory to userspace in
l2tp, from Guillaume Nault.
6) Clear IPCB(skb) before tunneling otherwise we touch dangling IP
options state and crash. From Saurabh Mohan.
7) Fix suspend/resume for davinci_mdio by using suspend_late and
resume_early. From Mugunthan V N.
8) Don't tag ip_tunnel_init_net and ip_tunnel_delete_net with
__net_{init,exit}, they can be called outside of those contexts.
From Eric Dumazet.
9) Fix RX length error in sh_eth driver, from Yoshihiro Shimoda.
10) Fix missing sctp_outq initialization in some code paths of SCTP
stack, from Neil Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
sctp: fully initialize sctp_outq in sctp_outq_init
netiucv: Hold rtnl between name allocation and device registration.
tulip: Properly check dma mapping result
net: sh_eth: fix incorrect RX length error if R8A7740
ip_tunnel: remove __net_init/exit from exported functions
drivers: net: davinci_mdio: restore mdio clk divider in mdio resume
drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver
net/ipv4: ip_vti clear skb cb before tunneling.
tg3: Wait for boot code to finish after power on
l2tp: Fix sendmsg() return value
l2tp: Fix PPP header erasure and memory leak
bonding: fix igmp_retrans type and two related races
bonding: reset master mac on first enslave failure
packet: packet_getname_spkt: make sure string is always 0-terminated
net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used
be2net: Fix 32-bit DMA Mask handling
xen-netback: don't de-reference vif pointer after having called xenvif_put()
macvlan: don't touch promisc without passthrough
batman-adv: Don't handle address updates when bla is disabled
batman-adv: forward late OGMs from best next hop
...
Thanks to commit f91eb62f71 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.
With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function. This gets called in early as a
result of the time_init() call. Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:
WARNING: at init/main.c:560 start_kernel+0x250/0x410()
Interrupts were enabled early
CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
Call Trace:
show_stack+0x68/0x80
warn_slowpath_common+0x78/0xb0
warn_slowpath_fmt+0x38/0x48
start_kernel+0x250/0x410
Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore. Because we
need a flags variable, make it a static inline to avoid name space
issues.
[ Change from v1: Convert on_each_cpu to a static inline function, add
#include <linux/irqflags.h> to avoid build breakage on some files.
on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
on_each_cpu(), but they are not causing !SMP bugs for me, so I will
defer changing them to a less urgent patch. ]
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RCU fixes from Paul McKenney:
"I must confess that this past merge window was not RCU's best showing.
This series contains three more fixes for RCU regressions:
1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an
interrupt from idle rather than as a task switch from idle.
This change is needed due to the recent use of _rcuidle()
tracepoints that can be invoked from interrupt handlers as well
as from idle. Without this fix, invoking _rcuidle() tracepoints
from interrupt handlers results in splats and (more seriously)
confusion on RCU's part as to whether a given CPU is idle or not.
This confusion can in turn result in too-short grace periods and
therefore random memory corruption.
2. A fix to a subtle deadlock that could result due to RCU doing
a wakeup while holding one of its rcu_node structure's locks.
Although the probability of occurrence is low, it really
does happen. The fix, courtesy of Steven Rostedt, uses
irq_work_queue() to avoid the deadlock.
3. A fix to a silent deadlock (invisible to lockdep) due to the
interaction of timeouts posted by RCU debug code enabled by
CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
hotplug operations. This will not occur in production kernels,
but really does occur in randconfig testing. Diagnosis courtesy
of Steven Rostedt"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
rcu: Don't call wakeup() with rcu_node structure ->lock held
trace: Allow idle-safe tracepoints to be called from irq
Pull ASoC sound updates from Mark Brown:
"Takashi is travelling at the minute and it'd be good to get the
MAINTAINERS update in here merged so sending directly.
As well as the usual driver specifics we've got a couple of core fixes
here, one fixing capabilities for unidirectional streams and the other
fixing suspend while audio streams are active.
The suspend fix is a little involved but mostly as a result of
removing some special casing that was doing the wrong thing."
* tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound:
ASoC: tlv320aic3x: Remove deadlock from snd_soc_dapm_put_volsw_aic3x()
ASoC: dapm: Treat DAI widgets like AIF widgets for power
ASoC: arizona: Correct AEC loopback enable
ASoC: pcm: Require both CODEC and CPU support when declaring stream caps
MAINTAINERS: Remove myself from Wolfson maintainers
ASoC: wm8994: Ensure microphone detection state is reset on removal
ASoC: wm8994: Avoid leaking pm_runtime reference on removed jack race
ASoC: cs42l52: fix hp_gain_enum shift value.
ASoC: cs42l52: use correct PCM mixer TLV dB scale to match datasheet.