Commit Graph

588765 Commits

Author SHA1 Message Date
Wei Yang
7b77061f8d PCI: Add pcibios_bus_add_device() weak function
This adds weak function pcibios_bus_add_device() for arch dependent
code could do proper setup. For example, powerpc could setup EEH
related resources for SRIOV VFs.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09 09:58:14 +11:00
Wei Yang
c194f7ea7f PCI/IOV: Rename and export virtfn_{add, remove}
During EEH recovery, hotplug is applied to the devices which don't
have drivers or their drivers don't support EEH. However, the hotplug,
which was implemented based on PCI bus, can't be applied to VF directly.
Instead, we unplug and plug individual PCI devices (VFs).

This renames virtn_{add,remove}() and exports them so they can be used
in PCI hotplug during EEH recovery.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09 09:58:13 +11:00
Gavin Shan
4eb0799ff9 powerpc/eeh: Reworked eeh_pe_bus_get()
The original implementation is ugly: unnecessary if statements and
"out" tag. This reworks the function to avoid above weaknesses. No
functional changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09 09:57:46 +11:00
Javier Martinez Canillas
ccb361d2fd thermal: exynos: Defer probe if vtmu is present but not registered
The driver doesn't check if the regulator_get_optional return value is
-EPROBE_DEFER so it will wrongly assume that the regulator couldn't be
found just because the regulator driver wasn't registered yet, i.e:

exynos-tmu 10060000.tmu: Regulator node (vtmu) not found

In this case the return value should be propagated to allow the driver
probe function to be deferred until the regulator driver is registered.

Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:22:25 -08:00
Javier Martinez Canillas
4d3583cd1c thermal: exynos: Use devm_regulator_get_optional() for vtmu
The Exynos TMU DT binding says that the vtmu-supply is optional but the
driver uses devm_regulator_get() that creates a dummy regulator if it's
not defined in the DT. For example the following message is in the log:

10060000.tmu supply vtmu not found, using dummy regulator

Use the optional version of regulator_get() that doesn't create a dummy
regulator and instead returns a -ENODEV errno code. Since it's expected
that a regulator may not be defined and the driver will inform about it:

exynos-tmu 10060000.tmu: Regulator node (vtmu) not found

Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:21:41 -08:00
Javier Martinez Canillas
7bc40ddfe8 thermal: exynos: List vtmu-supply as optional property in DT binding
The Exynos Thermal Management Unit binding says that the vtmu-supply
is optional but is listed in the required properties section. Add an
optional properties section and move the regulator property there.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:20:21 -08:00
Krzysztof Kozlowski
3a3a5f1586 thermal: exynos: Print a message about exceeded number of supported trip-points
When DeviveTree contains more trip-points than SoC can configure
(usually more than four) and polling mode is not enabled, then the
remaining trip-points will be silently ignored. No interrupts will be
generated for them.

This might be quite dangerous when one provides DTB with a
non-configurable critical trip-point, like (assuming four supported
thresholds in TMU):
 - alert @50 C (type: active),
 - alert @60 C (type: active),
 - alert @70 C (type: active),
 - alert @80 C (type: active),
 - critical @120 C (type: critical) <- no interrupts generated.

This is a mistake in DTB so print a message in such case.

Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:07:32 -08:00
Don Brace
516fdcea0d cciss: update MAINTAINERS
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08 15:06:37 -07:00
Krzysztof Kozlowski
fa7b29e8bf thermal: exynos: Document number of supported trip-points
Document the number of configurable temperature thresholds (for
trip-points in interrupt-driven mode).

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:06:33 -08:00
Krzysztof Kozlowski
a41e939b27 thermal: exynos: Document compatible for Exynos5433 TMU
Commit 488c7455d7 ("thermal: exynos: Add the support for Exynos5433
TMU") added new compatible but forgot to update documentation.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 14:04:21 -08:00
Jon Derrick
48c7823f42 NVMe: Remove unused sq_head read in completion path
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08 15:01:06 -07:00
Bob Moore
0dda8851a8 ACPICA: Revert "Parser: Fix for SuperName method invocation"
ACPICA commit eade8f78f2aa21e8eabc3380a5728db47273bcf1

Revert commit ae90fbf562 (ACPICA: Parser: Fix for SuperName method
invocation).

Support for method invocations as part of super_name will be
removed from the ACPI specification, since no AML interpreter
supports it.

Fixes: ae90fbf562 (ACPICA: Parser: Fix for SuperName method invocation)
Link: https://github.com/acpica/acpica/commit/eade8f78
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-08 22:58:38 +01:00
Bob Moore
1e059e20ac ACPICA: Utilities: Update trace mechinism for acquire_object
ACPICA commit 0824ab90e03c2e4239e890615f447e7962b1daa2

Was not using the correct macro. Updated a comment in
acoutput.h

Link: https://github.com/acpica/acpica/commit/0824ab90
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-08 22:56:21 +01:00
Jean Delvare
8e47e15e91 PCI/AER: Log aer_inject error injections
Log successful error injections so that injected errors can be
differentiated from real errors.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Borislav Petkov <bp@suse.de>
2016-03-08 15:52:13 -06:00
Jean Delvare
96b45ea5dc PCI/AER: Log actual error causes in aer_inject
The aer_inject driver is very quiet.  In most cases, it merely returns an
error code to user-space, leaving the user with little clue about the
actual reason for the failure.

So, log error messages for 4 of the most frequent causes of failure:
* Can't find the root port of the specified device.
* Device doesn't support AER.
* Root port doesn't support AER.
* AER device not found.

This gives the user a chance to understand why aer-inject failed.

Based on a preliminary patch by Thomas Renninger.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Borislav Petkov <bp@suse.de>
CC: Thomas Renninger <trenn@suse.de>
2016-03-08 15:51:25 -06:00
Jean Delvare
3bc1185141 PCI/AER: Use dev_warn() in aer_inject
dev_warn() is better than printk(LOG_WARNING...) as it records which device
the message relates to.  Also add a prefix "aer_inject:" to help
differentiate real errors from injected errors.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Borislav Petkov <bp@suse.de>
2016-03-08 15:50:42 -06:00
Jean Delvare
20ac75e563 PCI/AER: Fix aer_inject error codes
EPERM means "Operation not permitted", which doesn't reflect the lack of
support for AER.  EPROTONOSUPPORT (Protocol not supported) is a better
choice of error code if the device or its root port lack support for AER.

Likewise, EINVAL means "Invalid argument", which is not suitable for cases
where the AER error device is missing or unusable.  ENODEV and
EPROTONOSUPPORT, respectively, fit better.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Borislav Petkov <bp@suse.de>
CC: Prarit Bhargava <prarit@redhat.com>
2016-03-08 15:48:56 -06:00
Thierry Reding
e32faa303f PCI: tegra: Remove misleading PHYS_OFFSET
BARs are disabled when the size register is 0, so it's misleading to write
a base address into the start register.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:42:56 -06:00
Thierry Reding
56e75e2a15 PCI: tegra: Track bus -> CPU mapping
Track the offsets of the bus -> CPU mapping for I/O and memory.  This is
cosmetic for current Tegra chips because the offset is always 0.  But to
properly support legacy use-cases, like VGA, this would be needed so that
PCI bus addresses can be relocated.

While at it, also request the I/O resource both in physical memory and I/O
space to make /proc/iomem consistent, as well as add the I/O region to the
list of host bridge resources.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:42:07 -06:00
Thierry Reding
8493a95243 PCI: tegra: Remove unused struct tegra_pcie.num_ports field
The num_ports field of the tegra_pcie structure is never used so remove it.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:41:38 -06:00
Thierry Reding
b4d18d71ec PCI: tegra: Implement ->{add,remove}_bus() callbacks
The configuration space mapping on Tegra is somewhat special, and in order
to avoid wasting virtual address space the configuration space for each bus
needs to be stitched together from several blocks which form a single
continuous virtual address range for accessors.

Currently the configuration space is mapped upon the first access to one of
its registers.  However, the mapping operation may sleep under certain
circumstances, so doing it from the configuration space accessors (they are
protected by a spin lock) will trigger a warning.

To avoid the warning, use the ->add_bus() callback to perform the mapping
at enumeration time when the operation is allowed to sleep.  Also add an
implementation of ->remove_bus() that undoes the mapping established by the
->add_bus() callback.  While it isn't currently possible to unload the
module, there is work underway to remedy this, and this code will come in
handy when that happens.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:40:37 -06:00
Thierry Reding
057bd2e052 PCI: Add pci_ops.{add,remove}_bus() callbacks
Add pci_ops.{add,remove}_bus() callbacks, which will be called on every
newly created bus and when a bus is being removed, respectively.  This can
be used by drivers to implement driver-specific initialization and teardown
of the bus, in addition to the architecture-specifics implemented by the
pcibios_add_bus() and the pcibios_remove_bus() functions.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:40:37 -06:00
Eduardo Valentin
74e5053cd7 thermal: mtk: allow compile testing on UM
Following the fix on thermal Kconfig, this
patch adds dependency on HAS_IOMEM so driver
properly compile test on UM arch.

Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mediatek@lists.infradead.org
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 13:34:25 -08:00
Dave Chinner
ab9d1e4f7b Merge branch 'xfs-misc-fixes-4.6-3' into for-next 2016-03-09 08:18:30 +11:00
Luis de Bethencourt
a5fd276bdc xfs: remove impossible condition
bp_release is set to 0 just before the breakpoint of the for loop before
the conditional check (in line 458). The other breakpoint is a goto that
skips the dead code.

Addresses-Coverity-Id: 102338

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-09 08:17:56 +11:00
Darrick J. Wong
30cbc591c3 xfs: check sizes of XFS on-disk structures at compile time
Check the sizes of XFS on-disk structures when compiling the kernel.
Use this to catch inadvertent changes in structure size due to padding
and alignment issues, etc.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-09 08:15:14 +11:00
Bjorn Helgaas
e7e127e3c7 PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/hotplug/Kconfig.

Note that this effectively adds pci/hotplug/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/hotplug/Kconfig:

  alpha
  arm
  avr32
  frv
  m68k
  microblaze
  mn10300
  sparc
  unicore32

Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 15:10:48 -06:00
Wei Ni
5e25166f32 thermal: tegra_soctherm: fix sign bit of temperature
The sign bit of temperature readback is bit 0, not bit 1.
Change to BIT(0) to fix it.

Signed-off-by: Wei Ni <wni@nvidia.com>
Reviewed-by: Matt Longnecker <mlongnecker@nvidia.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 12:41:40 -08:00
Bogicevic Sasa
5f8fc43217 PCI: Include pci/pcie/Kconfig directly from pci/Kconfig
Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/pcie/Kconfig.

Note that this effectively adds pci/pcie/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/pcie/Kconfig:

  alpha
  avr32
  blackfin
  frv
  m32r
  m68k
  microblaze
  mn10300
  parisc
  sparc
  unicore32
  xtensa

[bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace]
Signed-off-by: Sasa Bogicevic <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08 14:36:48 -06:00
David S. Miller
f14b488d50 Merge branch 'bpf-map-prealloc'
Alexei Starovoitov says:

====================
bpf: map pre-alloc

v1->v2:
. fix few issues spotted by Daniel
. converted stackmap into pre-allocation as well
. added a workaround for lockdep false positive
. added pcpu_freelist_populate to be used by hashmap and stackmap

this path set switches bpf hash map to use pre-allocation by default
and introduces BPF_F_NO_PREALLOC flag to keep old behavior for cases
where full map pre-allocation is too memory expensive.

Some time back Daniel Wagner reported crashes when bpf hash map is
used to compute time intervals between preempt_disable->preempt_enable
and recently Tom Zanussi reported a dead lock in iovisor/bcc/funccount
tool if it's used to count the number of invocations of kernel
'*spin*' functions. Both problems are due to the recursive use of
slub and can only be solved by pre-allocating all map elements.

A lot of different solutions were considered. Many implemented,
but at the end pre-allocation seems to be the only feasible answer.
As far as pre-allocation goes it also was implemented 4 different ways:
- simple free-list with single lock
- percpu_ida with optimizations
- blk-mq-tag variant customized for bpf use case
- percpu_freelist
For bpf style of alloc/free patterns percpu_freelist is the best
and implemented in this patch set.
Detailed performance numbers in patch 3.
Patch 2 introduces percpu_freelist
Patch 1 fixes simple deadlocks due to missing recursion checks
Patch 5: converts stackmap to pre-allocation
Patches 6-9: prepare test infra
Patch 10: stress test for hash map infra. It attaches to spin_lock
functions and bpf_map_update/delete are called from different contexts
Patch 11: stress for bpf_get_stackid
Patch 12: map performance test

Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Tom Zanussi <tom.zanussi@linux.intel.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:33 -05:00
Alexei Starovoitov
c3f85cffc5 samples/bpf: test both pre-alloc and normal maps
extend test coveraged to include pre-allocated and run-time alloc maps

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:32 -05:00
Alexei Starovoitov
89b9760701 samples/bpf: add map_flags to bpf loader
note old loader is compatible with new kernel.
map_flags are optional

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:32 -05:00
Alexei Starovoitov
3622e7e493 samples/bpf: move ksym_search() into library
move ksym search from offwaketime into library to be reused
in other tests

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:32 -05:00
Alexei Starovoitov
618ec9a7b1 samples/bpf: make map creation more verbose
map creation is typically the first one to fail when rlimits are
too low, not enough memory, etc
Make this failure scenario more verbose

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:32 -05:00
Alexei Starovoitov
557c0c6e7d bpf: convert stackmap to pre-allocation
It was observed that calling bpf_get_stackid() from a kprobe inside
slub or from spin_unlock causes similar deadlock as with hashmap,
therefore convert stackmap to use pre-allocated memory.

The call_rcu is no longer feasible mechanism, since delayed freeing
causes bpf_get_stackid() to fail unpredictably when number of actual
stacks is significantly less than user requested max_entries.
Since elements are no longer freed into slub, we can push elements into
freelist immediately and let them be recycled.
However the very unlikley race between user space map_lookup() and
program-side recycling is possible:
     cpu0                          cpu1
     ----                          ----
user does lookup(stackidX)
starts copying ips into buffer
                                   delete(stackidX)
                                   calls bpf_get_stackid()
				   which recyles the element and
                                   overwrites with new stack trace

To avoid user space seeing a partial stack trace consisting of two
merged stack traces, do bucket = xchg(, NULL); copy; xchg(,bucket);
to preserve consistent stack trace delivery to user space.
Now we can move memset(,0) of left-over element value from critical
path of bpf_get_stackid() into slow-path of user space lookup.
Also disallow lookup() from bpf program, since it's useless and
program shouldn't be messing with collected stack trace.

Note that similar race between user space lookup and kernel side updates
is also present in hashmap, but it's not a new race. bpf programs were
always allowed to modify hash and array map elements while user space
is copying them.

Fixes: d5a3b1f691 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:31 -05:00
Alexei Starovoitov
823707b68d bpf: check for reserved flag bits in array and stack maps
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:31 -05:00
Alexei Starovoitov
6c90598174 bpf: pre-allocate hash map elements
If kprobe is placed on spin_unlock then calling kmalloc/kfree from
bpf programs is not safe, since the following dead lock is possible:
kfree->spin_lock(kmem_cache_node->lock)...spin_unlock->kprobe->
bpf_prog->map_update->kmalloc->spin_lock(of the same kmem_cache_node->lock)
and deadlocks.

The following solutions were considered and some implemented, but
eventually discarded
- kmem_cache_create for every map
- add recursion check to slow-path of slub
- use reserved memory in bpf_map_update for in_irq or in preempt_disabled
- kmalloc via irq_work

At the end pre-allocation of all map elements turned out to be the simplest
solution and since the user is charged upfront for all the memory, such
pre-allocation doesn't affect the user space visible behavior.

Since it's impossible to tell whether kprobe is triggered in a safe
location from kmalloc point of view, use pre-allocation by default
and introduce new BPF_F_NO_PREALLOC flag.

While testing of per-cpu hash maps it was discovered
that alloc_percpu(GFP_ATOMIC) has odd corner cases and often
fails to allocate memory even when 90% of it is free.
The pre-allocation of per-cpu hash elements solves this problem as well.

Turned out that bpf_map_update() quickly followed by
bpf_map_lookup()+bpf_map_delete() is very common pattern used
in many of iovisor/bcc/tools, so there is additional benefit of
pre-allocation, since such use cases are must faster.

Since all hash map elements are now pre-allocated we can remove
atomic increment of htab->count and save few more cycles.

Also add bpf_map_precharge_memlock() to check rlimit_memlock early to avoid
large malloc/free done by users who don't have sufficient limits.

Pre-allocation is done with vmalloc and alloc/free is done
via percpu_freelist. Here are performance numbers for different
pre-allocation algorithms that were implemented, but discarded
in favor of percpu_freelist:

1 cpu:
pcpu_ida	2.1M
pcpu_ida nolock	2.3M
bt		2.4M
kmalloc		1.8M
hlist+spinlock	2.3M
pcpu_freelist	2.6M

4 cpu:
pcpu_ida	1.5M
pcpu_ida nolock	1.8M
bt w/smp_align	1.7M
bt no/smp_align	1.1M
kmalloc		0.7M
hlist+spinlock	0.2M
pcpu_freelist	2.0M

8 cpu:
pcpu_ida	0.7M
bt w/smp_align	0.8M
kmalloc		0.4M
pcpu_freelist	1.5M

32 cpu:
kmalloc		0.13M
pcpu_freelist	0.49M

pcpu_ida nolock is a modified percpu_ida algorithm without
percpu_ida_cpu locks and without cross-cpu tag stealing.
It's faster than existing percpu_ida, but not as fast as pcpu_freelist.

bt is a variant of block/blk-mq-tag.c simlified and customized
for bpf use case. bt w/smp_align is using cache line for every 'long'
(similar to blk-mq-tag). bt no/smp_align allocates 'long'
bitmasks continuously to save memory. It's comparable to percpu_ida
and in some cases faster, but slower than percpu_freelist

hlist+spinlock is the simplest free list with single spinlock.
As expeceted it has very bad scaling in SMP.

kmalloc is existing implementation which is still available via
BPF_F_NO_PREALLOC flag. It's significantly slower in single cpu and
in 8 cpu setup it's 3 times slower than pre-allocation with pcpu_freelist,
but saves memory, so in cases where map->max_entries can be large
and number of map update/delete per second is low, it may make
sense to use it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:31 -05:00
Alexei Starovoitov
e19494edab bpf: introduce percpu_freelist
Introduce simple percpu_freelist to keep single list of elements
spread across per-cpu singly linked lists.

/* push element into the list */
void pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);

/* pop element from the list */
struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *);

The object is pushed to the current cpu list.
Pop first trying to get the object from the current cpu list,
if it's empty goes to the neigbour cpu list.

For bpf program usage pattern the collision rate is very low,
since programs push and pop the objects typically on the same cpu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:31 -05:00
Alexei Starovoitov
b121d1e74d bpf: prevent kprobe+bpf deadlocks
if kprobe is placed within update or delete hash map helpers
that hold bucket spin lock and triggered bpf program is trying to
grab the spinlock for the same bucket on the same cpu, it will
deadlock.
Fix it by extending existing recursion prevention mechanism.

Note, map_lookup and other tracing helpers don't have this problem,
since they don't hold any locks and don't modify global data.
bpf_trace_printk has its own recursive check and ok as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:28:30 -05:00
Bharat Kumar Gogada
01cf9d524f microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driver
Modify the Microblaze PCI subsystem to work with the generic
drivers/pci/host/pcie-xilinx.c driver on Microblaze and Zynq.

[bhelgaas: changelog]
Signed-off-by: Bharat Kumar Gogada <bharatku@xilinx.com>
Signed-off-by: Ravi Kiran Gummaluri <rgummal@xilinx.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
2016-03-08 14:25:57 -06:00
Bharat Kumar Gogada
e5d4b2006c PCI: xilinx: Update Zynq binding with Microblaze node
Update Zynq PCI binding documentation with Microblaze node.

[bhelgaas: fix "microbalze_0_intc" typo]
Signed-off-by: Bharat Kumar Gogada <bharatku@xilinx.com>
Signed-off-by: Ravi Kiran Gummaluri <rgummal@xilinx.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Michal Simek <michal.simek@xilinx.com>
2016-03-08 14:25:49 -06:00
Krzysztof Kozlowski
bf82c350e9 thermal: Fix build error of missing devm_ioremap_resource on UM
The devres.o gets linked if HAS_IOMEM is present so on ARCH=um
allyesconfig (COMPILE_TEST) failed on many files with:

drivers/built-in.o: In function `kirkwood_thermal_probe':
kirkwood_thermal.c:(.text+0x390a25): undefined reference to `devm_ioremap_resource'
drivers/built-in.o: In function `exynos_tmu_probe':
exynos_tmu.c:(.text+0x39246b): undefined reference to `devm_ioremap'

The users of devm_ioremap_resource() which are compile-testable should
depend on HAS_IOMEM.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 12:22:35 -08:00
David S. Miller
8aba8b8312 Merge branch 'ipv6-per-netns-gc'
Michal Kubecek says:

====================
ipv6: per netns FIB6 walkers and garbage collector

Commit 2ac3ac8f86 ("ipv6: prevent fib6_run_gc() contention") reduced
the risk of contention on FIB6 garbage collector lock on systems with
many CPUs. However, one of our customers can still observe heavy
contention on fib6_gc_lock which can even trigger the soft lockup
detector.

This is caused by garbage collector running in forced mode from a timer.
While there is one timer per network namespace, the instances of
fib6_run_gc() running from them are protected by one global spinlock so
that only one garbage collector can run at any moment and other
namespaces have to wait. As most relevant data structures are separated
per netns, there is little reason for garbage collectors blocking each
other.

Similar problem exists for walkers: changes in one tree do not need to
adjust (and block) walkers traversing FIB trees in other namespaces.

This series separates both the walkers infrastructure and garbage
collector so that they work independently in network namespaces.

v2: get rid of ifdef in ipv6_route_seq_setup_walk(), pass net from
callers instead
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:16:51 -05:00
Michal Kubeček
3dc94f93be ipv6: per netns FIB garbage collection
One of our customers observed issues with FIB6 garbage collectors
running in different network namespaces blocking each other, resulting
in soft lockups (fib6_run_gc() initiated from timer runs always in
forced mode).

Now that FIB6 walkers are separated per namespace, there is no more need
for instances of fib6_run_gc() in different namespaces blocking each
other. There is still a call to icmp6_dst_gc() which operates on shared
data but this function is protected by its own shared lock.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:16:51 -05:00
Michal Kubeček
9a03cd8f38 ipv6: per netns fib6 walkers
The IPv6 FIB data structures are separated per network namespace but
there is still only one global walkers list and one global walker list
lock. This means changes in one namespace unnecessarily interfere with
walkers in other namespaces.

Replace the global list with per-netns lists (and give each its own
lock).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:16:51 -05:00
Michal Kubeček
3570df914f ipv6: replace global gc_args with local variable
Global variable gc_args is only used in fib6_run_gc() and functions
called from it. As fib6_run_gc() makes sure there is at most one
instance of fib6_clean_all() running at any moment, we can replace
gc_args with a local variable which will be needed once multiple
instances (per netns) of garbage collector are allowed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:16:50 -05:00
Bharat Kumar Gogada
2c51391d25 PCI: xilinx: Don't call pci_fixup_irqs() on Microblaze
The Xilinx AXI PCIe Host Bridge Soft IP driver was previously only
supported on ARM (in particular, on ARCH_ZYNC), and pci_fixup_irqs() is
available there.  But Microblaze will do IRQ fixup in pcibios_add_device(),
so pci_fixup_irqs() is not available on Microblaze.

Don't call pci_fixup_irqs() on Microblaze, so the driver can work on both
Zynq and Microblaze Architectures.

[bhelgaas: revise changelog to show similarity to bdb8a1844f ("PCI: iproc: Call pci_fixup_irqs() for ARM64 as well as ARM")]
Signed-off-by: Bharat Kumar Gogada <bharatku@xilinx.com>
Signed-off-by: Ravi Kiran Gummaluri <rgummal@xilinx.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Simek <michal.simek@xilinx.com>
2016-03-08 14:06:17 -06:00
Bharat Kumar Gogada
4c01f3b089 PCI: xilinx: Remove dependency on ARM-specific struct hw_pci
The Xilinx PCIe host controller driver uses pci_common_init_dev(), which
is ARM-specific and requires the ARM struct hw_pci.  The part of
pci_common_init_dev() that is needed is limited and can be done here
without using hw_pci.

Create and scan the root bus directly without using the ARM
pci_common_init_dev() interface.

[bhelgaas: revise changelog to show similarity to 79953dd22c ("PCI: rcar: Remove dependency on ARM-specific struct hw_pci")]
Signed-off-by: Bharat Kumar Gogada <bharatku@xilinx.com>
Signed-off-by: Ravi Kiran Gummaluri <rgummal@xilinx.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Simek <michal.simek@xilinx.com>
2016-03-08 14:06:03 -06:00
Marcelo Ricardo Leitner
133800d1f0 sctp: fix copying more bytes than expected in sctp_add_bind_addr
Dmitry reported that sctp_add_bind_addr may read more bytes than
expected in case the parameter is a IPv4 addr supplied by the user
through calls such as sctp_bindx_add(), because it always copies
sizeof(union sctp_addr) while the buffer may be just a struct
sockaddr_in, which is smaller.

This patch then fixes it by limiting the memcpy to the min between the
union size and a (new parameter) provided addr size. Where possible this
parameter still is the size of that union, except for reading from
user-provided buffers, which then it accounts for protocol type.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 15:04:08 -05:00
Dan Carpenter
1336919425 thermal: ti-soc-thermal: clean up the error handling a bit
We don't need to initialize "ret".  We can move the IS_ERR() checks into
the if condition instead of doing an assignment first.  Also there is a
check for "ret" when we know it is zero so we can remove that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-03-08 11:57:37 -08:00