Commit Graph

388088 Commits

Author SHA1 Message Date
Aruna Balakrishnaiah
a5e4797b0f powerpc/pseries: Read common partition via pstore
This patch exploits pstore subsystem to read details of common partition
in NVRAM to a separate file in /dev/pstore. For instance, common partition
details will be stored in a file named [common-nvram-6].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:02 +10:00
Aruna Balakrishnaiah
f33f748c96 powerpc/pseries: Read of-config partition via pstore
This patch set exploits the pstore subsystem to read details of
of-config partition in NVRAM to a separate file in /dev/pstore.
For instance, of-config partition details will be stored in a
file named [of-nvram-5].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:59 +10:00
Aruna Balakrishnaiah
edf38465a3 powerpc/pseries: Distinguish between a os-partition and non-os partition
Introduce os_partition member in nvram_os_partition structure to identify
if the partition is an os partition or not. This will be useful to handle
non-os partitions of-config and common.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:56 +10:00
Aruna Balakrishnaiah
69020eea97 powerpc/pseries: Read rtas partition via pstore
This patch set exploits the pstore subsystem to read details of rtas partition
in NVRAM to a separate file in /dev/pstore. For instance, rtas details will be
stored in a file named [rtas-nvram-4].

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:52 +10:00
Aruna Balakrishnaiah
d7563c94f7 powerpc/pseries: Read/Write oops nvram partition via pstore
IBM's p series machines provide persistent storage for LPARs through NVRAM.
NVRAM's lnx,oops-log partition is used to log oops messages.
Currently the kernel provides the contents of p-series NVRAM only as a
simple stream of bytes via /dev/nvram, which must be interpreted in user
space by the nvram command in the powerpc-utils package.

This patch set exploits the pstore subsystem to expose oops partition in
NVRAM as a separate file in /dev/pstore. For instance, Oops messages will be
stored in a file named [dmesg-nvram-2]. In case pstore registration fails it
will fall back to kmsg_dump mechanism.

This patch will read/write the oops messages from/to this partition via pstore.

Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:49 +10:00
Aruna Balakrishnaiah
126746101e powerpc/pseries: Introduce generic read function to read nvram-partitions
Introduce generic read function to read nvram partitions other than rtas.
nvram_read_error_log will be retained which is used to read rtas partition
from rtasd. nvram_read_partition is the generic read function to read from
any nvram partition.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:46 +10:00
Aruna Balakrishnaiah
b1f70e1f72 powerpc/pseries: Add version and timestamp to oops header
Introduce version and timestamp information in the oops header.
oops_log_info (oops header) holds version (to distinguish between old
and new format oops header), length of the oops text
(compressed or uncompressed) and timestamp.

The version field will sit in the same place as the length in old
headers. version is assigned 5000 (greater than oops partition size)
so that existing tools will refuse to dump new style partitions as
the length is too large. The updated tools will work with both
old and new format headers.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:43 +10:00
Aruna Balakrishnaiah
1bf247f8df powerpc/pseries: Remove syslog prefix in uncompressed oops text
Removal of syslog prefix in the uncompressed oops text will
help in capturing more oops data.

Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:39 +10:00
Gavin Shan
2d5c121678 powerpc/eeh: Enhance converting EEH dev
Under some special circumstances, the EEH device doesn't have the
associated device tree node or PCI device. The patch enhances those
functions converting EEH device to device tree node or PCI device
accordingly to avoid unnecessary system crash.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:36 +10:00
Gavin Shan
5fb621698e powerpc/eeh: Fix fetching bus for single-dev-PE
While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.

Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:33 +10:00
Anton Blanchard
475e68cfdd powerpc: Align thread->fpr to 16 bytes
On newer CPUs we use VSX loads and stores to the thread->fpr array.
For best performance we need to ensure 16 byte alignment.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:30 +10:00
liguang
1b7e0cbe49 powerpc/pseries: Use 'true' instead of '1' for orderly_poweroff
orderly_poweroff is expecting a bool parameter, so
use 'true' instead '1'

Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:26 +10:00
liguang
a5b45ded09 powerpc/smp: Use '==' instead of '<' for system_state
'system_state < SYSTEM_RUNNING' will have same effect
with 'system_state == SYSTEM_BOOTING', but the later
one is more clearer.

Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:23 +10:00
Bharat Bhushan
13d543cd79 powerpc: Restore dbcr0 on user space exit
On BookE (Branch taken + Single Step) is as same as Branch Taken
on BookS and in Linux we simulate BookS behavior for BookE as well.
When doing so, in Branch taken handling we want to set DBCR0_IC but
we update the current->thread->dbcr0 and not DBCR0.

Now on 64bit the current->thread.dbcr0 (and other debug registers)
is synchronized ONLY on context switch flow. But after handling
Branch taken in debug exception if we return back to user space
without context switch then single stepping change (DBCR0_ICMP)
does not get written in h/w DBCR0 and Instruction Complete exception
does not happen.

This fixes using ptrace reliably on BookE-PowerPC

lmbench latency test (lat_syscall) Results are (they varies a little
on each run)

1) ./lat_syscall <action> /dev/shm/uImage

action:	Open	read	write	stat	fstat	null
Before:	3.8618	0.2017	0.2851	1.6789	0.2256	0.0856
After:	3.8580	0.2017	0.2851	1.6955	0.2255	0.0856

1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage
action:	Open	read	write	stat	fstat	null
Before:	4.1388	0.2238	0.3066	1.7106	0.2256	0.0856
After:	4.1413	0.2236	0.3062	1.7107	0.2256	0.0856

[ Slightly modified to avoid extra branch in the fast path
  on Book3S and fix build on all non-BookE 64-bit -- BenH
]

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:19 +10:00
Bharat Bhushan
d8899bb2be powerpc: Debug control and status registers are 32bit
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:16 +10:00
Pawel Moll
c7f6e2d8ff clk: vexpress: Make the clock drivers directly available for arm64
The new arm64 architecture has no idea of platform or machine, so
it doesn't have to define ARCH_VEXPRESS configuration option at
all. To allow user to select the drivers at all, make it depend
on ARM64 as well.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-20 00:02:25 -07:00
Pawel Moll
e95a49b429 clk: vexpress: Use full node name to identify individual clocks
Previously all the clocks were reported as "osc". Now it will be
something like "/dcc/osc@0".

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-20 00:02:18 -07:00
Alexey Kardashevskiy
5b25199eff powerpc/vfio: Enable on pSeries platform
The enables VFIO on the pSeries platform, enabling user space
programs to access PCI devices directly.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:15 +10:00
Alexey Kardashevskiy
5ffd229c02 powerpc/vfio: Implement IOMMU driver for VFIO
VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.

The platform dependent part includes IOMMU initialization
and handling.  This implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWER
guest).

Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:14 +10:00
Alexey Kardashevskiy
4e13c1ac6b powerpc/vfio: Enable on PowerNV platform
This initializes IOMMU groups based on the IOMMU configuration
discovered during the PCI scan on POWERNV (POWER non virtualized)
platform.  The IOMMU groups are to be used later by the VFIO driver,
which is used for PCI pass through.

It also implements an API for mapping/unmapping pages for
guest PCI drivers and providing DMA window properties.
This API is going to be used later by QEMU-VFIO to handle
h_put_tce hypercalls from the KVM guest.

The iommu_put_tce_user_mode() does only a single page mapping
as an API for adding many mappings at once is going to be
added later.

Although this driver has been tested only on the POWERNV
platform, it should work on any platform which supports
TCE tables.  As h_put_tce hypercall is received by the host
kernel and processed by the QEMU (what involves calling
the host kernel again), performance is not the best -
circa 220MB/s on 10Gb ethernet network.

To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
option and configure VFIO as required.

Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:14 +10:00
Alistair Popple
ab9a4183fd powerpc: Update currituck pci/usb fixup for new board revision
The currituck board uses a different IRQ for the pci usb host
controller depending on the board revision. This patch adds support
for newer board revisions by retrieving the board revision from the
FPGA and mapping the appropriate IRQ.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:13 +10:00
Michael Neuling
70a54a4fae powerpc: Fix single step emulation of 32bit overflowed branches
Check truncate_if_32bit() on final write to nip.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:13 +10:00
Alistair Popple
b9ef7d6b11 powerpc: Update default configurations
Update default configurations for systems with CONFIG_BOOTX_TEXT
selected so that they continue to print early debug messages as is
currently the case.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:12 +10:00
Alistair Popple
071df9422a powerpc: Add a configuration option for early BootX/OpenFirmware debug
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:12 +10:00
Jeremy Kerr
0962e8004e powerpc/prom: Scan reserved-ranges node for memory reservations
Based on benh's proposal at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-September/101237.html,
this change provides support for reserving memory from the
reserved-ranges node at the root of the device tree.

We just call memblock_reserve on these ranges for now.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:11 +10:00
Daniel Walker
d5d8ec895c powerpc/mm: Make mmap_64.c compile on 32bit powerpc
There appears to be no good reason to keep this as 64bit only. It works
on 32bit also, and has checks so that it can work correctly with 32bit
binaries on 64bit hardware which is why I think this works.

I tested this on qemu using the virtex-ml507 machine type.

Before,

/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfd03000-bfd24000 rw-p 00000000 00:00 0          [stack]
/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
0fe6e000-0ffd8000 r-xp 00000000 00:01 214        /lib/libc-2.11.3.so
0ffd8000-0ffe8000 ---p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffe8000-0ffed000 rw-p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffed000-0fff0000 rw-p 00000000 00:00 0
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48020000-48021000 rw-p 00000000 00:00 0
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bf98a000-bf9ab000 rw-p 00000000 00:00 0          [stack]
/bin2 # ./test & cat /proc/${!}/maps
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
0fe6e000-0ffd8000 r-xp 00000000 00:01 214        /lib/libc-2.11.3.so
0ffd8000-0ffe8000 ---p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffe8000-0ffed000 rw-p 0016a000 00:01 214        /lib/libc-2.11.3.so
0ffed000-0fff0000 rw-p 00000000 00:00 0
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
48000000-48020000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
48020000-48021000 rw-p 00000000 00:00 0
48021000-48023000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfa54000-bfa75000 rw-p 00000000 00:00 0          [stack]

After,

bash-4.1# ./test & cat /proc/${!}/maps
[7] 803
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7eb0000-b7ed0000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7ed1000-b7ed3000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfbc0000-bfbe1000 rw-p 00000000 00:00 0          [stack]
bash-4.1# ./test & cat /proc/${!}/maps
[8] 805
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7b03000-b7b23000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7b24000-b7b26000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bfc27000-bfc48000 rw-p 00000000 00:00 0          [stack]
bash-4.1# ./test & cat /proc/${!}/maps
[9] 807
00100000-00103000 r-xp 00000000 00:00 0          [vdso]
10000000-10007000 r-xp 00000000 00:01 454        /bin2/test
10017000-10018000 rw-p 00007000 00:01 454        /bin2/test
b7f37000-b7f57000 r-xp 00000000 00:01 224        /lib/ld-2.11.3.so
b7f58000-b7f5a000 rw-p 00021000 00:01 224        /lib/ld-2.11.3.so
bff96000-bffb7000 rw-p 00000000 00:00 0          [stack]

Signed-off-by: Daniel Walker <dwalker@fifo90.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:11 +10:00
Kevin Hao
3139b0a797 powerpc: Remove the unneeded trigger of decrementer interrupt in decrementer_check_overflow
Previously in order to handle the edge sensitive decrementers,
we choose to set the decrementer to 1 to trigger a decrementer
interrupt when re-enabling interrupts. But with the rework of the
lazy EE, we would replay the decrementer interrupt when re-enabling
interrupts if a decrementer interrupt occurs with irq soft-disabled.
So there is no need to trigger a decrementer interrupt in this case
any more.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:10 +10:00
Scott Wood
39a421ff0b powerpc/mm/nohash: Ignore NULL stale_map entries
This happens with threads that are offline due to CPU hotplug
(including threads that were never "plugged in" to begin with because
SMT is disabled).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:10 +10:00
Suzuki K. Poulose
35fd219a26 powerpc: Move the single step enable code to a generic path
This patch moves the single step enable code used by kprobe to a generic
routine header so that, it can be re-used by other code, in this case,
uprobes. No functional changes.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:	Ananth N Mavinakaynahalli <ananth@in.ibm.com>
Cc:	Kumar Gala <galak@kernel.crashing.org>
Cc:	linuxppc-dev@ozlabs.org
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:09 +10:00
Suzuki K. Poulose
85f395c5b0 powerpc/kprobes: Do not disable External interrupts during single step
External/Decrement exceptions have lower priority than the Debug Exception.
So, we don't have to disable the External interrupts before a single step.
However, on BookE, Critical Input Exception(CE) has higher priority than a
Debug Exception. Hence we mask them.

Signed-off-by: 	Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:		Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc:		Ananth N Mavinakaynahalli <ananth@in.ibm.com>
Cc:		Kumar Gala <galak@kernel.crashing.org>
Cc:		linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:09 +10:00
Thomas Gleixner
e80034047b powerpc: Mark low level irq handlers NO_THREAD
These low level handlers cannot be threaded. Mark them NO_THREAD

Reported-by: leroy christophe <christophe.leroy@c-s.fr>
Tested-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:08 +10:00
Aneesh Kumar K.V
fce144b477 mm/THP: deposit the transpare huge pgtable before set_pmd
Architectures like powerpc use the deposited pgtable to store hash index
values.  We need to make the deposted pgtable is visible to other cpus
before we are ready to take a hash fault.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:08 +10:00
Aneesh Kumar K.V
fde52796d4 mm/THP: don't use HPAGE_SHIFT in transparent hugepage code
For architectures like powerpc that support multiple explicit hugepage
sizes, HPAGE_SHIFT indicate the default explicit hugepage shift.  For THP
to work the hugepage size should be same as PMD_SIZE.  So use PMD_SHIFT
directly.  So move the define outside CONFIG_TRANSPARENT_HUGEPAGE #ifdef
because we want to use these defines in generic code with if
(pmd_trans_huge()) conditional.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:08 +10:00
Aneesh Kumar K.V
a6bf2bb03e mm/THP: withdraw the pgtable after pmdp related operations
For architectures like ppc64 we look at deposited pgtable when calling
pmdp_get_and_clear.  So do the pgtable_trans_huge_withdraw after finishing
pmdp related operations.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:07 +10:00
Aneesh Kumar K.V
6b0b50b061 mm/THP: add pmd args to pgtable deposit and withdraw APIs
This will be later used by powerpc THP support.  In powerpc we want to use
pgtable for storing the hash index values.  So instead of adding them to
mm_context list, we would like to store them in the second half of pmd

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:07 +10:00
Aneesh Kumar K.V
8663890a9e mm/thp: use the correct function when updating access flags
We should use pmdp_set_access_flags to update access flags.  Archs like
powerpc use extra checks(_PAGE_BUSY) when updating a hugepage PTE.  A
set_pmd_at doesn't do those checks.  We should use set_pmd_at only when
updating a none hugepage PTE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:06 +10:00
Joe Perches
fedaf4ffc2 ndisc: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:18:07 -07:00
Joe Perches
9e8cda3ba8 ipv6: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:18:07 -07:00
Olaf Hering
a7bf58040f net: vlan: fix comment for vlan_ethhdr->h_vlan_proto
After addition of 8021AD h_vlan_proto can be either ETH_P_8021Q or
ETH_P_8021AD.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:16:50 -07:00
Rami Rosen
af92e5425e inet: frag , remove an empty ifdef.
This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

commit b67bfe0d42
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
which is now empty.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:06:52 -07:00
Eric Dumazet
c9364636dc htb: refactor struct htb_sched fields for performance
htb_sched structures are big, and source of false sharing on SMP.

Every time a packet is queued or dequeue, many cache lines must be
touched because structures are not lay out properly.

By carefully splitting htb_sched in two parts, and define sub structures
to increase data locality, we can improve performance dramatically on
SMP.

New htb_prio structure can also be used in htb_class to increase data
locality.

I got 26 % performance increase on a 24 threads machine, with 200
concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:06:52 -07:00
Cong Wang
bcefe17cff tcp: introduce a per-route knob for quick ack
In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:

	"ACKS might also be delayed because of bidirectional
	traffic, and is more controlled by the application
	response time. TCP stack can not easily estimate it."

	"ACK can be incredibly useful to recover from losses in
	a short time.

	The vast majority of TCP sessions are small lived, and we
	send one ACK per received segment anyway at beginning or
	retransmits to let the sender smoothly increase its cwnd,
	so an auto-tuning facility wont help them that much."

and according to David:

	"ACKs are the only information we have to detect loss.

	And, for the same reasons that TCP VEGAS is fundamentally
	broken, we cannot measure the pipe or some other
	receiver-side-visible piece of information to determine
	when it's "safe" to stretch ACK.

	And even if it's "safe", we should not do it so that losses are
	accurately detected and we don't spuriously retransmit.

	The only way to know when the bandwidth increases is to
	"test" it, by sending more and more packets until drops happen.
	That's why all successful congestion control algorithms must
	operate on explicited tested pieces of information.

	Similarly, it's not really possible to universally know if
	it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode like
what TCP_QUICK_ACK does.

Similar to TCP_QUICK_ACK option, but for people who can't
modify the source code and still wants to control
TCP delayed ACK behavior. As David suggested, this should belong
to per-path scope, since different pathes may want different
behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:06:51 -07:00
Gao feng
a881ae1f62 ipv6: don't call addrconf_dst_alloc again when enable lo
If we disable all of the net interfaces, and enable
un-lo interface before lo interface, we already allocated
the addrconf dst in ipv6_add_addr. So we shouldn't allocate
it again when we enable lo interface.

Otherwise the message below will be triggered.
unregister_netdevice: waiting for sit1 to become free. Usage count = 1

This problem is introduced by commit 25fb6ca4ed
"net IPv6 : Fix broken IPv6 routing table after loopback down-up"

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:04:50 -07:00
Dave Jones
2c0740e4e1 sctp: Convert __list_for_each use to list_for_each
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 23:02:49 -07:00
Aneesh Kumar K.V
8bbd9f04b7 powerpc: Fix bad pmd error with book3E config
Book3E uses the hugepd at PMD level and don't encode pte directly
at the pmd level. So it will find the lower bits of pmd set
and the pmd_bad check throws error. Infact the current code
will never take the free_hugepd_range call at all because it will
clear the pmd if it find a hugepd pointer. Fix this by clearing
bad pmd only if it is not a hugepd pointer.

This is regression introduced by e2b3d202d1
"powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 15:25:21 +10:00
Yijing Wang
8576827131 bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init()
in init path. So we can use pdev->pm_cap instead of using
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 22:22:57 -07:00
Yijing Wang
f9c7da5eaf amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
Pci core has been saved pm cap register offset by pdev->pm_cap in pci_pm_init()
in init path. So we can use pdev->pm_cap instead of using
pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and simplified code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 22:22:56 -07:00
Yijing Wang
b8a39dd292 Bnx2x: remove redundant D0 power state set
Pci_enable_device() will set device power state to D0,
so it's no need to do it again in bnx2x_init_dev().
Also remove redundant PM Cap find code, because pci core
has been saved the pci device pm cap value.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 22:22:56 -07:00
Ben Hutchings
2206209e75 net: Add missing dependencies on NETDEVICES
ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES.
I don't think anything should select NETDEVICES, so make it a
dependency.  It also doesn't need to select or depend on ETHERNET,
which has nothing to do with the Ethernet library functions.

BPCTL selects MII, which depends on NETDEVICES.  But everything in the
drivers/staging/silicom directory is related to net devices, so make
NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant
dependencies on NET.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 22:22:56 -07:00
Ben Hutchings
d6cf7a86ce at91_ether: Do not select NET_CORE
This has no dependency on any of the drivers under NET_CORE.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 22:22:56 -07:00