Commit Graph

255396 Commits

Author SHA1 Message Date
Timur Tabi
9f89c96079 ASoC: MPC5200: replace of_device with platform_device
commit 3bdf28feaf upstream.

'struct of_device' no longer exists, and its functionality has been merged
into platform_device.  Update the MPC5200 audio DMA driver (mpc5200_dma)
accordingly.  This fixes a build break.

Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:23 -07:00
Dan Williams
f3919ef812 isci: fix 32-bit operation when CONFIG_HIGHMEM64G=n
commit ee33e2b771 upstream.

The unsolicited frame control infrastructure requires a table of dma
addresses for the hardware to lookup the frame buffer location by an
index.  The hardware expects the elements of this table to be 64-bit
quantities, so we cannot reference these elements as dma_addr_t.  All
unsolicited frame protocols are affected, particularly SATA-PIO and SMP
which prevented direct-attached SATA drives and expander-attached drives
to not be discovered.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:23 -07:00
Dan Williams
60c48a44d3 isci: fix sata response handling
commit 1a87828447 upstream.

A bug (likely copy/paste) that has been carried from the original
implementation.  The unsolicited frame handling structure returns the
d2h fis in the isci_request.stp.rsp buffer.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:22 -07:00
Jim Garlick
101e357617 fs/9p: Use protocol-defined value for lock/getlock 'type' field.
commit 51b8b4fb32 upstream.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:22 -07:00
Aneesh Kumar K.V
8bdb14f9c3 fs/9p: Always ask new inode in lookup for cache mode disabled
commit 73f507171c upstream.

This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:22 -07:00
Aneesh Kumar K.V
8926487ad8 net/9p: Fix kernel crash with msize 512K
commit b49d8b5d70 upstream.

With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
crashes. This patch fix those.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:22 -07:00
Aneesh Kumar K.V
a111278ea9 fs/9p: Add OS dependent open flags in 9p protocol
commit f88657ce3f upstream.

Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>

[extra comments from author as to why this needs to go to stable:

Earlier for different operation such as open we used the values of open
flag as defined by the OS.  But some of these flags such as O_DIRECT are
arch dependent. So if we have the 9p client and server running on
different architectures, we end up with client sending client
architecture value of these open flag and server will try to map these
values to what its architecture states. For ex: O_DIRECT on a x86 client
maps to

#define O_DIRECT        00040000

Where as on sparc server it will maps to

#define O_DIRECT        0x100000

Hence we need to map these open flags to OS/arch independent flag
values.  Getting these changes to an early version of kernel ensures us
that we work with different combination of client and server. We should
ideally backport this patch to all possible kernel version.]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:21 -07:00
Aneesh Kumar K.V
29a3e8657d fs/9p: Don't update file type when updating file attributes
commit 45089142b1 upstream.

We should only update attributes that we can change on stat2inode.
Also do file type initialization in v9fs_init_inode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:21 -07:00
Aneesh Kumar K.V
e279cdca3c fs/9p: Add fid before dentry instantiation
commit 5441ae5eb3 upstream.

d_instantiate marks the dentry positive. So a parallel lookup and mkdir of
the directory can find dentry that doesn't have fid attached. This can result
in both the code path doing v9fs_fid_add which results in v9fs_dentry leak.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:21 -07:00
Fenghua Yu
3cba74d538 ACPICA: Do not repair _TSS return package if _PSS is present
commit 8f9c91273e upstream.

We can only sort the _TSS return package if there is no _PSS
in the same scope. This is because if _PSS is present, the ACPI
specification dictates that the _TSS Power Dissipation field is
to be ignored, and therefore some BIOSs leave garbage values in
the _TSS Power field(s).  In this case, it is best to just return
the _TSS package as-is.

Reported-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:21 -07:00
Joerg Roedel
054b93a673 iommu/amd: Make sure iommu->need_sync contains correct value
commit f1ca1512e7 upstream.

The value is only set to true but never set back to false,
which causes to many completion-wait commands to be sent to
hardware. Fix it with this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:20 -07:00
Joerg Roedel
5c755fc21c iommu/amd: Don't take domain->lock recursivly
commit e33acde911 upstream.

The domain_flush_devices() function takes the domain->lock.
But this function is only called from update_domain() which
itself is already called unter the domain->lock. This causes
a deadlock situation when the dma-address-space of a domain
grows larger than 1GB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:20 -07:00
Randy Dunlap
bfa826f82f irda: fix smsc-ircc2 section mismatch warning
commit f470e5ae34 upstream.

Fix section mismatch warning:

WARNING: drivers/net/irda/smsc-ircc2.o(.devinit.text+0x1a7): Section mismatch in reference from the function smsc_ircc_pnp_probe() to the function .init.text:smsc_ircc_open()

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:20 -07:00
Al Viro
7c0e1afbe3 9p: close ACL leaks
commit 1ec95bf34d upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:19 -07:00
Venkateswararao Jujjuri (JV)
8aeae69113 net/9p: Fix the msize calculation.
commit c9ffb05ca5 upstream.

msize represents the maximum PDU size that includes P9_IOHDRSZ.

Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:18 -07:00
Aneesh Kumar K.V
926fa0b4b9 fs/9p: Always ask new inode in create
commit ed80fcfac2 upstream.

This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:18 -07:00
Prem Karat
6170eea647 fs/9p: Fix invalid mount options/args
commit a2dd43bb0d upstream.

Without this fix, if any invalid mount options/args are passed while mouting
the 9p fs, no error (-EINVAL) is returned and default arg value is assigned.

This fix returns -EINVAL when an invalid arguement is found while parsing
mount options.

Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:18 -07:00
Aneesh Kumar K.V
e38b21e76b fs/9p: When doing inode lookup compare qid details and inode mode bits.
commit fd2421f544 upstream.

This make sure we don't use wrong inode from the inode hash. The inode number
of the file deleted is reused by the next file system object created
and if we only use inode number for inode hash lookup we could end up
with wrong struct inode.

Also compare inode generation number. Not all Linux file system provide
st_gen in userspace. So it could be 0;

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:18 -07:00
Aneesh Kumar K.V
a0be78ef93 fs/9p: Fid is not valid after a failed clunk.
commit 5034990e28 upstream.

free the fid even in case of failed clunk.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:17 -07:00
jvrao
0beac58515 VirtIO can transfer VIRTQUEUE_NUM of pages.
commit 7f781679dd upstream.

Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:17 -07:00
jvrao
8b1aebc0be Fix the size of receive buffer packing onto VirtIO ring.
commit 114e6f3a5e upstream.

Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:16 -07:00
Eric Van Hensbergen
7b551b7069 net/9p: fix client code to fail more gracefully on protocol error
commit b85f7d92d7 upstream.

There was a BUG_ON to protect against a bad id which could be dealt with
more gracefully.

Reported-by: Natalie Orlin <norlin@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:16 -07:00
Florian Mickler
f2685ef0fb vp7045: fix buffer setup
commit fc61ccd35f upstream.

dvb_usb_device_init calls the frontend_attach method of this driver which
uses vp7045_usb_ob. In order to have a buffer ready in vp7045_usb_op, it has to
be allocated before that happens.

Luckily we can use the whole private data as the buffer as it gets separately
allocated on the heap via kzalloc in dvb_usb_device_init and is thus apt for
use via usb_control_msg.

This fixes a
	BUG: unable to handle kernel paging request at 0000000000001e78

reported by Tino Keitel and diagnosed by Dan Carpenter.

Tested-by: Tino Keitel <tino.keitel@tikei.de>
Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:16 -07:00
Jarod Wilson
ac0a6fa16b nuvoton-cir: simplify raw IR sample handling
commit de4ed0c111 upstream.

The nuvoton-cir driver was storing up consecutive pulse-pulse and
space-space samples internally, for no good reason, since
ir_raw_event_store_with_filter() already merges back to back like
samples types for us. This should also fix a regression introduced late
in 3.0 that related to a timeout change, which actually becomes correct
when coupled with this change. Tested with RC6 and RC5 on my own
nuvoton-cir hardware atop vanilla 3.0.0, after verifying quirky
behavior in 3.0 due to the timeout change.

Reported-by: Stephan Raue <sraue@openelec.tv>
CC: Stephan Raue <sraue@openelec.tv>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:15 -07:00
NeilBrown
71a26cb4e0 md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
commit 27a7b260f7 upstream.

0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.

Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.

Also the sanity check at the end of super_90_load should include level
1 as it used ->size too. (RAID0 and Linear don't use ->size at all).

Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:14 -07:00
NeilBrown
97e5a85664 Avoid dereferencing a 'request_queue' after last close.
commit 94007751bb upstream.

On the last close of an 'md' device which as been stopped, the device
is destroyed and in particular the request_queue is freed.  The free
is done in a separate thread so it might happen a short time later.

__blkdev_put calls bdev_inode_switch_bdi *after* ->release has been
called.

Since commit f758eeabeb
bdev_inode_switch_bdi will dereference the 'old' bdi, which lives
inside a request_queue, to get a spin lock.  This causes the last
close on an md device to sometime take a spin_lock which lives in
freed memory - which results in an oops.

So move the called to bdev_inode_switch_bdi before the call to
->release.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:14 -07:00
Marcin Slusarz
045451f9de drm/nouveau: properly handle allocation failure in nouveau_sgdma_populate
commit 17c8b96093 upstream.

Not cleaning after alloc failure would result in crash on destroy,
because nouveau_sgdma_clear assumes "ttm_alloced" to be not null when
"pages" is not null.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:14 -07:00
Linus Walleij
766357153d ARM: davinci: fix cache flush build error
commit 897a6a1a14 upstream.

The TNET variant of DaVinci compiles some code that it shares
with other DaVinci variants, however it has a V6 CPU rather than
an ARM926T, thus the hardcoded call to arm926_flush_kern_cache_all()
in sleep.S will obviously fail, and we need to build with the
v6_flush_kern_cache_all() call instead. This was triggered by
manually altering the DaVinci config to build the TNET version.

Cc: Dave Martin <dave.martin@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:13 -07:00
Sudhakar Rajashekhara
75a9498b73 ARM: davinci: da850 EVM: read mac address from SPI flash
commit 810198bc9c upstream.

DA850/OMAP-L138 EMAC driver uses random mac address instead of
a fixed one because the mac address is not stuffed into EMAC
platform data.

This patch provides a function which reads the mac address
stored in SPI flash (registered as MTD device) and populates the
EMAC platform data. The function which reads the mac address is
registered as a callback which gets called upon addition of MTD
device.

NOTE: In case the MAC address stored in SPI flash is erased, follow
the instructions at [1] to restore it.

[1] http://processors.wiki.ti.com/index.php/GSG:_OMAP-L138_DVEVM_Additional_Procedures#Restoring_MAC_address_on_SPI_Flash

Modifications in v2:
Guarded registering the mtd_notifier only when MTD is enabled.
Earlier this was handled using mtd_has_partitions() call, but
this has been removed in Linux v3.0.

Modifications in v3:
a. Guarded da850_evm_m25p80_notify_add() function and
   da850evm_spi_notifier structure with CONFIG_MTD macros.
b. Renamed da850_evm_register_mtd_user() function to
   da850_evm_setup_mac_addr() and removed the struct mtd_notifier
   argument to this function.
c. Passed the da850evm_spi_notifier structure to register_mtd_user()
   function.

Modifications in v4:
Moved the da850_evm_setup_mac_addr() function within the first
CONFIG_MTD ifdef construct.

Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:13 -07:00
Linus Walleij
8a19c4e575 ARM: 7081/1: mach-integrator: fix the clocksource
commit bb9ea77846 upstream.

I was intrigued by the fact that the clock stood still on
the Integrator, but it wasn't strange at all, because the
timer was set up all wrong and probably has been for a
while. With this patch the clock starts ticking again:
make the timer periodic (reload), |= on the divisor bit
and load the timer before starting it.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:13 -07:00
Guenter Roeck
a677ecd3c4 hwmon: (max16065) Fix current calculation
commit ff71c182f4 upstream.

Current calculation is completely wrong. Add missing brackets to fix it.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:12 -07:00
Konrad Rzeszutek Wilk
0b129e1ec4 xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.
commit ed467e69f1 upstream.

We have hit a couple of customer bugs where they would like to
use those parameters to run an UP kernel - but both of those
options turn of important sources of interrupt information so
we end up not being able to boot. The correct way is to
pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and
the kernel will patch itself to be a UP kernel.

Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308

Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:12 -07:00
Igor Mammedov
1f51b5d99c xen: x86_32: do not enable iterrupts when returning from exception in interrupt context
commit d198d49914 upstream.

If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):

	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
	sete XEN_vcpu_info_mask(%eax)

will enable interrupts even if they has been previously disabled according to
eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99)

	testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
	setz XEN_vcpu_info_mask(%eax)

Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.

Reproducer for bug is attached to RHBZ 707552

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:12 -07:00
Girish K S
1c72a517f7 mmc: sdhci-s3c: Fix mmc card I/O problem
commit 49bb1e6195 upstream.

This patch fixes the problem in sdhci-s3c host driver for Samsung Soc's.
During the card identification stage the mmc core driver enumerates for
the best bus width in combination with the highest available data rate.
It starts enumerating from the highest bus width (8) to lowest width (1).

In case of few MMC cards the 4-bit bus enumeration fails and tries
the 1-bit bus enumeration. When switched to 1-bit bus mode the host driver
has to clear the previous bus width setting and apply the new setting.

The current patch will clear the previous bus mode and apply the new
mode setting.

Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:11 -07:00
Mika Westerberg
6942511b6b mmc: core: use non-reentrant workqueue for clock gating
commit 50a50f9248 upstream.

The default multithread workqueue can cause the same work to be executed
concurrently on a different CPUs. This isn't really suitable for clock
gating as it might already gated the clock and gating it twice results both
host->clk_old and host->ios.clock to be set to 0.

To prevent this from happening we use system_nrt_wq instead.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:10 -07:00
Mika Westerberg
0bd01aeeec mmc: core: prevent aggressive clock gating racing with ios updates
commit 778e277cb8 upstream.

We have seen at least two different races when clock gating kicks in in a
middle of ios structure update.

First one happens when ios->clock is changed outside of aggressive clock
gating framework, for example via mmc_set_clock(). The race might happen
when we run following code:

mmc_set_ios():
	...
	if (ios->clock > 0)
		mmc_set_ungated(host);

Now if gating kicks in right after the condition check we end up setting
host->clk_gated to false even though we have just gated the clock. Next
time a request is started we try to ungate and restore the clock in
mmc_host_clk_hold(). However since we have host->clk_gated set to false the
original clock is not restored.

This eventually will cause the host controller to hang since its clock is
disabled while we are trying to issue a request. For example on Intel
Medfield platform we see:

[   13.818610] mmc2: Timeout waiting for hardware interrupt.
[   13.818698] sdhci: =========== REGISTER DUMP (mmc2)===========
[   13.818753] sdhci: Sys addr: 0x00000000 | Version:  0x00008901
[   13.818804] sdhci: Blk size: 0x00000000 | Blk cnt:  0x00000000
[   13.818853] sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[   13.818903] sdhci: Present:  0x1fff0000 | Host ctl: 0x00000001
[   13.818951] sdhci: Power:    0x0000000d | Blk gap:  0x00000000
[   13.819000] sdhci: Wake-up:  0x00000000 | Clock:    0x00000000
[   13.819049] sdhci: Timeout:  0x00000000 | Int stat: 0x00000000
[   13.819098] sdhci: Int enab: 0x00ff00c3 | Sig enab: 0x00ff00c3
[   13.819147] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[   13.819196] sdhci: Caps:     0x6bee32b2 | Caps_1:   0x00000000
[   13.819245] sdhci: Cmd:      0x00000000 | Max curr: 0x00000000
[   13.819292] sdhci: Host ctl2: 0x00000000
[   13.819331] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
[   13.819377] sdhci: ===========================================
[   13.919605] mmc2: Reset 0x2 never completed.

and it never recovers.

Second race might happen while running mmc_power_off():

static void mmc_power_off(struct mmc_host *host)
{
	host->ios.clock = 0;
	host->ios.vdd = 0;

[ clock gating kicks in here ]

	/*
	 * Reset ocr mask to be the highest possible voltage supported for
	 * this mmc host. This value will be used at next power up.
	 */
	host->ocr = 1 << (fls(host->ocr_avail) - 1);

	if (!mmc_host_is_spi(host)) {
		host->ios.bus_mode = MMC_BUSMODE_OPENDRAIN;
		host->ios.chip_select = MMC_CS_DONTCARE;
	}
	host->ios.power_mode = MMC_POWER_OFF;
	host->ios.bus_width = MMC_BUS_WIDTH_1;
	host->ios.timing = MMC_TIMING_LEGACY;
	mmc_set_ios(host);
}

If the clock gating worker kicks in while we are only partially updated the
ios structure the host controller gets incomplete ios and might not work as
supposed. Again on Intel Medfield platform we get:

[    4.185349] kernel BUG at drivers/mmc/host/sdhci.c:1155!
[    4.185422] invalid opcode: 0000 [#1] PREEMPT SMP
[    4.185509] Modules linked in:
[    4.185565]
[    4.185608] Pid: 4, comm: kworker/0:0 Not tainted 3.0.0+ #240 Intel Corporation Medfield/iCDKA
[    4.185742] EIP: 0060:[<c136364e>] EFLAGS: 00010083 CPU: 0
[    4.185827] EIP is at sdhci_set_power+0x3e/0xd0
[    4.185891] EAX: f5ff98e0 EBX: f5ff98e0 ECX: 00000000 EDX: 00000001
[    4.185970] ESI: f5ff977c EDI: f5ff9904 EBP: f644fe98 ESP: f644fe94
[    4.186049]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    4.186125] Process kworker/0:0 (pid: 4, ti=f644e000 task=f644c0e0 task.ti=f644e000)
[    4.186219] Stack:
[    4.186257]  f5ff98e0 f644feb0 c1365173 00000282 f5ff9460 f5ff96e0 f5ff96e0 f644feec
[    4.186418]  c1355bd8 f644c0e0 c1499c3d f5ff96e0 f644fed4 00000006 f5ff96e0 00000286
[    4.186579]  f644fedc c107922b f644feec 00000286 f5ff9460 f5ff9700 f644ff10 c135839e
[    4.186739] Call Trace:
[    4.186802]  [<c1365173>] sdhci_set_ios+0x1c3/0x340
[    4.186883]  [<c1355bd8>] mmc_gate_clock+0x68/0x120
[    4.186963]  [<c1499c3d>] ? _raw_spin_unlock_irqrestore+0x4d/0x60
[    4.187052]  [<c107922b>] ? trace_hardirqs_on+0xb/0x10
[    4.187134]  [<c135839e>] mmc_host_clk_gate_delayed+0xbe/0x130
[    4.187219]  [<c105ec09>] ? process_one_work+0xf9/0x5b0
[    4.187300]  [<c135841d>] mmc_host_clk_gate_work+0xd/0x10
[    4.187379]  [<c105ec82>] process_one_work+0x172/0x5b0
[    4.187457]  [<c105ec09>] ? process_one_work+0xf9/0x5b0
[    4.187538]  [<c1358410>] ? mmc_host_clk_gate_delayed+0x130/0x130
[    4.187625]  [<c105f3c8>] worker_thread+0x118/0x330
[    4.187700]  [<c1496cee>] ? preempt_schedule+0x2e/0x50
[    4.187779]  [<c105f2b0>] ? rescuer_thread+0x1f0/0x1f0
[    4.187857]  [<c1062cf4>] kthread+0x74/0x80
[    4.187931]  [<c1062c80>] ? __init_kthread_worker+0x60/0x60
[    4.188015]  [<c149acfa>] kernel_thread_helper+0x6/0xd
[    4.188079] Code: 81 fa 00 00 04 00 0f 84 a7 00 00 00 7f 21 81 fa 80 00 00 00 0f 84 92 00 00 00 81 fa 00 00 0
[    4.188780] EIP: [<c136364e>] sdhci_set_power+0x3e/0xd0 SS:ESP 0068:f644fe94
[    4.188898] ---[ end trace a7b23eecc71777e4 ]---

This BUG() comes from the fact that ios.power_mode was still in previous
value (MMC_POWER_ON) and ios.vdd was set to zero.

We prevent these by inhibiting the clock gating while we update the ios
structure.

Both problems can be reproduced by simply running the device in a reboot
loop.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:10 -07:00
Mika Westerberg
a24cf48a9c mmc: rename mmc_host_clk_{ungate|gate} to mmc_host_clk_{hold|release}
commit 08c14071fd upstream.

As per suggestion by Linus Walleij:

  > If you think the names of the functions are confusing then
  > you may rename them, say like this:
  >
  > mmc_host_clk_ungate() -> mmc_host_clk_hold()
  > mmc_host_clk_gate() -> mmc_host_clk_release()
  >
  > Which would make the usecases more clear

(This is CC'd to stable@ because the next two patches, which fix
observable races, depend on it.)

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:10 -07:00
Andrey Vagin
2dd9ce0663 x86, perf: Check that current->mm is alive before getting user callchain
commit 20afc60f89 upstream.

An event may occur when an mm is already released.

I added an event in dequeue_entity() and caught a panic with
the following backtrace:

[  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[  434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120
...
[  434.421258] Call Trace:
[  434.421258]  [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0
[  434.421258]  [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90
[  434.421258]  [<ffffffff8101b048>] perf_callchain_user+0x128/0x170
[  434.421258]  [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100
[  434.421258]  [<ffffffff81116690>] perf_prepare_sample+0x200/0x280
[  434.421258]  [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290
[  434.421258]  [<ffffffff81065240>] ? tg_shares_up+0x0/0x670
[  434.421258]  [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0
[  434.421258]  [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0
[  434.421258]  [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250
[  434.421258]  [<ffffffff81119204>] perf_tp_event+0x44/0x70
[  434.421258]  [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110
[  434.421258]  [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0
[  434.421258]  [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60
[  434.421258]  [<ffffffff8105818a>] dequeue_task+0x9a/0xb0
[  434.421258]  [<ffffffff810581e2>] deactivate_task+0x42/0xe0
[  434.421258]  [<ffffffff814bc019>] thread_return+0x191/0x808
[  434.421258]  [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60
[  434.421258]  [<ffffffff8106f4c4>] do_exit+0x464/0x910
[  434.421258]  [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0
[  434.421258]  [<ffffffff8106fa57>] sys_exit_group+0x17/0x20
[  434.421258]  [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:09 -07:00
WANG Cong
70a4888b98 sched: Fix a memory leak in __sdt_free()
commit feff8fa007 upstream.

This patch fixes the following memory leak:

unreferenced object 0xffff880107266800 (size 512):
  comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81133940>] create_object+0x187/0x28b
    [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98
    [<ffffffff811232ba>] __kmalloc_node+0x104/0x159
    [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17
    [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3
    [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a
    [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47
    [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12
    [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b
    [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18
    [<ffffffff8131322c>] sysdev_class_store+0x20/0x22
    [<ffffffff81193876>] sysfs_write_file+0x108/0x144
    [<ffffffff81135b10>] vfs_write+0xaf/0x102
    [<ffffffff81135d23>] sys_write+0x4d/0x74
    [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:09 -07:00
Thomas Gleixner
f4e97b682a sched: Move blk_schedule_flush_plug() out of __schedule()
commit 9c40cef2b7 upstream.

There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.

Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.

This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:09 -07:00
Thomas Gleixner
edbb7ce79e sched: Separate the scheduler entry for preemption
commit c259e01a1e upstream.

Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.

To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:08 -07:00
John Stultz
1ed2053563 rtc: Fix RTC PIE frequency limit
commit 938f97bcf1 upstream.

Thomas earlier submitted a fix to limit the RTC PIE freq, but
picked 5000Hz out of the air. Willy noticed that we should
instead use the 8192Hz max from the rtc man documentation.

Cc: Willy Tarreau <w@1wt.eu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:08 -07:00
John Stultz
c3a44b4d23 alarmtimers: Avoid possible denial of service with high freq periodic timers
commit 6af7e471e5 upstream.

Its possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectivly lock up a box.

A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:07 -07:00
John Stultz
0898dd1603 alarmtimers: Memset itimerspec passed into alarm_timer_get
commit ea7802f630 upstream.

Following common_timer_get, zero out the itimerspec passed in.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:07 -07:00
John Stultz
26cf1a7ba1 alarmtimers: Avoid possible null pointer traversal
commit 971c90bfa2 upstream.

We don't check if old_setting is non null before assigning it, so
correct this.

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:06 -07:00
Troy Kisky
5607cbd164 MXC: iomux-v3: correct NO_PAD_CTRL definition
commit 425933b30b upstream.

iomux-v3.c uses NO_PAD_CTRL as a 32 bit value
so it should not be shifted left by MUX_PAD_CTRL_SHIFT(41)

Previously, anything requesting NO_PAD_CTRL would get
their pad control register set to 0.

Since it is a pad control mask, place it with the other mask values.

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Acked-by: Lothar Waßmann <LW@KARO-electronics.de>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:06 -07:00
Carolyn Wyborny
0dd4154f66 igb: fix WOL on second port of i350 device
commit 6d337dce66 upstream.

This patch fixes a problem where WOL would fail on second port of i350
device.

Reported-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Reported-by: Stefan Assmann<sassmann@redhat.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:05 -07:00
Mel Gorman
66d52cb7c4 mm: page allocator: reconsider zones for allocation after direct reclaim
commit 76d3fbf8fb upstream.

With zone_reclaim_mode enabled, it's possible for zones to be considered
full in the zonelist_cache so they are skipped in the future.  If the
process enters direct reclaim, the ZLC may still consider zones to be full
even after reclaiming pages.  Reconsider all zones for allocation if
direct reclaim returns successfully.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:04 -07:00
Mel Gorman
42274b5f81 mm: page allocator: initialise ZLC for first zone eligible for zone_reclaim
commit cd38b115d5 upstream.

There have been a small number of complaints about significant stalls
while copying large amounts of data on NUMA machines reported on a
distribution bugzilla.  In these cases, zone_reclaim was enabled by
default due to large NUMA distances.  In general, the complaints have not
been about the workload itself unless it was a file server (in which case
the recommendation was disable zone_reclaim).

The stalls are mostly due to significant amounts of time spent scanning
the preferred zone for pages to free.  After a failure, it might fallback
to another node (as zonelists are often node-ordered rather than
zone-ordered) but stall quickly again when the next allocation attempt
occurs.  In bad cases, each page allocated results in a full scan of the
preferred zone.

Patch 1 checks the preferred zone for recent allocation failure
        which is particularly important if zone_reclaim has failed
        recently.  This avoids rescanning the zone in the near future and
        instead falling back to another node.  This may hurt node locality
        in some cases but a failure to zone_reclaim is more expensive than
        a remote access.

Patch 2 clears the zlc information after direct reclaim.
        Otherwise, zone_reclaim can mark zones full, direct reclaim can
        reclaim enough pages but the zone is still not considered for
        allocation.

This was tested on a 24-thread 2-node x86_64 machine.  The tests were
focused on large amounts of IO.  All tests were bound to the CPUs on
node-0 to avoid disturbances due to processes being scheduled on different
nodes.  The kernels tested are

3.0-rc6-vanilla		Vanilla 3.0-rc6
zlcfirst		Patch 1 applied
zlcreconsider		Patches 1+2 applied

FS-Mark
./fs_mark  -d  /tmp/fsmark-10813  -D  100  -N  5000  -n  208  -L  35  -t  24  -S0  -s  524288
                fsmark-3.0-rc6       3.0-rc6       		3.0-rc6
                   vanilla			 zlcfirs 	zlcreconsider
Files/s  min          54.90 ( 0.00%)       49.80 (-10.24%)       49.10 (-11.81%)
Files/s  mean        100.11 ( 0.00%)      135.17 (25.94%)      146.93 (31.87%)
Files/s  stddev       57.51 ( 0.00%)      138.97 (58.62%)      158.69 (63.76%)
Files/s  max         361.10 ( 0.00%)      834.40 (56.72%)      802.40 (55.00%)
Overhead min       76704.00 ( 0.00%)    76501.00 ( 0.27%)    77784.00 (-1.39%)
Overhead mean    1485356.51 ( 0.00%)  1035797.83 (43.40%)  1594680.26 (-6.86%)
Overhead stddev  1848122.53 ( 0.00%)   881489.88 (109.66%)  1772354.90 ( 4.27%)
Overhead max     7989060.00 ( 0.00%)  3369118.00 (137.13%) 10135324.00 (-21.18%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)        501.49    493.91    499.93
Total Elapsed Time (seconds)               2451.57   2257.48   2215.92

MMTests Statistics: vmstat
Page Ins                                       46268       63840       66008
Page Outs                                   90821596    90671128    88043732
Swap Ins                                           0           0           0
Swap Outs                                          0           0           0
Direct pages scanned                        13091697     8966863     8971790
Kswapd pages scanned                               0     1830011     1831116
Kswapd pages reclaimed                             0     1829068     1829930
Direct pages reclaimed                      13037777     8956828     8648314
Kswapd efficiency                               100%         99%         99%
Kswapd velocity                                0.000     810.643     826.346
Direct efficiency                                99%         99%         96%
Direct velocity                             5340.128    3972.068    4048.788
Percentage direct scans                         100%         83%         83%
Page writes by reclaim                             0           3           0
Slabs scanned                                 796672      720640      720256
Direct inode steals                          7422667     7160012     7088638
Kswapd inode steals                                0     1736840     2021238

Test completes far faster with a large increase in the number of files
created per second.  Standard deviation is high as a small number of
iterations were much higher than the mean.  The number of pages scanned by
zone_reclaim is reduced and kswapd is used for more work.

LARGE DD
               		3.0-rc6       3.0-rc6       3.0-rc6
                   	vanilla     zlcfirst     zlcreconsider
download tar           59 ( 0.00%)   59 ( 0.00%)   55 ( 7.27%)
dd source files       527 ( 0.00%)  296 (78.04%)  320 (64.69%)
delete source          36 ( 0.00%)   19 (89.47%)   20 (80.00%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)        125.03    118.98    122.01
Total Elapsed Time (seconds)                624.56    375.02    398.06

MMTests Statistics: vmstat
Page Ins                                     3594216      439368      407032
Page Outs                                   23380832    23380488    23377444
Swap Ins                                           0           0           0
Swap Outs                                          0         436         287
Direct pages scanned                        17482342    69315973    82864918
Kswapd pages scanned                               0      519123      575425
Kswapd pages reclaimed                             0      466501      522487
Direct pages reclaimed                       5858054     2732949     2712547
Kswapd efficiency                               100%         89%         90%
Kswapd velocity                                0.000    1384.254    1445.574
Direct efficiency                                33%          3%          3%
Direct velocity                            27991.453  184832.737  208171.929
Percentage direct scans                         100%         99%         99%
Page writes by reclaim                             0        5082       13917
Slabs scanned                                  17280       29952       35328
Direct inode steals                           115257     1431122      332201
Kswapd inode steals                                0           0      979532

This test downloads a large tarfile and copies it with dd a number of
times - similar to the most recent bug report I've dealt with.  Time to
completion is reduced.  The number of pages scanned directly is still
disturbingly high with a low efficiency but this is likely due to the
number of dirty pages encountered.  The figures could probably be improved
with more work around how kswapd is used and how dirty pages are handled
but that is separate work and this result is significant on its own.

Streaming Mapped Writer
MMTests Statistics: duration
User/Sys Time Running Test (seconds)        124.47    111.67    112.64
Total Elapsed Time (seconds)               2138.14   1816.30   1867.56

MMTests Statistics: vmstat
Page Ins                                       90760       89124       89516
Page Outs                                  121028340   120199524   120736696
Swap Ins                                           0          86          55
Swap Outs                                          0           0           0
Direct pages scanned                       114989363    96461439    96330619
Kswapd pages scanned                        56430948    56965763    57075875
Kswapd pages reclaimed                      27743219    27752044    27766606
Direct pages reclaimed                         49777       46884       36655
Kswapd efficiency                                49%         48%         48%
Kswapd velocity                            26392.541   31363.631   30561.736
Direct efficiency                                 0%          0%          0%
Direct velocity                            53780.091   53108.759   51581.004
Percentage direct scans                          67%         62%         62%
Page writes by reclaim                           385         122        1513
Slabs scanned                                  43008       39040       42112
Direct inode steals                                0          10           8
Kswapd inode steals                              733         534         477

This test just creates a large file mapping and writes to it linearly.
Time to completion is again reduced.

The gains are mostly down to two things.  In many cases, there is less
scanning as zone_reclaim simply gives up faster due to recent failures.
The second reason is that memory is used more efficiently.  Instead of
scanning the preferred zone every time, the allocator falls back to
another zone and uses it instead improving overall memory utilisation.

This patch: initialise ZLC for first zone eligible for zone_reclaim.

The zonelist cache (ZLC) is used among other things to record if
zone_reclaim() failed for a particular zone recently.  The intention is to
avoid a high cost scanning extremely long zonelists or scanning within the
zone uselessly.

Currently the zonelist cache is setup only after the first zone has been
considered and zone_reclaim() has been called.  The objective was to avoid
a costly setup but zone_reclaim is itself quite expensive.  If it is
failing regularly such as the first eligible zone having mostly mapped
pages, the cost in scanning and allocation stalls is far higher than the
ZLC initialisation step.

This patch initialises ZLC before the first eligible zone calls
zone_reclaim().  Once initialised, it is checked whether the zone failed
zone_reclaim recently.  If it has, the zone is skipped.  As the first zone
is now being checked, additional care has to be taken about zones marked
full.  A zone can be marked "full" because it should not have enough
unmapped pages for zone_reclaim but this is excessive as direct reclaim or
kswapd may succeed where zone_reclaim fails.  Only mark zones "full" after
zone_reclaim fails if it failed to reclaim enough pages after scanning.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:04 -07:00
Alex Deucher
10927d967a drm/radeon/kms: make sure pci max read request size is valid on evergreen+ (v2)
commit d054ac16ee upstream.

If the bios or OS sets the pci max read request size to 0 or an
invalid value (6,7), it can result in a hang or slowdown.  Check
and set it to something sane if it's invalid.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=42162

v2: use pci reg defines from include/linux/pci_regs.h

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:04 -07:00