Commit Graph

170330 Commits

Author SHA1 Message Date
Sheng Yang
926fd42e59 KVM: VMX: Disable unrestricted guest when EPT disabled
commit 046d87103a upstream

Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest
supported processors.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:21 -07:00
Eduardo Habkost
e075e2caeb KVM: SVM: Reset cr0 properly on vcpu reset
commit 18fa000ae4 upstream

svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699

Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.

Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Eduardo Habkost
5b4ce46f45 KVM: VMX: Use macros instead of hex value on cr0 initialization
commit fa40052ca0 upstream

This should have no effect, it is just to make the code clearer.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Jan Kiszka
86143fe846 KVM: VMX: Update instruction length on intercepted BP
commit c573cd2293 upstream

We intercept #BP while in guest debugging mode. As VM exits due to
intercepted exceptions do not necessarily come with valid
idt_vectoring, we have to update event_exit_inst_len explicitly in such
cases. At least in the absence of migration, this ensures that
re-injections of #BP will find and use the correct instruction length.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Gleb Natapov
78ce64a384 KVM: Fix segment descriptor loading
commit c697518a86 upstream

Add proper error and permission checking. This patch also change task
switching code to load segment selectors before segment descriptors, like
SDM requires, otherwise permission checking during segment descriptor
loading will be incorrect.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Gleb Natapov
ad66f35eaa KVM: x86 emulator: Fix popf emulation
commit d4c6a1549c upstream

POPF behaves differently depending on current CPU mode. Emulate correct
logic to prevent guest from changing flags that it can't change otherwise.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Gleb Natapov
f6eb212e60 KVM: x86 emulator: Check IOPL level during io instruction emulation
commit f850e2e603 upstream

Make emulator check that vcpu is allowed to execute IN, INS, OUT,
OUTS, CLI, STI.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Gleb Natapov
2f44154668 KVM: x86 emulator: fix memory access during x86 emulation
commit 1871c6020d upstream

Currently when x86 emulator needs to access memory, page walk is done with
broadest permission possible, so if emulated instruction was executed
by userspace process it can still access kernel memory. Fix that by
providing correct memory access to page walker during emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Gleb Natapov
10a505e60e KVM: x86 emulator: Add Virtual-8086 mode of emulation
commit a004475567 upstream

For some instructions CPU behaves differently for real-mode and
virtual 8086. Let emulator know which mode cpu is in, so it will
not poke into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:20 -07:00
Evan McClain
5774cdfd3d backlight: mbp_nvidia_bl - add five more MacBook variants
commit 36bc5ee6a8 upstream.

This adds the MacBook 1,1 2,1 3,1 4,1 and 4,2 to the DMI tables.

Signed-off-by: Evan McClain <evan.mcclain@gatech.edu>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Jiri Slaby
6abbc7bac8 resource: move kernel function inside __KERNEL__
commit 96d07d2117 upstream
From: Jiri Slaby <jslaby@suse.cz>

resource: move kernel function inside __KERNEL__

It is an internal function. Move it inside __KERNEL__ ifdef, along
with task_struct declaration.

Then we get:
#--- /usr/include/linux/resource.h       2009-09-14 15:09:29.000000000 +0200
#+++ usr/include/linux/resource.h       2010-01-04 11:30:54.000000000 +0100
#@@ -3,8 +3,6 @@
#
##include <linux/time.h>
#
#-struct task_struct;
#-
#/*
#* Resource control/accounting header file for linux
#*/
#@@ -70,6 +68,5 @@
#*/
##include <asm/resource.h>
#
#-int getrusage(struct task_struct *p, int who, struct rusage *ru);
#
##endif
#
#***********

include/linux/Kbuild is untouched, since unifdef is run even on
headers-y nowadays.

backport to 2.6.32 by maximilian attems <max@stro.at>
Patch commented out by gregkh due to quilt complaining.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Andreas Herrmann
787814b206 x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
commit 9d260ebc09 upstream.

Use NodeId MSR to get NodeId and number of nodes per processor.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091216144355.GB28798@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Daniel T Chen
97d0f3f2a0 ALSA: hda: Fix 0 dB offset for Lenovo Thinkpad models using AD1981
commit b8e80cf386 upstream.

BugLink: https://launchpad.net/bugs/551606

The OR's hardware distorts at PCM 100% because it does not correspond to
0 dB. Fix this in patch_ad1981() for all models using the Thinkpad
quirk.

Reported-by: Jane Silber
Signed-off-by: Daniel T Chen <crimsun@ubuntu.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Dan Carpenter
db27624dc7 ALSA: mixart: range checking proc file
commit b0cc58a25d upstream.

The original code doesn't take into consideration that the value of
MIXART_BA0_SIZE - pos can be less than zero which would lead to a large
unsigned value for "count".

Also I moved the check that read size is a multiple of 4 bytes below
the code that adjusts "count".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Wu Fengguang
a54573d774 readahead: fix NULL filp dereference
commit 70655c06bd upstream.

btrfs relocate_file_extent_cluster() calls us with NULL filp:

  [ 4005.426805] BUG: unable to handle kernel NULL pointer dereference at 00000021
  [ 4005.426818] IP: [<c109a130>] page_cache_sync_readahead+0x18/0x3e

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Yan Zheng <yanzheng@21cn.com>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Anton Blanchard
957c93832c raw: fsync method is now required
commit 55ab3a1ff8 upstream.

Commit 148f948ba8 (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range.  vfs_fsync_range has:

        if (!fop || !fop->fsync) {
                ret = -EINVAL;
                goto out;
        }

But drivers/char/raw.c doesn't set an fsync method.

We have two options: fix it or remove the raw driver completely.  I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver.  My knowledge of
the block layer is pretty sketchy so this could do with a once over.

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Jiri Kosina
6e313203f4 HID: fix oops in gyration_event()
commit d8e4ebf8b6 upstream.

Fix oops caused by dereferencing field->hidinput in cases where
the device hasn't been claimed by hid-input.

Reported-by: Andreas Demmer <mail@andreas-demmer.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:19 -07:00
Alan Cox
4e44a79800 pata_ali: Fix regression with old devices
commit d6250a03fa upstream.

Making the new stuff work broke some of the old chipsets. We need to go
back to the old set up values for these it seems. Unfortunately even with
documentation this is basically a mix of cargoculting and guesswork.

Chased down to the exact line by Gianluca.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Éric Piel
6807699cad lis3: fix show rate for 8 bits chips
commit 4b5d95b380 upstream.

Originally the driver was only targeted to 12bits sensors.  When support
for 8bits sensors was added, some slight difference in the registers were
overlooked.  This should fix it, both for initialization, and for
displaying the rate.

Reported-by: Kalhan Trisal <kalhan.trisal@intel.com>
Reported-by: Christoph Plattner <christoph.plattner@gmx.at>
Tested-by: Christoph Plattner <christoph.plattner@gmx.at>
Tested-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Signed-off-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Oleg Nesterov
e2278e63b1 tty: release_one_tty() forgets to put pids
commit 6da8d866d0 upstream.

release_one_tty(tty) can be called when tty still has a reference
to pgrp/session. In this case we leak the pid.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Thomas Gleixner
fae08fb3f9 genirq: Force MSI irq handlers to run with interrupts disabled
commit 753649dbc4 upstream.

Network folks reported that directing all MSI-X vectors of their multi
queue NICs to a single core can cause interrupt stack overflows when
enough interrupts fire at the same time.

This is caused by the fact that we run interrupt handlers by default
with interrupts enabled unless the driver reuqests the interrupt with
the IRQF_DISABLED set. The NIC handlers do not set this flag, so
simultaneous interrupts can nest unlimited and cause the stack
overflow.

The only safe counter measure is to run the interrupt handlers with
interrupts disabled. We can't switch to this mode in general right
now, but it is safe to do so for MSI interrupts.

Force IRQF_DISABLED for MSI interrupt handlers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@osdl.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Seth Heasley
e3abceb779 WATCHDOG: iTCO_wdt: TCO Watchdog patch for additional Intel Cougar Point DeviceIDs
commit 4c7d849204 upstream.

This patch adds the Intel Cougar Point PCH LPC Controller DeviceIDs for iTCO Watchdog.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Thomas Mingarelli
628f1a0323 WATCHDOG: hpwdt - fix lower timeout limit
commit 8ba42bd88c upstream.

[Novell Bug 581103] HP Watchdog driver has arbitrary (wrong) timeout limits.
Fix the lower timeout limit to a more appropriate value.

Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Wey-Yi Guy
a850f39d86 mac80211: tear down all agg queues when restart/reconfig hw
commit 74e2bd1fa3 upstream.

When there is a need to restart/reconfig hw, tear down all the
aggregation queues and let the mac80211 and driver get in-sync to have
the opportunity to re-establish the aggregation queues again.

Need to wait until driver re-establish all the station information before tear
down the aggregation queues, driver(at least iwlwifi driver) will reject the
stop aggregation queue request if station is not ready. But also need to make
sure the aggregation queues are tear down before waking up the queues, so
mac80211 will not sending frames with aggregation bit set.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:18 -07:00
Johannes Berg
6a45694284 mac80211: move netdev queue enabling to correct spot
commit 7236fe29fd upstream.

"mac80211: fix skb buffering issue" still left a race
between enabling the hardware queues and the virtual
interface queues. In hindsight it's totally obvious
that enabling the netdev queues for a hardware queue
when the hardware queue is enabled is wrong, because
it could well possible that we can fill the hw queue
with packets we already have pending. Thus, we must
only enable the netdev queues once all the pending
packets have been processed and sent off to the device.

In testing, I haven't been able to trigger this race
condition, but it's clearly there, possibly only when
aggregation is being enabled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Valentin Longchamp
c7f02946f8 setup correct int pipe type in ar9170_usb_exec_cmd
commit 2d20c72c02 upstream.

An int urb is constructed but we fill it in with a bulk pipe type.

Commit f661c6f8c6 implemented a pipe type
check when CONFIG_USB_DEBUG is enabled. The check failed for all the ar9170
usb transfers and the driver could not configure the wifi dongle.

This went unnoticed until now because most people don't have
CONFIG_USB_DEBUG enabled.

Signed-off-by: Valentin Longchamp <valentin.longchamp@epfl.ch>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Dan Carpenter
a2266949a5 iwlwifi: range checking issue
commit 8e1a53c615 upstream.

IWL_RATE_COUNT is 13 and IWL_RATE_COUNT_LEGACY is 12.

IWL_RATE_COUNT_LEGACY is the right one here because iwl3945_rates
doesn't support 60M and also that's how "rates" is defined in
iwlcore_init_geos() from drivers/net/wireless/iwlwifi/iwl-core.c.

        rates = kzalloc((sizeof(struct ieee80211_rate) * IWL_RATE_COUNT_LEGACY),
                        GFP_KERNEL);

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Stanislaw Gruszka
79ae3ed145 iwlwifi: fix nfreed--
During backporting of a120e912eb
("iwlwifi: sanity check before counting number of tfds can be free")
we forget one hunk, what make lot of messages "free more than
tfds_in_queue" show up in dmesg.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Adel Gadllah <adel.gadllah@gmail.com>
(picked from https://patchwork.kernel.org/patch/86722/)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Wey-Yi Guy
e69f8f5303 iwlwifi: counting number of tfds can be free for 4965
commit be6b38bcb1 upstream.

Forget one hunk in 4965 during "iwlwifi: error checking for number of tfds
in queue" patch.

Reported-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Matt Helsley
a0223a1cdb Freezer: Fix buggy resume test for tasks frozen with cgroup freezer
commit 5a7aadfe2f upstream.

When the cgroup freezer is used to freeze tasks we do not want to thaw
those tasks during resume. Currently we test the cgroup freezer
state of the resuming tasks to see if the cgroup is FROZEN.  If so
then we don't thaw the task. However, the FREEZING state also indicates
that the task should remain frozen.

This also avoids a problem pointed out by Oren Ladaan: the freezer state
transition from FREEZING to FROZEN is updated lazily when userspace reads
or writes the freezer.state file in the cgroup filesystem. This means that
resume will thaw tasks in cgroups which should be in the FROZEN state if
there is no read/write of the freezer.state file to trigger this
transition before suspend.

NOTE: Another "simple" solution would be to always update the cgroup
freezer state during resume. However it's a bad choice for several reasons:
Updating the cgroup freezer state is somewhat expensive because it requires
walking all the tasks in the cgroup and checking if they are each frozen.
Worse, this could easily make resume run in N^2 time where N is the number
of tasks in the cgroup. Finally, updating the freezer state from this code
path requires trickier locking because of the way locks must be ordered.

Instead of updating the freezer state we rely on the fact that lazy
updates only manage the transition from FREEZING to FROZEN. We know that
a cgroup with the FREEZING state may actually be FROZEN so test for that
state too. This makes sense in the resume path even for partially-frozen
cgroups -- those that really are FREEZING but not FROZEN.

Reported-by: Oren Ladaan <orenl@cs.columbia.edu>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Mike Christie
d3ce482f41 libiscsi: Fix recovery slowdown regression
commit 4ae0a6c15e upstream.

We could be failing/stopping a connection due to libiscsi starting
recovery/cleanup, but the xmit path or scsi eh thread path
could be dropping the connection at the same time.

As a result the session->state gets set to failed instead of in
recovery. We end up not blocking the session
and so the replacement timeout never gets started and we only end up
failing the IO when scsi_softirq_done sees that the
cmd has been running for (cmd->allowed + 1) * rq->timeout secs.

We used to fail the IO right away so users are seeing a long
delay when using dm-multipath. This problem was added in
2.6.28.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Andrew Stubbs
bbe04b45ff sh: Fix FDPIC binary loader
commit d5ab780305 upstream.

Ensure that the aux table is properly initialized, even when optional
features are missing. Without this, the FDPIC loader did not work.

Signed-off-by: Andrew Stubbs <ams@codesourcery.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:17 -07:00
Matt Fleming
69fd993218 sh: Enable the mmu in start_secondary()
commit 4bea3418c7 upstream.

For the boot, enable_mmu() is called from setup_arch() but we don't call
setup_arch() for any of the other cpus. So turn on the non-boot cpu's
mmu inside of start_secondary().

I noticed this bug on an SMP board when trying to map I/O memory
(smsc911x registers) into the kernel address space. Since the Address
Translation bit in MMUCR wasn't set, accessing the virtual address where
the smsc911x registers were supposedly mapped actually performed a
physical address access.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Christoph Hellwig
94aa7e9130 xfs: fix locking for inode cache radix tree tag updates
commit f1f724e4b5 upstream

The radix-tree code requires it's users to serialize tag updates
against other updates to the tree.  While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption.  Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Dave Chinner
e52af5078f xfs: Non-blocking inode locking in IO completion
commit 77d7a0c2ee upstream

The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.

This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.

The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Christoph Hellwig
76962b5ed7 xfs: remove invalid barrier optimization from xfs_fsync
commit e8b217e753 upstream

Date: Tue, 2 Feb 2010 10:16:26 +1100
We always need to flush the disk write cache and can't skip it just because
the no inode attributes have changed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Dave Chinner
83b546b837 xfs: don't hold onto reserved blocks on remount, ro
commit cbe132a8bd upstream

If we hold onto reserved blocks when doing a remount,ro we end
up writing the blocks used count to disk that includes the reserved
blocks. Reserved blocks are not actually used, so this results in
the values in the superblock being incorrect.

Hence if we run xfs_check or xfs_repair -n while the filesystem is
mounted remount,ro we end up with an inconsistent filesystem being
reported. Also, running xfs_copy on the remount,ro filesystem will
result in an inconsistent image being generated.

To fix this, unreserve the blocks when doing the remount,ro, and
reserved them again on remount,rw. This way a remount,ro filesystem
will appear consistent on disk to all utilities.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Christoph Hellwig
1b8e6dfb49 xfs: quota limit statvfs available blocks
commit 9b00f30762 upstream

A "df" run on an NFS client of an exported XFS file system reports
the wrong information for "available" blocks.  When a block quota is
enforced, the amount reported as free is limited by the quota, but
the amount reported available is not (and should be).

Reported-by: Guk-Bong, Kwon <gbkwon@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Dave Chinner
059f754c44 xfs: xfs_swap_extents needs to handle dynamic fork offsets
commit e09f98606d upstream

When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats.  This is caused by the two indoes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem.  xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.

Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into.  This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.

To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.

A further complication is even when the swap is allowed to proceed
when the fork offset is different between the two inodes then value
for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:16 -07:00
Dave Chinner
bcb56a6645 xfs: fix stale inode flush avoidance
commit 4b6a46882c upstream

When reclaiming stale inodes, we need to guarantee that inodes are
unpinned before returning with a "clean" status. If we don't we can
reclaim inodes that are pinned, leading to use after free in the
transaction subsystem as transactions complete.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Dave Chinner
9e1e9675fb xfs: reclaim all inodes by background tree walks
commit 57817c6822 upstream

We cannot do direct inode reclaim without taking the flush lock to
ensure that we do not reclaim an inode under IO. We check the inode
is clean before doing direct reclaim, but this is not good enough
because the inode flush code marks the inode clean once it has
copied the in-core dirty state to the backing buffer.

It is the flush lock that determines whether the inode is still
under IO, even though it is marked clean, and the inode is still
required at IO completion so we can't reclaim it even though it is
clean in core. Hence the requirement that we need to take the flush
lock even on clean inodes because this guarantees that the inode
writeback IO has completed and it is safe to reclaim the inode.

With delayed write inode flushing, we could end up waiting a long
time on the flush lock even for a clean inode. The background
reclaim already handles this efficiently, so avoid all the problems
by killing the direct reclaim path altogether.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Dave Chinner
22a482c621 xfs: Avoid inodes in reclaim when flushing from inode cache
commit 018027be90 upstream

The reclaim code will handle flushing of dirty inodes before reclaim
occurs, so avoid them when determining whether an inode is a
candidate for flushing to disk when walking the radix trees.  This
is based on a test patch from Christoph Hellwig.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Dave Chinner
96ce91ba51 xfs: reclaim inodes under a write lock
commit c8e20be020 upstream

Make the inode tree reclaim walk exclusive to avoid races with
concurrent sync walkers and lookups. This is a version of a patch
posted by Christoph Hellwig that avoids all the code duplication.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Dave Chinner
74a1282e07 xfs: Ensure we force all busy extents in range to disk
commit fd45e47841 upstream

When we search for and find a busy extent during allocation we
force the log out to ensure the extent free transaction is on
disk before the allocation transaction. The current implementation
has a subtle bug in it--it does not handle multiple overlapping
ranges.

That is, if we free lots of little extents into a single
contiguous extent, then allocate the contiguous extent, the busy
search code stops searching at the first extent it finds that
overlaps the allocated range. It then uses the commit LSN of the
transaction to force the log out to.

Unfortunately, the other busy ranges might have more recent
commit LSNs than the first busy extent that is found, and this
results in xfs_alloc_search_busy() returning before all the
extent free transactions are on disk for the range being
allocated. This can lead to potential metadata corruption or
stale data exposure after a crash because log replay won't replay
all the extent free transactions that cover the allocation range.

Modified-by: Alex Elder <aelder@sgi.com>

(Dropped the "found" argument from the xfs_alloc_busysearch trace
event.)

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Dave Chinner
d4203bda89 xfs: Don't flush stale inodes
commit 44e08c45cc upstream

Because inodes remain in cache much longer than inode buffers do
under memory pressure, we can get the situation where we have
stale, dirty inodes being reclaimed but the backing storage has
been freed.  Hence we should never, ever flush XFS_ISTALE inodes
to disk as there is no guarantee that the backing buffer is in
cache and still marked stale when the flush occurs.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Christoph Hellwig
8ae95bb907 xfs: fix timestamp handling in xfs_setattr
commit d6d59bada3 upstream

We currently have some rather odd code in xfs_setattr for
updating the a/c/mtime timestamps:

 - first we do a non-transaction update if all three are updated
   together
 - second we implicitly update the ctime for various changes
   instead of relying on the ATTR_CTIME flag
 - third we set the timestamps to the current time instead of the
   arguments in the iattr structure in many cases.

This patch makes sure we update it in a consistent way:

 - always transactional
 - ctime is only updated if ATTR_CTIME is set or we do a size
   update, which is a special case
 - always to the times passed in from the caller instead of the
   current time

The only non-size caller of xfs_setattr that doesn't come from
the VFS is updated to set ATTR_CTIME and pass in a valid ctime
value.

Reported-by: Eric Blake <ebb9@byu.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Christoph Hellwig
55f4564203 xfs: check for not fully initialized inodes in xfs_ireclaim
commit b44b112627 upstream

Add an assert for inodes not added to the inode cache in xfs_ireclaim,
to make sure we're not going to introduce something like the
famous nfsd inode cache bug again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:15 -07:00
Jason Gunthorpe
b3e6e64f48 xfs: Fix error return for fallocate() on XFS
commit 44a743f687 upstream

Noticed that through glibc fallocate would return 28 rather than -1
and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format
positive return error codes while the syscalls use negative return
codes.  Fixup the two cases in xfs_vn_fallocate syscall to convert to
negative.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:14 -07:00
Andy Poling
ce49b28e33 xfs: Wrapped journal record corruption on read at recovery
commit fc5bc4c85c upstream

Summary of problem:

If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning.  If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it.  This can happen either
when reading the record header or reading the record data.

The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.

Summary of fix:

If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin.  Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.

Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.

The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").

Signed-off-by: Andy Poling <andy@realbig.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:14 -07:00
Christoph Hellwig
fe5bdcac72 xfs: I/O completion handlers must use NOFS allocations
commit 80641dc66a upstream

When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise.  The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion.  Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:14 -07:00