Commit Graph

34205 Commits

Author SHA1 Message Date
Ben Hutchings
c375e84ef8 powerpc/fsl: Add PCI device ids for new QoirQ chips
commit a3f62bd2b2 upstream by
Kumar Gala <galak@kernel.crashing.org>.  I have adjusted the patch
context for 2.6.32.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:00:40 -08:00
Xiaotian Feng
9396c90300 ACPI: don't cond_resched if irq is disabled
commit c084ca704a upstream.

commit 8bd108d adds preemption point after each opcode parse, then
a sleeping function called from invalid context bug was founded
during suspend/resume stage. this was fixed in commit abe1dfa by
don't cond_resched when irq_disabled. But recent commit 138d156 changes
the behaviour to don't cond_resched when in_atomic. This makes the
sleeping function called from invalid context bug happen again, which
is reported in http://lkml.org/lkml/2009/12/1/371.

This patch also fixes http://bugzilla.kernel.org/show_bug.cgi?id=14483

Reported-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-and-bisected-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28 15:00:26 -08:00
Martin K. Petersen
fbe2992083 block: bdev_stack_limits wrapper
commit 17be8c2450 upstream.

DM does not want to know about partition offsets.  Add a partition-aware
wrapper that DM can use when stacking block devices.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25 10:49:40 -08:00
James Bottomley
8cef765ff1 SCSI: enclosure: fix oops while iterating enclosure_status array
commit cc9b2e9f66 upstream.

Based on patch originally by Jeff Mahoney <jeffm@suse.com>

 enclosure_status is expected to be a NULL terminated array of strings
 but isn't actually NULL terminated. When writing an invalid value to
 /sys/class/enclosure/.../.../status, it goes off the end of the array
 and Oopses.


Fix by making the assumption true and adding NULL at the end.

Reported-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25 10:49:37 -08:00
Benjamin Herrenschmidt
2db740cb36 PCI/cardbus: Add a fixup hook and fix powerpc
commit 2d1c861871 upstream

The cardbus code creates PCI devices without ever going through the
necessary fixup bits and pieces that normal PCI devices go through.

There's in fact a commented out call to pcibios_fixup_bus() in there,
it's commented because ... it doesn't work.

I could make pcibios_fixup_bus() do the right thing on powerpc easily
but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus
for which a weak empty implementation is provided by the PCI core.

This fixes cardbus on powerbooks and probably all other PowerPC
platforms which was broken completely for ever on some platforms and
since 2.6.31 on others such as PowerBooks when we made the DMA ops
mandatory (since those are setup by the fixups).

Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:26 -08:00
Mark Brown
34e7aa0779 mfd: Correct WM835x ISINK ramp time defines
commit 9dffe2a32b upstream.

The constants used to specify ISINK ramp times for WM835x had the
wrong shifts so that the on times applied to the off ramp and vice
versa. The masks for the bitfields are correct.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:19 -08:00
Martin K. Petersen
1bd24fdff4 block: Fix incorrect reporting of partition alignment
commit 81744ee44a upstream

queue_sector_alignment_offset returned the wrong value which caused
partitions to report an incorrect alignment_offset. Since offset
calculation is needed several places it has been split into a separate
helper function.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22 15:18:07 -08:00
Zhenyu Wang
de040919d2 drm: remove address mask param for drm_pci_alloc()
commit e6be8d9d17 upstream.

drm_pci_alloc() has input of address mask for setting pci dma
mask on the device, which should be properly setup by drm driver.
And leave it as a param for drm_pci_alloc() would cause confusion
or mistake would corrupt the correct dma mask setting, as seen on
intel hw which set wrong dma mask for hw status page. So remove
it from drm_pci_alloc() function.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:23 -08:00
Al Viro
1f51eb3a88 untangle the do_mremap() mess
This backports the following upstream commits all as one patch:
	54f5de7099
	ecc1a89937
	1a0ef85f84
	f106af4e90
	097eed1038
	935874141d
	0ec62d2909
	c4caa77815
	2ea1d13f64
	570dcf2c15
	564b3bffc6
	0067bd8a55
	f8b7256096
	8c7b49b3ec
	9206de95b1
	2c6a10161d
	05d72faa6d
	bb52d66940
	e77414e0aa
	aa65607373

Backport done by Greg Kroah-Hartman.  Only minor tweaks were needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18 10:19:11 -08:00
Dmitry Monakhov
bbf245072d quota: decouple fs reserved space from quota reservation
commit fd8fbfc170 upstream.

Currently inode_reservation is managed by fs itself and this
reservation is transfered on dquot_transfer(). This means what
inode_reservation must always be in sync with
dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result
in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be
triggered)
This is not easy because of complex locking order issues
for example http://bugzilla.kernel.org/show_bug.cgi?id=14739

The patch introduce quota reservation field for each fs-inode
(fs specific inode is used in order to prevent bloating generic
vfs inode). This reservation is managed by quota code internally
similar to i_blocks/i_bytes and may not be always in sync with
internal fs reservation.

Also perform some code rearrangement:
- Unify dquot_reserve_space() and dquot_reserve_space()
- Unify dquot_release_reserved_space() and dquot_free_space()
- Also this patch add missing warning update to release_rsv()
  dquot_release_reserved_space() must call flush_warnings() as
  dquot_free_space() does.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:05:03 -08:00
Dmitry Monakhov
f07c88dd6c Add unlocked version of inode_add_bytes() function
commit b462707e7c upstream.

Quota code requires unlocked version of this function. Off course
we can just copy-paste the code, but copy-pasting is always an evil.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:05:01 -08:00
Peter Zijlstra
a09adfeb9e sched: Fix balance vs hotplug race
commit 6ad4c18884 upstream.

Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
sched domain managment) we have cpu_active_mask which is suppose to rule
scheduler migration and load-balancing, except it never (fully) did.

The particular problem being solved here is a crash in try_to_wake_up()
where select_task_rq() ends up selecting an offline cpu because
select_task_rq_fair() trusts the sched_domain tree to reflect the
current state of affairs, similarly select_task_rq_rt() trusts the
root_domain.

However, the sched_domains are updated from CPU_DEAD, which is after the
cpu is taken offline and after stop_machine is done. Therefore it can
race perfectly well with code assuming the domains are right.

Cure this by building the domains from cpu_active_mask on
CPU_DOWN_PREPARE.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:49 -08:00
Patrick McHardy
048a424c28 netfilter: fix crashes in bridge netfilter caused by fragment jumps
commit 8fa9ff6849 upstream.

When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack
and a reassembly queue with the same fragment key already exists from
reassembling a similar packet received on a different device (f.i. with
multicasted fragments), the reassembled packet might continue on a different
codepath than where the head fragment originated. This can cause crashes
in bridge netfilter when a fragment received on a non-bridge device (and
thus with skb->nf_bridge == NULL) continues through the bridge netfilter
code.

Add a new reassembly identifier for packets originating from bridge
netfilter and use it to put those packets in insolated queues.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805

Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:40 -08:00
Patrick McHardy
89cf4f4c85 ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
commit 0b5ccb2ee2 upstream.

Currently the same reassembly queue might be used for packets reassembled
by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
as well as local delivery. This can cause "packet jumps" when the fragment
completing a reassembled packet is queued from a different position in the
stack than the previous ones.

Add a "user" identifier to the reassembly queue key to seperate the queues
of each caller, similar to what we do for IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:39 -08:00
David Howells
0399123f3d NOMMU: Optimise away the {dac_,}mmap_min_addr tests
commit 6e14154676 upstream.

In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
skipped by the compiler.  We do this as the minimum mmap address doesn't make
any sense in NOMMU mode.

mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06 15:04:30 -08:00
Tejun Heo
f0cc8412be vmalloc: conditionalize build of pcpu_get_vm_areas()
No matching upstream commit as it was resolved differently there.


pcpu_get_vm_areas() is used only when dynamic percpu allocator is used
by the architecture.  In 2.6.32, ia64 doesn't use dynamic percpu
allocator and has a macro which makes pcpu_get_vm_areas() buggy via
local/global variable aliasing and triggers compile warning.

The problem is fixed in upstream and ia64 uses dynamic percpu
allocators, so the only left issue is inclusion of unnecessary code
and compile warning on ia64 on 2.6.32.

Don't build pcpu_get_vm_areas() if legacy percpu allocator is in use.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:35 -08:00
Li Peng
e8d9252227 drm/i915: Fix sync to vblank when VGA output is turned off
commit 778c902640 upstream

In current vblank-wait implementation, if we turn off VGA output,
drm_wait_vblank will still wait on the disabled pipe until timeout,
because vblank on the pipe is assumed be enabled. This would cause
slow system response on some system such as moblin.

This patch resolve the issue by adding a drm helper function
drm_vblank_off which explicitly clear vblank_enabled[crtc], wake up
any waiting queue and save last vblank counter before turning off
crtc. It also slightly change drm_vblank_get to ensure that we will
will return immediately if trying to wait on a disabled pipe.

Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: hand-applied for conflicts with overlay changes]
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:26 -08:00
Johannes Berg
7847a351aa tracing: Fix event format export
commit 811cb50baf upstream.

For some reason the export of the event print format to userspace
uses '#fmt' which breaks if the format string is anything but a plain
string, for example if it is built with macros then the macro names
are exported instead of their contents.

Use
	"\"%s\"", fmt
instead of
	"%s", #fmt
to export the string and not the way it is built.

For example, in net/mac80211/driver-trace.h for the trace event drv_start
there is:

        TP_printk(
                LOCAL_PR_FMT, LOCAL_PR_ARG
        )

Which use to produce:

 print fmt: LOCAL_PR_FMT, REC->wiphy_name

Now produces:

 print fmt: "%s", REC->wiphy_name

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
LKML-Reference: <20091113224009.GB23942@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:16 -08:00
Damian Lukowski
0d975c7ebd tcp: Stalling connections: Fix timeout calculation routine
[ Upstream commit 07f29bc5bb ]

This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.

With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.

This bug has been introduced together with retransmits_timed_out() in
2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c8503 (Revert Backoff [v3]:
Calculate TCP's connection close threshold as a time value.).

Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.

Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:05 -08:00
Martin Michlmayr
4a6ab40474 drm/ttm: Fix build failure due to missing struct page
commit c3a73ba13b upstream.

drm/ttm fails to build on MIPS because "struct page" is not known:
| In file included from drivers/gpu/drm/ttm/ttm_memory.c:28:
| include/drm/ttm/ttm_memory.h:154: warning: 'struct page' declared inside parameter list
| include/drm/ttm/ttm_memory.h:154: warning: its scope is only this definition or declaration, which is probably not what you want
| include/drm/ttm/ttm_memory.h:156: warning: 'struct page' declared inside parameter list
| drivers/gpu/drm/ttm/ttm_memory.c:540: error: conflicting types for 'ttm_mem_global_alloc_page'
| include/drm/ttm/ttm_memory.h:154: error: previous declaration of 'ttm_mem_global_alloc_page' was here
| drivers/gpu/drm/ttm/ttm_memory.c:561: error: conflicting types for 'ttm_mem_global_free_page'
| include/drm/ttm/ttm_memory.h:156: error: previous declaration of 'ttm_mem_global_free_page' was here

Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:04:42 -08:00
Alan Stern
3d5536bccf USB: usb-storage: add BAD_SENSE flag
commit a0bb108112 upstream.

This patch (as1311) fixes a problem in usb-storage: Some devices are
pretty broken when it comes to reporting sense data.  The information
they send back indicates that they have more than 18 bytes of sense
data available, but when the system asks for more than 18 they fail or
hang.  The symptom is that probing fails with multiple resets.

The patch adds a new BAD_SENSE flag to indicate that usb-storage
should never ask for more than 18 bytes of sense data.  The flag can
be set in an unusual_devs entry or via the "quirks=" module parameter,
and it is set automatically whenever a REQUEST SENSE command for more
than 18 bytes fails or times out.

An unusual_devs entry is added for the Agfa photo frame, which uses a
Prolific chip having this bug.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Daniel Kukula <daniel.kuku@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:59 -08:00
Carsten Otte
fff7d05d83 KVM: s390: Make psw available on all exits, not just a subset
commit d7b0b5eb30 upstream.

This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this.  A capability check is provided so users can
verify the updated API exists.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:35 -08:00
Feng Tang
630d0a27ad hrtimer: Fix /proc/timer_list regression
commit 8629ea2eab upstream.

commit 507e1231 (timer stats: Optimize by adding quick check to avoid
function calls) introduced a regression in /proc/timer_list.

/proc/timer_list shows now
 #0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
instead of
 #0: <c27d46b0>, tick_sched_timer, S:01, hrtimer_start, swapper/0

Revert the hrtimer quick check for now. The optimization needs more
thought, but this is neither 2.6.32-rc7 nor stable material.

[ tglx: - Removed unrelated changes from the original patch
  	- Prevent unneccesary call to timer_stats_update_stats
	- massaged the changelog ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <alpine.LFD.2.00.0911181933540.24119@localhost.localdomain>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:28 -08:00
Arjan van de Ven
c64d2a3d2c perf_event: Fix invalid type in ioctl definition
commit 4c49b12853 upstream.

u64 is invalid in userspace headers, including ioctl
definitions; use __u64 instead

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20091113214733.7cd76be9@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:03:07 -08:00
Martin Michlmayr
79daedf8b6 SCSI: osd_protocol.h: Add missing #include
commit 0899638688 upstream.

include/scsi/osd_protocol.h uses ALIGN() without an #include
<linux/kernel.h>, leading to:
| include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN'

Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:45 -08:00
James Bottomley
d888b1a2d5 SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects
commit d139b9bd0e upstream.

Some of our virtual SCSI hosts don't have a proper bus parent at the
top, which can be a problem for doing DMA on them

This patch makes the host device cache a pointer to the physical bus
device and provides an extra API for setting it (the normal API picks
it up from the parent).  This patch also modifies the qla2xxx and lpfc
vport logic to use the new DMA host setting API.

Acked-By: James Smart  <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:44 -08:00
Sebastian Andrzej Siewior
98d338a702 signal: Fix alternate signal stack check
commit 2a855dd01b upstream.

All architectures in the kernel increment/decrement the stack pointer
before storing values on the stack.

On architectures which have the stack grow down sas_ss_sp == sp is not
on the alternate signal stack while sas_ss_sp + sas_ss_size == sp is
on the alternate signal stack.

On architectures which have the stack grow up sas_ss_sp == sp is on
the alternate signal stack while sas_ss_sp + sas_ss_size == sp is not
on the alternate signal stack.

The current implementation fails for architectures which have the
stack grow down on the corner case where sas_ss_sp == sp.This was
reported as Debian bug #544905 on AMD64.
Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c

The test case creates the following stack scenario:
   0xn0300	stack top
   0xn0200	alt stack pointer top (when switching to alt stack)
   0xn01ff	alt stack end
   0xn0100	alt stack start == stack pointer

If the signal is sent the stack pointer is pointing to the base
address of the alt stack and the kernel erroneously decides that it
has already switched to the alternate stack because of the current
check for "sp - sas_ss_sp < sas_ss_size"

On parisc (stack grows up) the scenario would be:
   0xn0200	stack pointer
   0xn01ff	alt stack end
   0xn0100	alt stack start = alt stack pointer base
   		    	  	  (when switching to alt stack)
   0xn0000	stack base

This is handled correctly by the current implementation.

[ tglx: Modified for archs which have the stack grow up (parisc) which
  	would fail with the correct implementation for stack grows
  	down. Added a check for sp >= current->sas_ss_sp which is
  	strictly not necessary but makes the code symetric for both
  	variants ]

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14 09:44:42 -08:00
Linus Torvalds
56f3f55cf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Correct WM831X_MAX_ISEL_VALUE
2009-12-02 15:41:49 -08:00
David Howells
f13a48bd79 SLOW_WORK: Move slow_work's proc file to debugfs
Move slow_work's debugging proc file to debugfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 08:20:31 -08:00
Mark Brown
7716977b6a mfd: Correct WM831X_MAX_ISEL_VALUE
There was confusion between the array size and the highest ISEL
value possible.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-01 11:24:19 +01:00
Linus Torvalds
29e553631b Merge branch 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
* 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  mac80211: fix spurious delBA handling
  mac80211: fix two remote exploits
2009-11-30 16:47:16 -08:00
Linus Torvalds
ed9fd93e9a Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] fix crash when disconnecting usb storage
  [SCSI] fix async scan add/remove race resulting in an oops
  [SCSI] sd: Return correct error code for DIF
2009-11-30 15:21:50 -08:00
Linus Torvalds
cd79bf7b1c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  b44: Fix wedge when using netconsole.
  wan: cosa: drop chan->wsem on error path
  ep93xx-eth: check for zero MAC address on probe, not on device open
  NET: smc91x: Fix irq flags
  smsc9420: prevent BUG() if ethtool is called with interface down
  r8169: restore mac addr in rtl8169_remove_one and rtl_shutdown
  ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr OOPS
  e100: Use pci pool to work around GFP_ATOMIC order 5 memory allocation failure
  sctp: on T3_RTX retransmit all the in-flight chunks
  pktgen: Fix netdevice unregister
  macvlan: fix gso_max_size setting
  rfkill: fix miscdev ops
  ath9k: set ps_default as false
  hso: fix soft-lockup
  hso: fix debug routines
  pktgen: Fix device name compares
  stmmac: do not fail when the timer cannot be used.
  stmmac: fixed a compilation error when use the external timer
  netfilter: xt_limit: fix invalid return code in limit_mt_check()
  Au1x00: fix crash when trying register_netdev()
  ...
2009-11-30 14:01:36 -08:00
Linus Torvalds
6e80133f7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
2009-11-30 13:33:48 -08:00
Johannes Berg
827d42c9ac mac80211: fix spurious delBA handling
Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was

  commit d75636ef9c
  Author: Johannes Berg <johannes@sipsolutions.net>
  Date:   Tue Feb 10 21:25:53 2009 +0100

    mac80211: RX aggregation: clean up stop session

and other parts were already present in the original

  commit d92684e660
  Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
  Date:   Mon Jan 28 14:07:22 2008 +0200

      mac80211: A-MPDU Tx add delBA from recipient support

The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.

The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.

For this case, there are two things necessary to fix it:
 1) don't process spurious delBA frames, and be more careful
    about the session state; don't drop the lock

 2) HT drivers need to be prepared to handle a session stop
    even before the session was really started -- this is
    true for all drivers (that support aggregation) but
    iwlwifi which can be fixed easily. The other HT drivers
    (ath9k and ar9170) are behaving properly already.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-30 13:55:51 -05:00
Andrei Pelinescu-Onciul
5fdd4baef6 sctp: on T3_RTX retransmit all the in-flight chunks
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
 b6157d8e03.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:14:02 -08:00
James Bottomley
860dc73608 [SCSI] fix async scan add/remove race resulting in an oops
Async scanning introduced a very wide window where the SCSI device is
up and running but has not yet been added to sysfs.  We delay the
adding until all scans have completed to retain the same ordering as
sync scanning.

This delay in visibility causes an oops if a device is removed before
we make it visible because the SCSI removal routines have an inbuilt
assumption that if a device is in SDEV_RUNNING state, it must be
visible (which is not necessarily true in the async scanning case).

Fix this by introducing an additional is_visible flag which we can use
to condition the tear down so we do the right thing for running but
not yet made visible.

Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-11-26 09:43:39 -06:00
Kevin Wells
4ced24c897 i2c: i2c-pnx: Made buf type unsigned to prevent sign extension
Made buf type unsigned to prevent sign extension

Signed-off-by: Kevin Wells <kevin.wells@nxp.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2009-11-20 00:25:42 +00:00
Alan Cox
308efab5e2 vt: Fix use of "new" in a struct field
As this struct is exposed to user space and the API was added for this
release it's a bit of a pain for the C++ world and we still have time to
fix it. Rename the fields before we end up with that pain in an actual
release.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Olivier Goffart
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-19 13:43:06 -08:00
David Howells
fee096deb4 CacheFiles: Catch an overly long wait for an old active object
Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [<ffffffff81353153>] __wait_on_bit+0x43/0x76
	 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270
	 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e
	 [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:12:05 +00:00
David Howells
a17754fb8c CacheFiles: Don't write a full page if there's only a partial page to cache
cachefiles_write_page() writes a full page to the backing file for the last
page of the netfs file, even if the netfs file's last page is only a partial
page.

This causes the EOF on the backing file to be extended beyond the EOF of the
netfs, and thus the backing file will be truncated by cachefiles_attr_changed()
called from cachefiles_lookup_object().

So we need to limit the write we make to the backing file on that last page
such that it doesn't push the EOF too far.

Also, if a backing file that has a partial page at the end is expanded, we
discard the partial page and refetch it on the basis that we then have a hole
in the file with invalid data, and should the power go out...  A better way to
deal with this could be to record a note that the partial page contains invalid
data until the correct data is written into it.

This isn't a problem for netfs's that discard the whole backing file if the
file size changes (such as NFS).

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:52 +00:00
David Howells
60d543ca72 FS-Cache: Start processing an object's operations on that object's death
Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:45 +00:00
David Howells
201a15428b FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:35 +00:00
David Howells
1bccf513ac FS-Cache: Fix lock misorder in fscache_write_op()
FS-Cache has two structs internally for keeping track of the internal state of
a cached file: the fscache_cookie struct, which represents the netfs's state,
and fscache_object struct, which represents the cache's state.  Each has a
pointer that points to the other (when both are in existence), and each has a
spinlock for pointer maintenance.

Since netfs operations approach these structures from the cookie side, they get
the cookie lock first, then the object lock.  Cache operations, on the other
hand, approach from the object side, and get the object lock first.  It is not
then permitted for a cache operation to get the cookie lock whilst it is
holding the object lock lest deadlock occur; instead, it must do one of two
things:

 (1) increment the cookie usage counter, drop the object lock and then get both
     locks in order, or

 (2) simply hold the object lock as certain parts of the cookie may not be
     altered whilst the object lock is held.

It is also not permitted to follow either pointer without holding the lock at
the end you start with.  To break the pointers between the cookie and the
object, both locks must be held.

fscache_write_op(), however, violates the locking rules: It attempts to get the
cookie lock without (a) checking that the cookie pointer is a valid pointer,
and (b) holding the object lock to protect the cookie pointer whilst it follows
it.  This is so that it can access the pending page store tree without
interference from __fscache_write_page().

This is fixed by splitting the cookie lock, such that the page store tracking
tree is protected by its own lock, and checking that the cookie pointer is
non-NULL before we attempt to follow it whilst holding the object lock.

The new lock is subordinate to both the cookie lock and the object lock, and so
should be taken after those.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:25 +00:00
David Howells
4fbf4291aa FS-Cache: Allow the current state of all objects to be dumped
Allow the current state of all fscache objects to be dumped by doing:

	cat /proc/fs/fscache/objects

By default, all objects and all fields will be shown.  This can be restricted
by adding a suitable key to one of the caller's keyrings (such as the session
keyring):

	keyctl add user fscache:objlist "<restrictions>" @s

The <restrictions> are:

	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)

And paired restrictions:

	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have slow work queued
	s	Show objects that don't have slow work queued

If neither side of a restriction pair is given, then both are implied.  For
example:

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:04 +00:00
David Howells
440f0affe2 FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work items
Annotate slow-work runqueue proc lines for FS-Cache work items.  Objects
include the object ID and the state.  Operations include the object ID, the
operation ID and the operation type and state.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:01 +00:00
David Howells
3bde31a4ac SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed
Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.

Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.

In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.

However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.

So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().

This function will then sleep until either the item's event occurs or another
work item appears on the queue.  If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.

This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind.  As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:57 +00:00
David Howells
31ba99d304 SLOW_WORK: Allow the owner of a work item to determine if it is queued or not
Add a function (slow_work_is_queued()) to permit the owner of a work item to
determine if the item is queued or not.

The work item is counted as being queued if it is actually on the queue, not
just if it is pending.  If it is executing and pending, then it is not on the
queue, but will rather be put back on the queue when execution finishes.

This permits a caller to quickly work out if it may be able to put another,
dependent work item on the queue behind it, or whether it will have to wait
till that is finished.

This can be used by CacheFiles to work out whether the creation a new object
can be immediately deferred when it has to wait for an old object to be
deleted, or whether a wait must take place.  If a wait is necessary, then the
slow-work thread can otherwise get blocked, preventing the deletion from
taking place.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:53 +00:00
David Howells
8fba10a42d SLOW_WORK: Allow the work items to be viewed through a /proc file
Allow the executing and queued work items to be viewed through a /proc file
for debugging purposes.  The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued threads indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:51 +00:00
Jens Axboe
6b8268b17a SLOW_WORK: Add delayed_slow_work support
This adds support for starting slow work with a delay, similar
to the functionality we have for workqueues.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:10:47 +00:00