commit 286f367dad upstream.
Avoid dereferencing a NULL pointer if the number of feature arguments
supplied is fewer than indicated.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 762a80d9fc upstream.
This patch makes dm-snapshot flush disk cache when writing metadata for
merging snapshot.
Without cache flushing the disk may reorder metadata write and other
data writes and there is a possibility of data corruption in case of
power fault.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bb91bc7bac upstream.
For normal kernel pages, CPU cache is synchronized by the dma layer.
However, this is not done for pages allocated with vmalloc. If we do I/O
to/from vmallocated pages, we must synchronize CPU cache explicitly.
Prior to doing I/O on vmallocated page we must call
flush_kernel_vmap_range to flush dirty cache on the virtual address.
After finished read we must call invalidate_kernel_vmap_range to
invalidate cache on the virtual address, so that accesses to the virtual
address return newly read data and not stale data from CPU cache.
This patch fixes metadata corruption on dm-snapshots on PA-RISC and
possibly other architectures with caches indexed by virtual address.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ca9380fd68 upstream.
Convert array index from the loop bound to the loop index.
A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e1,e2,ar;
@@
for(e1 = 0; e1 < e2; e1++) { <...
ar[
- e2
+ e1
]
...> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bea1906620 upstream.
Fix the usage of mod_timer() and make the driver usable. mod_timer() must
be called with an absolute timeout in jiffies. The old implementation
used a relative timeout thus the hardware watchdog was never triggered.
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Wim Van sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 1923703991 upstream.
Depending upon the order of userspace/kernel during the
mount process, this can result in a hang without the
_all version of the completion.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c027a474a6 upstream.
exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which
frees the memory.
However select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
so it continues to kill other tasks even if we have the oom-killed
task freeing its memory.
Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
the tasks which have already passed exit_notify() to ensure a zombie
with TIF_MEMDIE set can't block oom-killer. Alternatively we could
probably clear TIF_MEMDIE after exit_mmap().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 25e75dff51 upstream.
AppArmor is masking the capabilities returned by capget against the
capabilities mask in the profile. This is wrong, in complain mode the
profile has effectively all capabilities, as the profile restrictions are
not being enforced, merely tested against to determine if an access is
known by the profile.
This can result in the wrong behavior of security conscience applications
like sshd which examine their capability set, and change their behavior
accordingly. In this case because of the masked capability set being
returned sshd fails due to DAC checks, even when the profile is in complain
mode.
Kernels affected: 2.6.36 - 3.0.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 04fdc099f9 upstream.
The pointer returned from tracehook_tracer_task() is only valid inside
the rcu_read_lock. However the tracer pointer obtained is being passed
to aa_may_ptrace outside of the rcu_read_lock critical section.
Mover the aa_may_ptrace test into the rcu_read_lock critical section, to
fix this.
Kernels affected: 2.6.36 - 3.0
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d694ad62bf upstream.
If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed. This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.
The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142
Reported-by: Yuriy Yevtukhov <yuriy@ucoz.com>
Reported-by: Harald Laabs <kernel@dasr.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8c2381af0d upstream.
Currently, the hvc_console_print() function drops console output if the
hvc backend's put_chars() returns 0. This patch changes this behavior
to allow a retry through returning -EAGAIN.
This change also affects the hvc_push() function. Both functions are
changed to handle -EAGAIN and to retry the put_chars() operation.
If a hvc backend returns -EAGAIN, the retry handling differs:
- hvc_console_print() spins to write the complete console output.
- hvc_push() behaves the same way as for returning 0.
Now hvc backends can indirectly control the way how console output is
handled through the hvc console layer.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 51d3302142 upstream.
Return -EAGAIN when we get H_BUSY back from the hypervisor. This
makes the hvc console driver retry, avoiding dropped printks.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f2eb3cdf14 upstream.
Kconfig allows enabling console support for the SC26xx driver even when
it's configured as a module resulting in a:
ERROR: "uart_console_device" [drivers/tty/serial/sc26xx.ko] undefined!
modpost error since the driver was merged in
eea63e0e8a [SC26XX: New serial driver for
SC2681 uarts] in 2.6.25. Fixed by only allowing console support to be
enabled if the driver is builtin.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5568181f18 upstream.
Commit 4539c24fe4 "tty/serial: Add
explicit PORT_TEGRA type" introduced separate flags describing the need
for IER bits UUE and RTOIE. Both bits are required for the XSCALE port
type. While that patch updated uart_config[] as required, the auto-probing
code wasn't updated to set the RTOIE flag when an XSCALE port type was
detected. This caused such ports to stop working. This patch rectifies
that.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 108b6a7846 upstream.
Commit 22a668d7c3 ("memcg: fix behavior under memory.limit equals to
memsw.limit") introduced "memsw_is_minimum" flag, which becomes true
when mem_limit == memsw_limit. The flag is checked at the beginning of
reclaim, and "noswap" is set if the flag is true, because using swap is
meaningless in this case.
This works well in most cases, but when we try to shrink mem_limit,
which is the same as memsw_limit now, we might fail to shrink mem_limit
because swap doesn't used.
This patch fixes this behavior by:
- check MEM_CGROUP_RECLAIM_SHRINK at the begining of reclaim
- If it is set, don't set "noswap" flag even if memsw_is_minimum is true.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a203c2aa4c upstream.
At the beginning of wiphy_update_regulatory() a check is performed
whether the request is to be ignored. Then the request is sent to
the driver nevertheless. This happens even if last_request points
to NULL, leading to a crash in the driver:
[<bf01d864>] (lbs_set_11d_domain_info+0x28/0x1e4 [libertas]) from [<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4)
[<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4) from [<c03b4008>] (wiphy_register+0x354/0x420)
[<c03b4008>] (wiphy_register+0x354/0x420) from [<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas])
[<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas]) from [<bf020e64>] (lbs_start_card+0x20/0x88 [libertas])
[<bf020e64>] (lbs_start_card+0x20/0x88 [libertas]) from [<bf02cbd8>] (if_sdio_probe+0x898/0x9c0 [libertas_sdio])
Fix this by returning early. Also remove the out: label as it is
not any longer needed.
Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Cc: linux-wireless@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Mack <daniel@zonque.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e04f5f7e42 upstream.
This patch (as1480) fixes a rather obscure bug in ehci-hcd. The
qh_update() routine needs to know the number and direction of the
endpoint corresponding to its QH argument. The number can be taken
directly from the QH data structure, but the direction isn't stored
there. The direction is taken instead from the first qTD linked to
the QH.
However, it turns out that for interrupt transfers, qh_update() gets
called before the qTDs are linked to the QH. As a result, qh_update()
computes a bogus direction value, which messes up the endpoint toggle
handling. Under the right combination of circumstances this causes
usb_reset_endpoint() not to work correctly, which causes packets to be
dropped and communications to fail.
Now, it's silly for the QH structure not to have direct access to all
the descriptor information for the corresponding endpoint. Ultimately
it may get a pointer to the usb_host_endpoint structure; for now,
adding a copy of the direction flag solves the immediate problem.
This allows the Spyder2 color-calibration system (a low-speed USB
device that sends all its interrupt data packets with the toggle set
to 0 and hance requires constant use of usb_reset_endpoint) to work
when connected through a high-speed hub. Thanks to Graeme Gill for
supplying the hardware that allowed me to track down this bug.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Graeme Gill <graeme@argyllcms.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 81463c1d70 upstream.
MAX4967 USB power supply chip we use on our boards signals over-current when
power is not enabled; once it's enabled, over-current signal returns to normal.
That unfortunately caused the endless stream of "over-current change on port"
messages. The EHCI root hub code reacts on every over-current signal change
with powering off the port -- such change event is generated the moment the
port power is enabled, so once enabled the power is immediately cut off.
I think we should only cut off power when we're seeing the active over-current
signal, so I'm adding such check to that code. I also think that the fact that
we've cut off the port power should be reflected in the result of GetPortStatus
request immediately, hence I'm adding a PORTSCn register readback after write...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f086ced171 upstream.
FCS could be GSM0_SOF, so will break state machine...
[This byte isn't quoted in any way so a SOF here doesn't imply an error
occurred.]
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
[Trivial but best backported once its in 3.1rc I think]
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 293eb1e777 upstream.
If an inode's mode permits opening /proc/PID/io and the resulting file
descriptor is kept across execve() of a setuid or similar binary, the
ptrace_may_access() check tries to prevent using this fd against the
task with escalated privileges.
Unfortunately, there is a race in the check against execve(). If
execve() is processed after the ptrace check, but before the actual io
information gathering, io statistics will be gathered from the
privileged process. At least in theory this might lead to gathering
sensible information (like ssh/ftp password length) that wouldn't be
available otherwise.
Holding task->signal->cred_guard_mutex while gathering the io
information should protect against the race.
The order of locking is similar to the one inside of ptrace_attach():
first goes cred_guard_mutex, then lock_task_sighand().
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0c0308066c upstream.
If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.
The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.
Reported-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ed1e6211a0 upstream.
nfs_mark_return_delegation() is usually called without any locking, and
so it is not safe to dereference delegation->inode. Since the inode is
only used to discover the nfs_client anyway, it makes more sense to
have the callers pass a valid pointer to the nfs_server as a parameter.
Reported-by: Ian Kent <raven@themaw.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ebc63e531c upstream.
After commit 3262c816a3 "[PATCH] knfsd:
split svc_serv into pools", svc_delete_xprt (then svc_delete_socket) no
longer removed its xpt_ready (then sk_ready) field from whatever list it
was on, noting that there was no point since the whole list was about to
be destroyed anyway.
That was mostly true, but forgot that a few svc_xprt_enqueue()'s might
still be hanging around playing with the about-to-be-destroyed list, and
could get themselves into trouble writing to freed memory if we left
this xprt on the list after freeing it.
(This is actually functionally identical to a patch made first by Ben
Greear, but with more comments.)
Cc: gnb@fmeh.org
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f197c27196 upstream.
Stateid's hold a read reference for a read open, a write reference for a
write open, and an additional one of each for each read+write open. The
latter wasn't getting put on a downgrade, so something like:
open RW
open R
downgrade to R
was resulting in a file leak.
Also fix an imbalance in an error path.
Regression from 7d94784293 "nfsd4: fix
downgrade/lock logic".
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 499f3edc23 upstream.
Without this, for example,
open read
open read+write
close
will result in a struct file leak.
Regression from 7d94784293 "nfsd4: fix
downgrade/lock logic".
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0c12eaffdf upstream.
CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it
to break the lease and return EAGAIN leaves the client unable to make
progress in returning the delegation
nfs4_get_vfs_file() now takes struct nfsd4_open for access to the
claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when
claim type is CLAIM_DELEGATE_CUR
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b2987a5e05 upstream.
Fixes a regression caused by b5695d0463
Kernel keyring keys containing eCryptfs authentication tokens should not
be write locked when calling out to ecryptfsd to wrap and unwrap file
encryption keys. The eCryptfs kernel code can not hold the key's write
lock because ecryptfsd needs to request the key after receiving such a
request from the kernel.
Without this fix, all file opens and creates will timeout and fail when
using the eCryptfs PKI infrastructure. This is not an issue when using
passphrase-based mount keys, which is the most widely deployed eCryptfs
configuration.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Acked-by: Roberto Sassu <roberto.sassu@polito.it>
Tested-by: Roberto Sassu <roberto.sassu@polito.it>
Tested-by: Alexis Hafner1 <haf@zurich.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 985ca0e626 upstream.
Make the inode mapping bdi consistent with the superblock bdi so that
dirty pages are flushed properly.
Signed-off-by: Thieu Le <thieule@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ad95c5e9bc upstream.
Block allocation is called from two places: ext3_get_blocks_handle() and
ext3_xattr_block_set(). These two callers are not necessarily synchronized
because xattr code holds only xattr_sem and i_mutex, and
ext3_get_blocks_handle() may hold only truncate_mutex when called from
writepage() path. Block reservation code does not expect two concurrent
allocations to happen to the same inode and thus assertions can be triggered
or reservation structure corruption can occur.
Fix the problem by taking truncate_mutex in xattr code to serialize
allocations.
CC: Sage Weil <sage@newdream.net>
Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 575a1d4bdf upstream.
Upon corrupted inode or disk failures, we may fail after we already
allocate some blocks from the inode or take some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent to the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 7132de744b upstream.
The current implementation of ext4_free_blocks() always calls
dquot_free_block This looks quite sensible in the most cases: blocks
to be freed are associated with inode and were accounted in quota and
i_blocks some time ago.
However, there is a case when blocks to free were not accounted by the
time calling ext4_free_blocks() yet:
1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from
ext4_ext_insert_extent() and calls ext4_free_blocks().
In this scenario, ext4_free_blocks() calls dquot_free_block() who, in
turn, decrements i_blocks for blocks which were not accounted yet (due
to delalloc) After clean umount, e2fsck reports something like:
> Inode 21, i_blocks is 5080, should be 5128. Fix<y>?
because i_blocks was erroneously decremented as explained above.
The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.
Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0d0138ebe2 upstream.
Prevent an arbitrary kernel read. Check the user pointer with access_ok()
before copying data in.
[akpm@linux-foundation.org: s/EIO/EFAULT/]
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ccb6108f5b upstream.
Vito said:
: The system has many usb disks coming and going day to day, with their
: respective bdi's having min_ratio set to 1 when inserted. It works for
: some time until eventually min_ratio can no longer be set, even when the
: active set of bdi's seen in /sys/class/bdi/*/min_ratio doesn't add up to
: anywhere near 100.
:
: This then leads to an unrelated starvation problem caused by write-heavy
: fuse mounts being used atop the usb disks, a problem the min_ratio setting
: at the underlying devices bdi effectively prevents.
Fix this leakage by resetting the bdi min_ratio when unregistering the
BDI.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Vito Caputo <lkml@pengaru.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2efaca927f upstream.
I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.
It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting succcess from gup, go back to the atomic access,
failing again because dirty wasn't fixed etc...
So I think you essentially hang in the kernel.
The scenario is probably rare'ish because affected architecture are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.
On archs who use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young unaccessible in the
PTE.
Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.
The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.
However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.
Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.
Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.
The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().
This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().
In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.
I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Shan Hai <haishan.bai@gmail.com>
Tested-by: Shan Hai <haishan.bai@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <darren.hart@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 703f03c896 upstream.
As stated in drivers/mfd/cs5535-mfd.c, the mfd driver exposes the BARs
which then make the GPIO, MFGPT, ACPI, etc. all visible to the system.
So the dependencies of the MFGPT stuff have changed, and most people
expect Kconfig to bring in the necessary dependencies. Without them, the
module fails to load and most people don't understand why because the
details of the rewrite aren't captured anywhere most people who know to
look.
This dependency needs to be reflected in Kconfig.
Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com>
Acked-by: Alexandros C. Couloumbis <alex@ozo.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 864d296cf9 upstream.
The function pci_enable_ari() may mistakenly set the downstream port
of a v1 PCIe switch in ARI Forwarding mode. This is a PCIe v2 feature,
and with an SR-IOV device on that switch port believing the switch above
is ARI capable it may attempt to use functions 8-255, translating into
invalid (non-zero) device numbers for that bus. This has been seen
to cause Completion Timeouts and general misbehaviour including hangs
and panics.
Acked-by: Don Dutile <ddutile@redhat.com>
Tested-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 81d6743985 upstream.
<linux/kernel.h> is needed for min_t. The old version
happened to work on x86 because <asm/unaligned.h>
indirectly includes <linux/kernel.h>, but it didn't
work on ARM.
<linux/kernel.h> includes <asm/byteorder.h> so it's
not necessary to include it explicitly anymore.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 40ee4dffff upstream.
The "enable" file for the event system can be removed when a module
is unloaded and the event system only has events from that module.
As the event system nr_events count goes to zero, it may be freed
if its ref_count is also set to zero.
Like the "filter" file, the "enable" file may be opened by a task and
referenced later, after a module has been unloaded and the events for
that event system have been removed.
Although the "filter" file referenced the event system structure,
the "enable" file only references a pointer to the event system
name. Since the name is freed when the event system is removed,
it is possible that an access to the "enable" file may reference
a freed pointer.
Update the "enable" file to use the subsystem_open() routine that
the "filter" file uses, to keep a reference to the event system
structure while the "enable" file is opened.
Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e9dbfae53e upstream.
The event system is freed when its nr_events is set to zero. This happens
when a module created an event system and then later the module is
removed. Modules may share systems, so the system is allocated when
it is created and freed when the modules are unloaded and all the
events under the system are removed (nr_events set to zero).
The problem arises when a task opened the "filter" file for the
system. If the module is unloaded and it removed the last event for
that system, the system structure is freed. If the task that opened
the filter file accesses the "filter" file after the system has
been freed, the system will access an invalid pointer.
By adding a ref_count, and using it to keep track of what
is using the event system, we can free it after all users
are finished with the event system.
Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 63f21a56f1 upstream.
The existing code it pretty ugly. How about we clean it up even more
like this?
From: Anton Blanchard <anton@samba.org>
We check for timeout expiry in the outer loop, but we also need to
check it in the inner loop or we can lock up forever waiting for a
CPU to hit real mode.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 050438ed5a upstream.
In kexec jump support, jump back address passed to the kexeced
kernel via function calling ABI, that is, the function call
return address is the jump back entry.
Furthermore, jump back entry == 0 should be used to signal that
the jump back or preserve context is not enabled in the original
kernel.
But in the current implementation the stack position used for
function call return address is not cleared context
preservation is disabled. The patch fixes this bug.
Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>