commit 8a4aeec8d2 upstream.
The AHCI spec allows implementations to issue commands in tag order
rather than FIFO order:
5.3.2.12 P:SelectCmd
HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
or HBA selects the command to issue that has had the
PxCI bit set to '1' longer than any other command
pending to be issued.
The result is that commands posted sequentially (time-wise) may play out
of sequence when issued by hardware.
This behavior has likely been hidden by drives that arrange for commands
to complete in issue order. However, it appears recent drives (two from
different vendors that we have found so far) inflict out-of-order
completions as a matter of course. So, we need to take care to maintain
ordered submission, otherwise we risk triggering a drive to fall out of
sequential-io automation and back to random-io processing, which incurs
large latency and degrades throughput.
This issue was found in simple benchmarks where QD=2 seq-write
performance was 30-50% *greater* than QD=32 seq-write performance.
Tagging for -stable and making the change globally since it has a low
risk-to-reward ratio. Also, word is that recent versions of an unnamed
OS also does it this way now. So, drives in the field are already
experienced with this tag ordering scheme.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
Reviewed-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12cd43c6ed upstream.
Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
functions isn't safe. On my machine it causes delayed (!) CPU exception:
Disabling lock debugging due to kernel taint
mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
mce: [Hardware Error]: TSC 164083803dc
mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43751a1b8e upstream.
This patch fixes the hardware cursor on mach64 when font width is not a
multiple of 8 pixels.
If you load such a font, the cursor is expanded to the next 8-byte
boundary and a part of the next character after the cursor is not
visible.
For example, when you load a font with 12-pixel width, the cursor width
is 16 pixels and when the cursor is displayed, 4 pixels of the next
character are not visible.
The reason is this: atyfb_cursor is called with proper parameters to
load an image that is 12-pixel wide. However, the number is aligned on
the next 8-pixel boundary on the line
"unsigned int width = (cursor->image.width + 7) >> 3;" and the whole
function acts as it is was loading a 16-pixel image.
This patch fixes it so that the value written to the framebuffer is
padded with 0xaaaa (the transparent pattern) when the image size it not
a multiple of 8 pixels. The transparent pattern causes that the cursor
will not interfere with the next character.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c29dd8696d upstream.
This patch fixes mach64 to use unaligned access to the font bitmap.
This fixes unaligned access warning on sparc64 when 14x8 font is loaded.
On x86(64), unaligned access is handled in hardware, so both functions
le32_to_cpup and get_unaligned_le32 perform the same operation.
On RISC machines, unaligned access is not handled in hardware, so we
better use get_unaligned_le32 to avoid the unaligned trap and warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a772d47366 upstream.
When X11 is running and the user switches back to console, the card
modifies the content of registers M_MACCESS and M_PITCH in periodic
intervals.
This patch fixes it by restoring the content of these registers before
issuing any accelerator command.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00a9d699bc upstream.
The function cfb_copyarea is buggy when the copy operation is not aligned on
long boundary (4 bytes on 32-bit machines, 8 bytes on 64-bit machines).
How to reproduce:
- use x86-64 machine
- use a framebuffer driver without acceleration (for example uvesafb)
- set the framebuffer to 8-bit depth
(for example fbset -a 1024x768-60 -depth 8)
- load a font with character width that is not a multiple of 8 pixels
note: the console-tools package cannot load a font that has
width different from 8 pixels. You need to install the packages
"kbd" and "console-terminus" and use the program "setfont" to
set font width (for example: setfont Uni2-Terminus20x10)
- move some text left and right on the bash command line and you get a
screen corruption
To expose more bugs, put this line to the end of uvesafb_init_info:
info->flags |= FBINFO_HWACCEL_COPYAREA | FBINFO_READS_FAST;
- Now framebuffer console will use cfb_copyarea for console scrolling.
You get a screen corruption when console is scrolled.
This patch is a rewrite of cfb_copyarea. It fixes the bugs, with this
patch, console scrolling in 8-bit depth with a font width that is not a
multiple of 8 pixels works fine.
The cfb_copyarea code was very buggy and it looks like it was written
and never tried with non-8-pixel font.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fce16bc35a upstream.
In the exception return path, for both U/K cases, intr are already
disabled (for various existing reasons). So when we drop down to
@restore_regs, we need not redo that.
There was subtle issue - when intr were NOT being disabled for
ret-to-kernel-but-no-preemption case - now fixed by moving the
IRQ_DISABLE further up in @resume_kernel_mode.
So what do we gain:
* Shaves off a few insn in return path.
* Eliminates the need for IRQ_DISABLE_SAVE assembler macro for ARCv2
hence allows for entry code sharing.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e0de81759 upstream.
The A register needs to be initialized to zero in the prolog if the
first instruction of the BPF program is BPF_S_LDX_B_MSH to prevent
leaking the content of %r5 to user space.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06cd7a874e upstream.
Using a notification type mask for the store event information chsc
is unsupported on some firmware levels. Retry SEI with that mask set
to zero (which is the old way of requesting only channel subsystem
related events).
Reported-and-tested-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6b8fd028b upstream.
We can't take an IRQ when we're about to do a trechkpt as our GPR state is set
to user GPR values.
We've hit this when running some IBM Java stress tests in the lab resulting in
the following dump:
cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40]
pc: c000000000050074: restore_gprs+0xc0/0x148
lr: 00000000b52a8184
sp: ac57d360
msr: 8000000100201030
current = 0xc00000002c500000
paca = 0xc000000007dbfc00 softe: 0 irq_happened: 0x00
pid = 34535, comm = Pooled Thread #
R00 = 00000000b52a8184 R16 = 00000000b3e48fda
R01 = 00000000ac57d360 R17 = 00000000ade79bd8
R02 = 00000000ac586930 R18 = 000000000fac9bcc
R03 = 00000000ade60000 R19 = 00000000ac57f930
R04 = 00000000f6624918 R20 = 00000000ade79be8
R05 = 00000000f663f238 R21 = 00000000ac218a54
R06 = 0000000000000002 R22 = 000000000f956280
R07 = 0000000000000008 R23 = 000000000000007e
R08 = 000000000000000a R24 = 000000000000000c
R09 = 00000000b6e69160 R25 = 00000000b424cf00
R10 = 0000000000000181 R26 = 00000000f66256d4
R11 = 000000000f365ec0 R27 = 00000000b6fdcdd0
R12 = 00000000f66400f0 R28 = 0000000000000001
R13 = 00000000ada71900 R29 = 00000000ade5a300
R14 = 00000000ac2185a8 R30 = 00000000f663f238
R15 = 0000000000000004 R31 = 00000000f6624918
pc = c000000000050074 restore_gprs+0xc0/0x148
cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4
lr = 00000000b52a8184
msr = 8000000100201030 cr = 24804888
ctr = 0000000000000000 xer = 0000000000000000 trap = 700
This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into
that function. It then adds IRQ disabling over the trechkpt critical section.
It also sets the TEXASR FS in the signals code to ensure this is never set now
that we explictly write the TM sprs in tm_recheckpoint.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 422b9b9684 upstream.
I noticed this when testing setarch. No, we don't magically
support a big endian userspace on a little endian kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af61e27c3f upstream.
On suspend, _scsih_suspend calls mpt2sas_base_free_resources, which
in turn calls pci_disable_device if the device is enabled prior to
suspending. However, _scsih_suspend also calls pci_disable_device
itself.
Thus, in the event that the device is enabled prior to suspending,
pci_disable_device will be called twice. This patch removes the
duplicate call to pci_disable_device in _scsi_suspend as it is both
unnecessary and results in a kernel oops.
Signed-off-by: Tyler Stachecki <tstache1@binghamton.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f74ef0f2d upstream.
When adding or removing 100G from a balloon:
BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367]
We have a wait_event_interruptible(), but the condition is always true
(more ballooning to do) so we don't ever sleep. We also have a
wait_event() for the host to ack, but that is also always true as QEMU
is synchronous for balloon operations.
Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c14af233fb upstream.
The original MIPS hibernate code flushes cache and TLB entries in
swsusp_arch_resume(). But they are removed in Commit 44eeab6741
(MIPS: Hibernation: Remove SMP TLB and cacheflushing code.). A cross-
CPU flush is surely unnecessary because all but the local CPU have
already been disabled. But a local flush (at least the TLB flush) is
needed. When we do hibernation on Loongson-3 with an E1000E NIC, it is
very easy to produce a kernel panic (kernel page fault, or unaligned
access). The root cause is E1000E driver use vzalloc_node() to allocate
pages, the stale TLB entries of the booting kernel will be misused by
the resumed target kernel.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/6643/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1550567936 upstream.
Previously a reserved instruction exception while in guest code would
cause a KVM internal error if kvm_mips_handle_ri() didn't recognise the
instruction (including a RDHWR from an unrecognised hardware register).
However the guest OS should really have the opportunity to catch the
exception so that it can take the appropriate actions such as sending a
SIGILL to the guest user process or emulating the instruction itself.
Therefore in these cases emulate a guest RI exception and only return
EMULATE_FAIL if that fails, being careful to revert the PC first in case
the exception occurred in a branch delay slot in which case the PC will
already point to the branch target.
Also turn the printk messages relating to these cases into kvm_debug
messages so that they aren't usually visible.
This allows crashme to run in the guest without killing the entire VM.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5678de3f15 upstream.
QE reported that they got the BUG_ON in ioapic_service to trigger.
I cannot reproduce it, but there are two reasons why this could happen.
The less likely but also easiest one, is when kvm_irq_delivery_to_apic
does not deliver to any APIC and returns -1.
Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
function is never reached. However, you can target the similar loop in
kvm_irq_delivery_to_apic_fast; just program a zero logical destination
address into the IOAPIC, or an out-of-range physical destination address.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03e7848a64 upstream.
This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
status require an extra target_put_sess_cmd() in isert_put_cmd() code
when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
context during session shutdown.
The extra kref PUT is required so that transport_generic_free_cmd()
invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
descriptor with WRITE_PENDING status, and awake the completion in
target_wait_for_sess_cmds() to invoke TFO->release_cmd().
The bug was manifesting itself in target_wait_for_sess_cmds() where
a se_cmd descriptor with WRITE_PENDING status would end up sleeping
indefinately.
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2145e15e05 upstream.
Do not leak kernel-only floppy_raw_cmd structure members to userspace.
This includes the linked-list pointer and the pointer to the allocated
DMA space.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef87dbe761 upstream.
Always clear out these floppy_raw_cmd struct members after copying the
entire structure from userspace so that the in-kernel version is always
valid and never left in an interdeterminate state.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4291086b1f upstream.
The tty atomic_write_lock does not provide an exclusion guarantee for
the tty driver if the termios settings are LECHO & !OPOST. And since
it is unexpected and not allowed to call TTY buffer helpers like
tty_insert_flip_string concurrently, this may lead to crashes when
concurrect writers call pty_write. In that case the following two
writers:
* the ECHOing from a workqueue and
* pty_write from the process
race and can overflow the corresponding TTY buffer like follows.
If we look into tty_insert_flip_string_fixed_flag, there is:
int space = __tty_buffer_request_room(port, goal, flags);
struct tty_buffer *tb = port->buf.tail;
...
memcpy(char_buf_ptr(tb, tb->used), chars, space);
...
tb->used += space;
so the race of the two can result in something like this:
A B
__tty_buffer_request_room
__tty_buffer_request_room
memcpy(buf(tb->used), ...)
tb->used += space;
memcpy(buf(tb->used), ...) ->BOOM
B's memcpy is past the tty_buffer due to the previous A's tb->used
increment.
Since the N_TTY line discipline input processing can output
concurrently with a tty write, obtain the N_TTY ldisc output_lock to
serialize echo output with normal tty writes. This ensures the tty
buffer helper tty_insert_flip_string is not called concurrently and
everything is fine.
Note that this is nicely reproducible by an ordinary user using
forkpty and some setup around that (raw termios + ECHO). And it is
present in kernels at least after commit
d945cb9cce (pty: Rework the pty layer to
use the normal buffering logic) in 2.6.31-rc3.
js: add more info to the commit log
js: switch to bool
js: lock unconditionally
js: lock only the tty->ops->write call
References: CVE-2014-0196
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b790f210fe upstream.
The sleep function was updated to put the serial port to sleep only when necessary.
This appears to resolve the errant behavior of the driver as described in
Kernel Bug 61961 – "My Exar Corp. XR17C/D152 Dual PCI UART modem does not
work with 3.8.0".
Signed-off-by: Michael Welling <mwelling@ieee.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 501fed45b7 upstream.
When 'console=hvc0' is specified to the kernel parameter in x86 KVM guest,
hvc console is setup within a kthread. However, that will cause SEGV
and the boot will fail when the driver is builtin to the kernel,
because currently hvc_console_setup() is annotated with '__init'. This
patch removes '__init' to boot the guest successfully with 'console=hvc0'.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eee3f15d5f upstream.
instead of relying on the otg pointer, which
can be NULL in certain cases, we can use the
gadget and host pointers we already hold inside
struct musb.
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e6358fc3c upstream.
We haven't taken i_mutex yet, so we need to use i_size_read().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec4cb1aa2b upstream.
When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:
WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()
CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
[<ffffffff816311b0>] dump_stack+0x19/0x1b
[<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
[<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
[<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
[<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
[<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
[<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
[<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
[<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
[<ffffffff811935ce>] generic_setxattr+0x6e/0x90
[<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
[<ffffffff8119421e>] setxattr+0x13e/0x1e0
[<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118c65c>] ? fget_light+0x3c/0x130
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
[<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
[<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:
CPU1 CPU2
ext4_xattr_release_block() ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
le32_add_cpu(&BHDR(bh)->h_refcount, -1);
unlock_buffer(bh);
lock_buffer(bh);
/* True */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
get_bh(bh);
ext4_free_blocks()
...
jbd2_journal_forget()
jbd2_journal_unfile_buffer()
-> JH is gone
error = ext4_handle_dirty_xattr_block(handle, inode, bh);
-> triggers the warning
We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ded2cf7141 upstream.
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node. After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value. This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.
old recovery master: new recovery master:
dlm_remaster_locks()
become recovery master for
another dead node.
dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
return -EAGAIN;
}
dlm_set_reco_master(dlm, br->node_idx);
dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
will hang in dlm_remaster_locks() for
request dlm locks info
Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.
A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34aa8dac48 upstream.
This issue was introduced by commit 800deef3f6 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry. The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this. Sunil advised reverting it back, but the old version was
also not right. At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic. So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.
Another concern is that this seemes can not happen because the "tmpq"
list should not be empty. Let me describe how.
old lock resource owner(node 1): migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
dlm_pick_migration_target() {
pick node 2 as target as its lock is the first one
in granted list.
}
dlm_migrate_lockres() {
dlm_mark_lockres_migrating() {
res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
//after the above code, we can not dirty lockres any more,
// so dlm_thread shuffle list will not run
downconvert lock from EX to NR
upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.
// will set lockres RES_MIGRATING flag, the following
// lock/unlock can not run
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data()
for (i=0; i<mres->num_locks; i++)
if (ml->node == dlm->node_num)
for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
list_for_each_entry(lock, tmpq, list)
if (lock) break; <<< lock is invalid as grant list is empty.
}
if (lock->ml.node != ml->node)
BUG() >>> crash here
}
I see the above locks status from a vmcore of our internal bug.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80df284765 upstream.
As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :
schedule_timeout: wrong timeout value ffffffffffffff83
and then the funtion watchdog will call schedule_timeout_interruptible
again and again. The screen will be filled with
"schedule_timeout: wrong timeout value ffffffffffffff83"
This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55f67141a8 upstream.
When I decrease the value of nr_hugepage in procfs a lot, softlockup
happens. It is because there is no chance of context switch during this
process.
On the other hand, when I allocate a large number of hugepages, there is
some chance of context switch. Hence softlockup doesn't happen during
this process. So it's necessary to add the context switch in the
freeing process as same as allocating process to avoid softlockup.
When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
process occupied a CPU over 150 seconds and following softlockup message
appeared twice or more.
$ echo 6000000 > /proc/sys/vm/nr_hugepages
$ cat /proc/sys/vm/nr_hugepages
6000000
$ grep ^Huge /proc/meminfo
HugePages_Total: 6000000
HugePages_Free: 6000000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ echo 0 > /proc/sys/vm/nr_hugepages
BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
Call Trace:
free_pool_huge_page+0xb8/0xd0
set_max_huge_pages+0x128/0x190
hugetlb_sysctl_handler_common+0x113/0x140
hugetlb_sysctl_handler+0x1e/0x20
proc_sys_call_handler+0x97/0xd0
proc_sys_write+0x14/0x20
vfs_write+0xb8/0x1a0
sys_write+0x51/0x90
__audit_syscall_exit+0x265/0x290
system_call_fastpath+0x16/0x1b
I have not confirmed this problem with upstream kernels because I am not
able to prepare the machine equipped with 12TB memory now. However I
confirmed that the amount of decreasing hugepages was directly
proportional to the amount of required time.
I measured required times on a smaller machine. It showed 130-145
hugepages decreased in a millisecond.
Amount of decreasing Required time Decreasing rate
hugepages (msec) (pages/msec)
------------------------------------------------------------
10,000 pages == 20GB 70 - 74 135-142
30,000 pages == 60GB 208 - 229 131-144
It means decrement of 6TB hugepages will trigger softlockup with the
default threshold 20sec, in this decreasing rate.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57e68e9cd6 upstream.
A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
fuzzing with trinity. The call site try_to_unmap_cluster() does not lock
the pages other than its check_page parameter (which is already locked).
The BUG_ON in mlock_vma_page() is not documented and its purpose is
somewhat unclear, but apparently it serializes against page migration,
which could otherwise fail to transfer the PG_mlocked flag. This would
not be fatal, as the page would be eventually encountered again, but
NR_MLOCK accounting would become distorted nevertheless. This patch adds
a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
effect.
The call site try_to_unmap_cluster() is fixed so that for page !=
check_page, trylock_page() is attempted (to avoid possible deadlocks as we
already have check_page locked) and mlock_vma_page() is performed only
upon success. If the page lock cannot be obtained, the page is left
without PG_mlocked, which is again not a problem in the whole unevictable
memory design.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1044b1bb92 upstream.
We need to set the queue bounce limit during the device initialization to
prevent excessive bouncing on 32 bit architectures.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6aec044cc2 upstream.
When a driver doesn't have pre_reset, post_reset, or reset_resume
methods, the USB core unbinds that driver when its device undergoes a
reset or a reset-resume, and then rebinds it afterward.
The existing straightforward implementation can lead to problems,
because each interface gets unbound and rebound before the next
interface is handled. If a driver claims additional interfaces, the
claim may fail because the old binding instance may still own the
additional interface when the new instance tries to claim it.
This patch fixes the problem by first unbinding all the interfaces
that are marked (i.e., their needs_binding flag is set) and then
rebinding all of them.
The patch also makes the helper functions in driver.c a little more
uniform and adjusts some out-of-date comments.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: "Poulain, Loic" <loic.poulain@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f76a1cbed1 upstream.
Commit 3e6c6f630a ("Delay creation of
khcvd thread") moved the call of hvc_init from being a device_initcall
into hvc_alloc, and used a non-null hvc_driver as indication of whether
hvc_init had already been called.
The problem with this is that hvc_driver is only assigned a value
at the bottom of hvc_init, and so there is a window where multiple
hvc_alloc calls can be in progress at the same time and hence try
and call hvc_init multiple times. Previously the use of device_init
guaranteed that hvc_init was only called once.
This manifests itself as sporadic instances of two hvc_init calls
racing each other, and with the loser of the race getting -EBUSY
from tty_register_driver() and hence that virtual console fails:
Couldn't register hvc console driver
virtio-ports vport0p1: error -16 allocating hvc for port
Here we add an atomic_t to guarantee we'll never run hvc_init twice.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 3e6c6f630a ("Delay creation of khcvd thread")
Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
Tested-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06f9b6e596 upstream.
Around DWC USB3 2.30a release another bit has been added to the
Device-Specific Event (DEVT) Event Information (EvtInfo) bitfield.
Because of that, what used to be 8 bits long, has become 9 bits long.
Per dwc3 2.30a+ spec in the Device-Specific Event (DEVT), the field of
Event Information Bits(EvtInfo) uses [24:16] bits, and it has 9 bits
not 8 bits. And the following reserved field uses [31:25] bits not
[31:24] bits, and it has 7 bits.
So in dwc3_event_devt, the bit mask should be:
event_info [24:16] 9 bits
reserved31_25 [31:25] 7 bits
This patch makes sure that newer core releases will work fine with
Linux and that we will decode the event information properly on new
core releases.
[ balbi@ti.com : improve commit log a bit ]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b57b9669a upstream.
Commit 3fdfedaaa "[media] omap3isp: preview: Lower the crop margins"
accidentally changed the previewer's cropping, causing the previewer
to miss four pixels on each line, thus corrupting the final image.
Restored the removed setting.
Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ec40dcfb4 upstream.
Pointer to device state has been moved to different location during
some change. PCTV 290e LNA function still uses old pointer, carried
over FE priv, and it crash.
Reported-by: Janne Kujanpää <jikuja@iki.fi>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8272d0a0c0 upstream.
Add m88rs2000_get_tune_settings, min delay of 2000 ms on symbol
rate more than 3000000 and delay of 3000ms less than this.
Adding min delay prevents crashing the frontend on continuous
transponder scans. Other dvb_frontend_tune_settings remain as default.
This makes very little time difference to good channel scans, but slows down
the set frontend where lock can never be achieved i.e. DVB-S2.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1e43f2326 upstream.
The UVC specification uses alternate setting selection to notify devices
of stream start/stop. This breaks when using bulk-based devices, as the
video streaming interface has a single alternate setting in that case,
making video stream start and video stream stop events to appear
identical to the device. Bulk-based devices are thus not well supported
by UVC.
The webcam built in the Asus Zenbook UX302LA ignores the set interface
request and will keep the video stream enabled when the driver tries to
stop it. If USB autosuspend is enabled the device will then be suspended
and will crash, requiring a cold reboot.
USB trace capture showed that Windows sends a CLEAR_FEATURE(HALT)
request to the bulk endpoint when stopping the stream instead of
selecting alternate setting 0. The camera then behaves correctly, and
thus seems to require that behaviour.
Replace selection of alternate setting 0 with clearing of the endpoint
halt feature at video stream stop for bulk-based devices. Let's refrain
from blaming Microsoft this time, as it's not clear whether this
Windows-specific but USB-compliant behaviour was specifically developed
to handle bulkd-based UVC devices, or if the camera just took advantage
of it.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>