Page table entries on ARM64 are 64 bits, and some pte functions such as
pte_dirty return a bitwise-and of a flag with the pte value. If the
flag to be tested resides in the upper 32 bits of the pte, then we run
into the danger of the result being dropped if downcast.
For example:
gather_stats(page, md, pte_dirty(*pte), 1);
where pte_dirty(*pte) is downcast to an int.
This patch adds a double logical invert to all the pte_ accessors to
ensure predictable downcasting.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We have the following means for encoding writable or dirty ptes:
PTE_DIRTY PTE_RDONLY
!pte_dirty && !pte_write 0 1
!pte_dirty && pte_write 0 1
pte_dirty && !pte_write 1 1
pte_dirty && pte_write 1 0
So we can't distinguish between writable clean ptes and read only
ptes. This can cause problems with ptes being incorrectly flagged as
read only when they are writable but not dirty.
This patch introduces a new software bit PTE_WRITE which allows us to
correctly identify writable ptes. PTE_RDONLY is now only clear for
valid ptes where a page is both writable and dirty.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:
arch/arm64/include/asm/pgtable.h
Expand out the pte manipulation functions. This makes our life easier
when using things like tags and cscope.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
PTE_PROT_NONE means that a pte is present but does not have any
read/write attributes. However, setting the memory type like
pgprot_writecombine() is allowed and such bits overlap with
PTE_PROT_NONE. This causes mmap/munmap issues in drivers that change the
vma->vm_pg_prot on PROT_NONE mappings.
This patch reverts the PTE_FILE/PTE_PROT_NONE shift in commit
59911ca432 (ARM64: mm: Move PTE_PROT_NONE bit) and moves PTE_PROT_NONE
together with the other software bits.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org> # 3.11+
Bring Transparent HugePage support to ARM. The size of a
transparent huge page depends on the normal page size. A
transparent huge page is always represented as a pmd.
If PAGE_SIZE is 4KB, THPs are 2MB.
If PAGE_SIZE is 64KB, THPs are 512MB.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
The buddy allocator has a default MAX_ORDER of 11, which is too
low to allocate enough memory for 512MB Transparent HugePages if
our base page size is 64KB.
This patch introduces MAX_ZONE_ORDER and sets it to 14 when 64KB
pages are used in conjuction with THP, otherwise the default value
of 11 is used.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Add huge page support to ARM64, different huge page sizes are
supported depending on the size of normal pages:
PAGE_SIZE is 4KB:
2MB - (pmds) these can be allocated at any time.
1024MB - (puds) usually allocated on bootup with the command line
with something like: hugepagesz=1G hugepages=6
PAGE_SIZE is 64KB:
512MB - (pmds) usually allocated on bootup via command line.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Under ARM64, PTEs can be broadly categorised as follows:
- Present and valid: Bit #0 is set. The PTE is valid and memory
access to the region may fault.
- Present and invalid: Bit #0 is clear and bit #1 is set.
Represents present memory with PROT_NONE protection. The PTE
is an invalid entry, and the user fault handler will raise a
SIGSEGV.
- Not present (file or swap): Bits #0 and #1 are clear.
Memory represented has been paged out. The PTE is an invalid
entry, and the fault handler will try and re-populate the
memory where necessary.
Huge PTEs are block descriptors that have bit #1 clear. If we wish
to represent PROT_NONE huge PTEs we then run into a problem as
there is no way to distinguish between regular and huge PTEs if we
set bit #1.
To resolve this ambiguity this patch moves PTE_PROT_NONE from
bit #1 to bit #2 and moves PTE_FILE from bit #2 to bit #3. The
number of swap/file bits is reduced by 1 as a consequence, leaving
60 bits for file and swap entries.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
If we consider the following code sequence:
my_pte = pte_modify(entry, myprot);
x = pte_write(my_pte);
y = pte_exec(my_pte);
If myprot comes from a PROT_NONE page, then x and y will both be
true which is undesireable behaviour.
This patch sets the no-execute and read-only bits for PAGE_NONE
such that the code above will return false for both x and y.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
In paging_init the memblock limit is set to restrict any addresses
returned by early_alloc to fit within the initial direct kernel
mapping in swapper_pg_dir. This allows map_mem to allocate puds,
pmds and ptes from the initial direct kernel mapping.
The limit stays low after paging_init() though, meaning any
bootmem allocations will be from a restricted subset of memory.
Gigabyte huge pages, for instance, are normally allocated from
bootmem as their order (18) is too large for the default buddy
allocator (MAX_ORDER = 11).
This patch restores the memblock limit when map_mem has finished,
allowing gigabyte huge pages (and other objects) to be allocated
from all of bootmem.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
All Transparent Huge Pages are allocated by the buddy allocator.
A compile time check is in place that fails when the order of a
transparent huge page is too large to be allocated by the buddy
allocator. Unfortunately that compile time check passes when:
HPAGE_PMD_ORDER == MAX_ORDER
( which is incorrect as the buddy allocator can only allocate
memory of order strictly less than MAX_ORDER. )
This patch updates the compile time check to fail in the above
case.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
already been copied over to mm.
This patch removes the x86 copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
The huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d
functions in x86/mm/hugetlbpage.c do not rely on any architecture
specific knowledge other than the fact that pmds and puds can be
treated as huge ptes.
To allow other architectures to use this code (and reduce the need
for code duplication), this patch copies these functions into mm,
replaces the use of pud_large with pud_huge and provides a config
flag to activate them:
CONFIG_ARCH_WANT_GENERAL_HUGETLB
If CONFIG_ARCH_WANT_HUGE_PMD_SHARE is also active then the
huge_pmd_share code will be called by huge_pte_alloc (othewise we
call pmd_alloc and skip the sharing code).
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
The huge_pmd_share code has been copied over to mm/hugetlb.c to
make it accessible to other architectures.
Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Under x86, multiple puds can be made to reference the same bank of
huge pmds provided that they represent a full PUD_SIZE of shared
huge memory that is aligned to a PUD_SIZE boundary.
The code to share pmds does not require any architecture specific
knowledge other than the fact that pmds can be indexed, thus can
be beneficial to some other architectures.
This patch copies the huge pmd sharing (and unsharing) logic from
x86/ to mm/ and introduces a new config option to activate it:
CONFIG_ARCH_WANTS_HUGE_PMD_SHARE
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Since newer DT bindings include references to include/dt-bindings we need
to make this available to build DTs using them. Upstream has a number of
reworkings which are much more invasive but featureful, just include a
minimal fix.
Signed-off-by: Mark Brown <broonie@linaro.org>
[Squashed down several commits from dev repository, reused vexpress
config option -- broonie]
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
All platforms other than TC2 default to enabling packing. Since TC2
shows no performance or energy degradation with this feature enabled
make it default enabled the same as everyone else.
Likewise, vendors have been including TC2 support in multi-machine
kernel builds so they expect the default thresholds to remain the
same when the TC2 #ifdef is removed.
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
commit eee3f15d5f upstream.
instead of relying on the otg pointer, which
can be NULL in certain cases, we can use the
gadget and host pointers we already hold inside
struct musb.
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e6358fc3c upstream.
We haven't taken i_mutex yet, so we need to use i_size_read().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec4cb1aa2b upstream.
When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:
WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()
CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
[<ffffffff816311b0>] dump_stack+0x19/0x1b
[<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
[<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
[<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
[<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
[<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
[<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
[<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
[<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
[<ffffffff811935ce>] generic_setxattr+0x6e/0x90
[<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
[<ffffffff8119421e>] setxattr+0x13e/0x1e0
[<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118c65c>] ? fget_light+0x3c/0x130
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
[<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
[<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:
CPU1 CPU2
ext4_xattr_release_block() ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
le32_add_cpu(&BHDR(bh)->h_refcount, -1);
unlock_buffer(bh);
lock_buffer(bh);
/* True */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
get_bh(bh);
ext4_free_blocks()
...
jbd2_journal_forget()
jbd2_journal_unfile_buffer()
-> JH is gone
error = ext4_handle_dirty_xattr_block(handle, inode, bh);
-> triggers the warning
We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ded2cf7141 upstream.
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node. After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value. This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.
old recovery master: new recovery master:
dlm_remaster_locks()
become recovery master for
another dead node.
dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
return -EAGAIN;
}
dlm_set_reco_master(dlm, br->node_idx);
dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
will hang in dlm_remaster_locks() for
request dlm locks info
Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.
A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34aa8dac48 upstream.
This issue was introduced by commit 800deef3f6 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry. The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this. Sunil advised reverting it back, but the old version was
also not right. At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic. So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.
Another concern is that this seemes can not happen because the "tmpq"
list should not be empty. Let me describe how.
old lock resource owner(node 1): migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
dlm_pick_migration_target() {
pick node 2 as target as its lock is the first one
in granted list.
}
dlm_migrate_lockres() {
dlm_mark_lockres_migrating() {
res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
//after the above code, we can not dirty lockres any more,
// so dlm_thread shuffle list will not run
downconvert lock from EX to NR
upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.
// will set lockres RES_MIGRATING flag, the following
// lock/unlock can not run
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data()
for (i=0; i<mres->num_locks; i++)
if (ml->node == dlm->node_num)
for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
list_for_each_entry(lock, tmpq, list)
if (lock) break; <<< lock is invalid as grant list is empty.
}
if (lock->ml.node != ml->node)
BUG() >>> crash here
}
I see the above locks status from a vmcore of our internal bug.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80df284765 upstream.
As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :
schedule_timeout: wrong timeout value ffffffffffffff83
and then the funtion watchdog will call schedule_timeout_interruptible
again and again. The screen will be filled with
"schedule_timeout: wrong timeout value ffffffffffffff83"
This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55f67141a8 upstream.
When I decrease the value of nr_hugepage in procfs a lot, softlockup
happens. It is because there is no chance of context switch during this
process.
On the other hand, when I allocate a large number of hugepages, there is
some chance of context switch. Hence softlockup doesn't happen during
this process. So it's necessary to add the context switch in the
freeing process as same as allocating process to avoid softlockup.
When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
process occupied a CPU over 150 seconds and following softlockup message
appeared twice or more.
$ echo 6000000 > /proc/sys/vm/nr_hugepages
$ cat /proc/sys/vm/nr_hugepages
6000000
$ grep ^Huge /proc/meminfo
HugePages_Total: 6000000
HugePages_Free: 6000000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ echo 0 > /proc/sys/vm/nr_hugepages
BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
Call Trace:
free_pool_huge_page+0xb8/0xd0
set_max_huge_pages+0x128/0x190
hugetlb_sysctl_handler_common+0x113/0x140
hugetlb_sysctl_handler+0x1e/0x20
proc_sys_call_handler+0x97/0xd0
proc_sys_write+0x14/0x20
vfs_write+0xb8/0x1a0
sys_write+0x51/0x90
__audit_syscall_exit+0x265/0x290
system_call_fastpath+0x16/0x1b
I have not confirmed this problem with upstream kernels because I am not
able to prepare the machine equipped with 12TB memory now. However I
confirmed that the amount of decreasing hugepages was directly
proportional to the amount of required time.
I measured required times on a smaller machine. It showed 130-145
hugepages decreased in a millisecond.
Amount of decreasing Required time Decreasing rate
hugepages (msec) (pages/msec)
------------------------------------------------------------
10,000 pages == 20GB 70 - 74 135-142
30,000 pages == 60GB 208 - 229 131-144
It means decrement of 6TB hugepages will trigger softlockup with the
default threshold 20sec, in this decreasing rate.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57e68e9cd6 upstream.
A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
fuzzing with trinity. The call site try_to_unmap_cluster() does not lock
the pages other than its check_page parameter (which is already locked).
The BUG_ON in mlock_vma_page() is not documented and its purpose is
somewhat unclear, but apparently it serializes against page migration,
which could otherwise fail to transfer the PG_mlocked flag. This would
not be fatal, as the page would be eventually encountered again, but
NR_MLOCK accounting would become distorted nevertheless. This patch adds
a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
effect.
The call site try_to_unmap_cluster() is fixed so that for page !=
check_page, trylock_page() is attempted (to avoid possible deadlocks as we
already have check_page locked) and mlock_vma_page() is performed only
upon success. If the page lock cannot be obtained, the page is left
without PG_mlocked, which is again not a problem in the whole unevictable
memory design.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1044b1bb92 upstream.
We need to set the queue bounce limit during the device initialization to
prevent excessive bouncing on 32 bit architectures.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6aec044cc2 upstream.
When a driver doesn't have pre_reset, post_reset, or reset_resume
methods, the USB core unbinds that driver when its device undergoes a
reset or a reset-resume, and then rebinds it afterward.
The existing straightforward implementation can lead to problems,
because each interface gets unbound and rebound before the next
interface is handled. If a driver claims additional interfaces, the
claim may fail because the old binding instance may still own the
additional interface when the new instance tries to claim it.
This patch fixes the problem by first unbinding all the interfaces
that are marked (i.e., their needs_binding flag is set) and then
rebinding all of them.
The patch also makes the helper functions in driver.c a little more
uniform and adjusts some out-of-date comments.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: "Poulain, Loic" <loic.poulain@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f76a1cbed1 upstream.
Commit 3e6c6f630a ("Delay creation of
khcvd thread") moved the call of hvc_init from being a device_initcall
into hvc_alloc, and used a non-null hvc_driver as indication of whether
hvc_init had already been called.
The problem with this is that hvc_driver is only assigned a value
at the bottom of hvc_init, and so there is a window where multiple
hvc_alloc calls can be in progress at the same time and hence try
and call hvc_init multiple times. Previously the use of device_init
guaranteed that hvc_init was only called once.
This manifests itself as sporadic instances of two hvc_init calls
racing each other, and with the loser of the race getting -EBUSY
from tty_register_driver() and hence that virtual console fails:
Couldn't register hvc console driver
virtio-ports vport0p1: error -16 allocating hvc for port
Here we add an atomic_t to guarantee we'll never run hvc_init twice.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 3e6c6f630a ("Delay creation of khcvd thread")
Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
Tested-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06f9b6e596 upstream.
Around DWC USB3 2.30a release another bit has been added to the
Device-Specific Event (DEVT) Event Information (EvtInfo) bitfield.
Because of that, what used to be 8 bits long, has become 9 bits long.
Per dwc3 2.30a+ spec in the Device-Specific Event (DEVT), the field of
Event Information Bits(EvtInfo) uses [24:16] bits, and it has 9 bits
not 8 bits. And the following reserved field uses [31:25] bits not
[31:24] bits, and it has 7 bits.
So in dwc3_event_devt, the bit mask should be:
event_info [24:16] 9 bits
reserved31_25 [31:25] 7 bits
This patch makes sure that newer core releases will work fine with
Linux and that we will decode the event information properly on new
core releases.
[ balbi@ti.com : improve commit log a bit ]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b57b9669a upstream.
Commit 3fdfedaaa "[media] omap3isp: preview: Lower the crop margins"
accidentally changed the previewer's cropping, causing the previewer
to miss four pixels on each line, thus corrupting the final image.
Restored the removed setting.
Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ec40dcfb4 upstream.
Pointer to device state has been moved to different location during
some change. PCTV 290e LNA function still uses old pointer, carried
over FE priv, and it crash.
Reported-by: Janne Kujanpää <jikuja@iki.fi>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>