Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing, this to aid later effort to
minimize the number of update_rq_lock() calls.
WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
assert_clock_updated.isra.63.part.64()
can_migrate_task()
load_balance()
pick_next_task_fair()
__schedule()
schedule()
worker_thread()
kthread()
Change-Id: I17716141789f9fac709495951cd2e079cf49d6d8
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Instead of adding the update_rq_clock() all the way at the bottom of
the callstack, add one at the top, this to aid later effort to
minimize update_rq_lock() calls.
WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
rq->clock_update_flags < RQCF_ACT_SKIP
Call Trace:
dump_stack()
__warn()
warn_slowpath_fmt()
detach_task_cfs_rq()
switched_from_fair()
__sched_setscheduler()
_sched_setscheduler()
sched_set_stop_task()
cpu_stop_create()
__smpboot_create_thread.part.2()
smpboot_register_percpu_thread_cpumask()
cpu_stop_init()
do_one_initcall()
? print_cpu_info()
kernel_init_freeable()
? rest_init()
kernel_init()
ret_from_fork()
Change-Id: Iee08c2ed3303ae8f0c527658f13646b02a412cad
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84b)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.
Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.
Also comments updated to be clearer about what is needed.
Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit bc3b0db024f3d0b323e7d93994655e9e3d9f8d68)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
When we convert capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.
Change over to using cpuinfo.max which is the max freq supported by
that cpu rather than the currently-chosen max. Frequency granted still
honours the max policy.
Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.
freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200
Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit 114c6ceb2e5b0e30442e2ba5f78cfaedf76bf1f9)
[trivial cherry-pick issue]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
The usx2y driver allocates the stream read/write buffers in continuous
pages depending on the stream setup, and this may spew the kernel
warning messages with a stack trace like:
WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
__alloc_pages_slowpath+0x1ef2/0x2d70
Modules linked in:
CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
....
It may confuse user as if it were any serious error, although this is
no fatal error and the driver handles the error case gracefully.
Since the driver has already some sanity check of the given size (128
and 256 pages), it can't pass any crazy value. So it's merely page
fragmentation.
This patch adds __GFP_NOWARN to each caller for suppressing such
kernel warnings. The original issue was spotted by syzkaller.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andres Oportus <andresoportus@google.com>
Bug: 64827499
Url: https://lkml.org/lkml/2017/10/2/234
Change-Id: I32a4b702cd5c9cd4b82328ac6bc87c435146d44d
We should avoid using the space character when passing arguments to
clang, because static code analysis check tool such as sparse may
misinterpret the arguments followed by spaces as build targets hence
cause the build to fail.
Signed-off-by: David Lin <dtwlin@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from linux-kbuild commit bb3f38c3c5)
Bug: 66969589
Change-Id: I055719cbc89e7a12d1f46f0a9c2152738a278d2a
[ghackmann@google.com: tweak to preserve AOSP-specific CLANG_TRIPLE]
Signed-off-by: Greg Hackmann <ghackmann@google.com>
In isolation, the original change will break building efistub for ARM64
with gcc. This wasn't an issue upstream due to the earlier change
60f38de7a8 ("efi/libstub: Unify command line param parsing"). That's
now been backported to AOSP too.
This reverts commit 36733457a3.
Change-Id: I99592a19c77821df897bba966b5486cb835745f9
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Changes in 4.9.59
USB: devio: Revert "USB: devio: Don't corrupt user memory"
USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
USB: serial: metro-usb: add MS7820 device id
usb: cdc_acm: Add quirk for Elatec TWN3
usb: quirks: add quirk for WORLDE MINI MIDI keyboard
usb: hub: Allow reset retry for USB2 devices on connect bounce
ALSA: usb-audio: Add native DSD support for Pro-Ject Pre Box S2 Digital
can: gs_usb: fix busy loop if no more TX context is available
parisc: Fix double-word compare and exchange in LWS code on 32-bit kernels
iio: dummy: events: Add missing break
usb: musb: sunxi: Explicitly release USB PHY on exit
usb: musb: Check for host-mode using is_host_active() on reset interrupt
xhci: Identify USB 3.1 capable hosts by their port protocol capability
can: esd_usb2: Fix can_dlc value for received RTR, frames
drm/nouveau/bsp/g92: disable by default
drm/nouveau/mmu: flush tlbs before deleting page tables
ALSA: seq: Enable 'use' locking in all configurations
ALSA: hda: Remove superfluous '-' added by printk conversion
ALSA: hda: Abort capability probe at invalid register read
i2c: ismt: Separate I2C block read from SMBus block read
i2c: piix4: Fix SMBus port selection for AMD Family 17h chips
brcmfmac: Add check for short event packets
brcmsmac: make some local variables 'static const' to reduce stack size
bus: mbus: fix window size calculation for 4GB windows
clockevents/drivers/cs5535: Improve resilience to spurious interrupts
rtlwifi: rtl8821ae: Fix connection lost problem
x86/microcode/intel: Disable late loading on model 79
KEYS: encrypted: fix dereference of NULL user_key_payload
lib/digsig: fix dereference of NULL user_key_payload
KEYS: don't let add_key() update an uninstantiated key
pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
vmbus: fix missing signaling in hv_signal_on_read()
xfs: don't unconditionally clear the reflink flag on zero-block files
xfs: evict CoW fork extents when performing finsert/fcollapse
fs/xfs: Use %pS printk format for direct addresses
xfs: report zeroed or not correctly in xfs_zero_range()
xfs: update i_size after unwritten conversion in dio completion
xfs: perag initialization should only touch m_ag_max_usable for AG 0
xfs: Capture state of the right inode in xfs_iflush_done
xfs: always swap the cow forks when swapping extents
xfs: handle racy AIO in xfs_reflink_end_cow
xfs: Don't log uninitialised fields in inode structures
xfs: move more RT specific code under CONFIG_XFS_RT
xfs: don't change inode mode if ACL update fails
xfs: reinit btree pointer on attr tree inactivation walk
xfs: handle error if xfs_btree_get_bufs fails
xfs: cancel dirty pages on invalidation
xfs: trim writepage mapping to within eof
fscrypt: fix dereference of NULL user_key_payload
KEYS: Fix race between updating and finding a negative key
FS-Cache: fix dereference of NULL user_key_payload
Linux 4.9.59
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
On systems using ACPI cpufreq, at boot-time only CPU0 max_freq has been
populated when load_scale_factor/capacity recomputation is triggered on
all possible cpus. This leads to the following divide by zero panic when
rq->max_freq for any cpu other than CPU0 is accessed:
[ 4.827597] divide error: 0000 [#1] PREEMPT SMP
[ 4.827604] [<ffffffff862ad88e>] ? notifier_call_chain+0x37/0x57
[ 4.827607] [<ffffffff862adaf3>] ? __blocking_notifier_call_chain+0x46/0x5c
[ 4.827611] [<ffffffff8684b9f4>] ? cpufreq_set_policy+0xa6/0x2dc
[ 4.827614] [<ffffffff869e4d55>] ? cpufreq_init_policy+0xba/0xde
[ 4.827617] [<ffffffff8684bd4c>] ? cpufreq_update_policy+0x122/0x122
[ 4.827620] [<ffffffff8684c51c>] ? cpufreq_online+0x5a7/0x64e
[ 4.827623] [<ffffffff8684c63b>] ? cpufreq_add_dev+0x78/0xed
[ 4.827627] [<ffffffff866ce412>] ? subsys_interface_register+0xd6/0x117
[ 4.827630] [<ffffffff8684b410>] ? cpufreq_register_driver+0x168/0x2b2
[ 4.827632] [<ffffffff8684b410>] ? cpufreq_register_driver+0x168/0x2b2
[ 4.827636] [<ffffffff86f83b2e>] ? cpufreq_gov_dbs_init+0xc/0xc
[ 4.827639] [<ffffffff86f83d43>] ? acpi_cpufreq_init+0x215/0x282
[ 4.827641] [<ffffffff86f83b2e>] ? cpufreq_gov_dbs_init+0xc/0xc
[ 4.827645] [<ffffffff8620046e>] ? do_one_initcall+0x9c/0x12e
[ 4.827649] [<ffffffff86f3d058>] ? kernel_init_freeable+0x190/0x22b
[ 4.827651] [<ffffffff869e59d9>] ? rest_init+0x7c/0x7c
[ 4.827653] [<ffffffff869e59e3>] ? kernel_init+0xa/0xf3
[ 4.827656] [<ffffffff869ed482>] ? ret_from_fork+0x22/0x30
On ACPI based systems, for CPUs with uninitialized rq->max_freq skip the
re-calculation.
Reference discussion on a similar issue here:
https://www.spinics.net/lists/arm-kernel/msg557612.html
Change-Id: I5025e3f6db671e9e72369f85708d3b0482d5dad2
Signed-off-by: Abhilash Kesavan <abhilash.kesavan@intel.com>
Show the high watermark of the index into the alloc->pages
array, to facilitate sizing the buffer on a per-process
basis.
Change-Id: I2b40cd16628e0ee45216c51dc9b3c5b0c862032e
Signed-off-by: Martijn Coenen <maco@android.com>
This flag determines whether the thread should currently
process the work in the thread->todo worklist.
The prime usecase for this is improving the performance
of synchronous transactions: all synchronous transactions
post a BR_TRANSACTION_COMPLETE to the calling thread,
but there's no reason to return that command to userspace
right away - userspace anyway needs to wait for the reply.
Likewise, a synchronous transaction that contains a binder
object can cause a BC_ACQUIRE/BC_INCREFS to be returned to
userspace; since the caller must anyway hold a strong/weak
ref for the duration of the call, postponing these commands
until the reply comes in is not a problem.
Note that this flag is not used to determine whether a
thread can handle process work; a thread should never pick
up process work when thread work is still pending.
Before patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 45959 ns 20288 ns 34351
BM_sendVec_binderize/8 45603 ns 20080 ns 34909
BM_sendVec_binderize/16 45528 ns 20113 ns 34863
BM_sendVec_binderize/32 45551 ns 20122 ns 34881
BM_sendVec_binderize/64 45701 ns 20183 ns 34864
BM_sendVec_binderize/128 45824 ns 20250 ns 34576
BM_sendVec_binderize/256 45695 ns 20171 ns 34759
BM_sendVec_binderize/512 45743 ns 20211 ns 34489
BM_sendVec_binderize/1024 46169 ns 20430 ns 34081
After patch:
------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------
BM_sendVec_binderize/4 42939 ns 17262 ns 40653
BM_sendVec_binderize/8 42823 ns 17243 ns 40671
BM_sendVec_binderize/16 42898 ns 17243 ns 40594
BM_sendVec_binderize/32 42838 ns 17267 ns 40527
BM_sendVec_binderize/64 42854 ns 17249 ns 40379
BM_sendVec_binderize/128 42881 ns 17288 ns 40427
BM_sendVec_binderize/256 42917 ns 17297 ns 40429
BM_sendVec_binderize/512 43184 ns 17395 ns 40411
BM_sendVec_binderize/1024 43119 ns 17357 ns 40432
Signed-off-by: Martijn Coenen <maco@android.com>
Change-Id: Ia70287066d62aba64e98ac44ff1214e37ca75693
commit d124b2c53c upstream.
When the file /proc/fs/fscache/objects (available with
CONFIG_FSCACHE_OBJECT_LIST=y) is opened, we request a user key with
description "fscache:objlist", then access its payload. However, a
revoked key has a NULL payload, and we failed to check for this.
request_key() *does* skip revoked keys, but there is still a window
where the key can be revoked before we access its payload.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
Fixes: 4fbf4291aa ("FS-Cache: Allow the current state of all objects to be dumped")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 363b02dab0 upstream.
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
error into one field such that:
(1) The instantiation state can be modified/read atomically.
(2) The error can be accessed atomically with the state.
(3) The error isn't stored unioned with the payload pointers.
This deals with the problem that the state is spread over three different
objects (two bits and a separate variable) and reading or updating them
atomically isn't practical, given that not only can uninstantiated keys
change into instantiated or rejected keys, but rejected keys can also turn
into instantiated keys - and someone accessing the key might not be using
any locking.
The main side effect of this problem is that what was held in the payload
may change, depending on the state. For instance, you might observe the
key to be in the rejected state. You then read the cached error, but if
the key semaphore wasn't locked, the key might've become instantiated
between the two reads - and you might now have something in hand that isn't
actually an error code.
The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
code if the key is negatively instantiated. The key_is_instantiated()
function is replaced with key_is_positive() to avoid confusion as negative
keys are also 'instantiated'.
Additionally, barriering is included:
(1) Order payload-set before state-set during instantiation.
(2) Order state-read before payload-read when using the key.
Further separate barriering is necessary if RCU is being used to access the
payload content after reading the payload pointers.
Fixes: 146aa8b145 ("KEYS: Merge the type-specific data with the payload data")
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d60b5b7854 upstream.
When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
Fixes: 88bd6ccdcd ("ext4 crypto: add encryption key management facilities")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40214d128e upstream.
The writeback rework in commit fbcc025613 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.
The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.
Consider the following sequence of events:
- A buffered write creates a delalloc extent and post-eof
speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
extent is converted to real blocks (including the post-eof blocks)
and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
because the writeback range end is LLONG_MAX. xfs_writepage_map()
attributes it to the (now invalid) cached mapping and writes the
data to an incorrect location on disk (and where the file offset is
still backed by a delalloc extent).
This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.
To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.
Reported-by: Eryu Guan <eguan@redhat.com>
Diagnosed-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 793d7dbe6d upstream.
Recently we've had warnings arise from the vm handing us pages
without bufferheads attached to them. This should not ever occur
in XFS, but we don't defend against it properly if it does. The only
place where we remove bufferheads from a page is in
xfs_vm_releasepage(), but we can't tell the difference here between
"page is dirty so don't release" and "page is dirty but is being
invalidated so release it".
In some places that are invalidating pages ask for pages to be
released and follow up afterward calling ->releasepage by checking
whether the page was dirty and then aborting the invalidation. This
is a possible vector for releasing buffers from a page but then
leaving it in the mapping, so we really do need to avoid dirty pages
in xfs_vm_releasepage().
To differentiate between invalidated pages and normal pages, we need
to clear the page dirty flag when invalidating the pages. This can
be done through xfs_vm_invalidatepage(), and will result
xfs_vm_releasepage() seeing the page as clean which matches the
bufferhead state on the page after calling block_invalidatepage().
Hence we can re-add the page dirty check in xfs_vm_releasepage to
catch the case where we might be releasing a page that is actually
dirty and so should not have the bufferheads on it removed. This
will remove one possible vector of "dirty page with no bufferheads"
and so help narrow down the search for the root cause of that
problem.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93e8befc17 upstream.
Jason reported that a corrupted filesystem failed to replay
the log with a metadata block out of bounds warning:
XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000
_xfs_buf_find() and xfs_btree_get_bufs() return NULL if
that happens, and then when xfs_alloc_fix_freelist() calls
xfs_trans_binval() on that NULL bp, we oops with:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
We don't handle _xfs_buf_find errors very well, every
caller higher up the stack gets to guess at why it failed.
But we should at least handle it somehow, so return
EFSCORRUPTED here.
Reported-by: Jason L Tibbitts III <tibbs@math.uh.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f35c5e10c6 upstream.
xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
associated blocks. xfs_attr3_node_inactive() recursively descends
from internal blocks to leaf blocks, caching block address values
along the way to revisit parent blocks, locate the next entry and
descend down that branch of the tree.
The code that attempts to reread the parent block is unsafe because
it assumes that the local xfs_da_node_entry pointer remains valid
after an xfs_trans_brelse() and re-read of the parent buffer. Under
heavy memory pressure, it is possible that the buffer has been
reclaimed and reallocated by the time the parent block is reread.
This means that 'btree' can point to an invalid memory address, lead
to a random/garbage value for child_fsb and cause the subsequent
read of the attr fork to go off the rails and return a NULL buffer
for an attr fork offset that is most likely not allocated.
Note that this problem can be manufactured by setting
XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
creating a file with a multi-level attr fork and removing it to
trigger inactivation.
To address this problem, reinit the node/btree pointers to the
parent buffer after it has been re-read. This ensures btree points
to a valid record and allows the walk to proceed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67f2ffe31d upstream.
If we get ENOSPC half way through setting the ACL, the inode mode
can still be changed even though the ACL does not exist. Reorder the
operation to only change the mode of the inode if the ACL is set
correctly.
Whilst this does not fix the problem with crash consistency (that requires
attribute addition to be a deferred op) it does prevent ENOSPC and other
non-fatal errors setting an xattr to be handled sanely.
This fixes xfstests generic/449.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb9c2e5433 upstream.
Various utility functions and interfaces that iterate internal
devices try to reference the realtime device even when RT support is
not compiled into the kernel.
Make sure this code is excluded from the CONFIG_XFS_RT=n build,
and where appropriate stub functions to return fatal errors if
they ever get called when RT support is not present.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20413e37d7 upstream.
Prevent kmemcheck from throwing warnings about reading uninitialised
memory when formatting inodes into the incore log buffer. There are
several issues here - we don't always log all the fields in the
inode log format item, and we never log the inode the
di_next_unlinked field.
In the case of the inode log format item, this is exacerbated
by the old xfs_inode_log_format structure padding issue. Hence make
the padded, 64 bit aligned version of the structure the one we always
use for formatting the log and get rid of the 64 bit variant. This
means we'll always log the 64-bit version and so recovery only needs
to convert from the unpadded 32 bit version from older 32 bit
kernels.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e12199f85d upstream.
If we got two AIO writes into a COW area the second one might not have any
COW extents left to convert. Handle that case gracefully instead of
triggering an assert or accessing beyond the bounds of the extent list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52bfcdd7ad upstream.
Since the CoW fork exists as a secondary data structure to the data
fork, we must always swap cow forks during swapext. We also need to
swap the extent counts and reset the cowblocks tags.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 842f6e9f78 upstream.
My previous patch: d3a304b629 check for
XFS_LI_FAILED flag xfs_iflush done, so the failed item can be properly
resubmitted.
In the loop scanning other inodes being completed, it should check the
current item for the XFS_LI_FAILED, and not the initial one.
The state of the initial inode is checked after the loop ends
Kudos to Eric for catching this.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9789dd9e1d upstream.
We call __xfs_ag_resv_init to make a per-AG reservation for each AG.
This makes the reservation per-AG, not per-filesystem. Therefore, it
is incorrect to adjust m_ag_max_usable for each AG. Adjust it only
when we're reserving AG 0's blocks so that we only do it once per fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee70daaba8 upstream.
Since commit d531d91d69 ("xfs: always use unwritten extents for
direct I/O writes"), we start allocating unwritten extents for all
direct writes to allow appending aio in XFS.
But for dio writes that could extend file size we update the in-core
inode size first, then convert the unwritten extents to real
allocations at dio completion time in xfs_dio_write_end_io(). Thus a
racing direct read could see the new i_size and find the unwritten
extents first and read zeros instead of actual data, if the direct
writer also takes a shared iolock.
Fix it by updating the in-core inode size after the unwritten extent
conversion. To do this, introduce a new boolean argument to
xfs_iomap_write_unwritten() to tell if we want to update in-core
i_size or not.
Suggested-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: backported to the old direct I/O code before Linux 4.10]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e150dcd459 upstream.
Use the %pS instead of the %pF printk format specifier for printing symbols
from direct addresses. This is needed for the ia64, ppc64 and parisc64
architectures.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3af423b034 upstream.
When we perform an finsert/fcollapse operation, cancel all the CoW
extents for the affected file offset range so that they don't end up
pointing to the wrong blocks.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc6f77710a upstream.
If we have speculative cow preallocations hanging around in the cow
fork, don't let a truncate operation clear the reflink flag because if
we do then there's a chance we'll forget to free those extents when we
destroy the incore inode.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Fixes upstream in a much larger set of patches that are not worth backporting
to 4.9 - gregkh]
When the space available before start of reading (cached_write_sz)
is the same as the host required space (pending_sz), we need to
still signal host.
Fixes: 433e19cf33 ("Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()")
Signed-off-by: John Starks <jon.Starks@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68a1fdbbf8 upstream.
The ASN.1 parser does not necessarily set the sinfo field,
this patch prevents a NULL pointer dereference on broken
input.
Fixes: 99db443506 ("PKCS#7: Appropriately restrict authenticated attributes and content type")
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60ff5b2f54 upstream.
Currently, when passed a key that already exists, add_key() will call the
key's ->update() method if such exists. But this is heavily broken in the
case where the key is uninstantiated because it doesn't call
__key_instantiate_and_link(). Consequently, it doesn't do most of the
things that are supposed to happen when the key is instantiated, such as
setting the instantiation state, clearing KEY_FLAG_USER_CONSTRUCT and
awakening tasks waiting on it, and incrementing key->user->nikeys.
It also never takes key_construction_mutex, which means that
->instantiate() can run concurrently with ->update() on the same key. In
the case of the "user" and "logon" key types this causes a memory leak, at
best. Maybe even worse, the ->update() methods of the "encrypted" and
"trusted" key types actually just dereference a NULL pointer when passed an
uninstantiated key.
Change key_create_or_update() to wait interruptibly for the key to finish
construction before continuing.
This patch only affects *uninstantiated* keys. For now we still allow a
negatively instantiated key to be updated (thereby positively
instantiating it), although that's broken too (the next patch fixes it)
and I'm not sure that anyone actually uses that functionality either.
Here is a simple reproducer for the bug using the "encrypted" key type
(requires CONFIG_ENCRYPTED_KEYS=y), though as noted above the bug
pertained to more than just the "encrypted" key type:
#include <stdlib.h>
#include <unistd.h>
#include <keyutils.h>
int main(void)
{
int ringid = keyctl_join_session_keyring(NULL);
if (fork()) {
for (;;) {
const char payload[] = "update user:foo 32";
usleep(rand() % 10000);
add_key("encrypted", "desc", payload, sizeof(payload), ringid);
keyctl_clear(ringid);
}
} else {
for (;;)
request_key("encrypted", "desc", "callout_info", ringid);
}
}
It causes:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: encrypted_update+0xb0/0x170
PGD 7a178067 P4D 7a178067 PUD 77269067 PMD 0
PREEMPT SMP
CPU: 0 PID: 340 Comm: reproduce Tainted: G D 4.14.0-rc1-00025-g428490e38b2e #796
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8a467a39a340 task.stack: ffffb15c40770000
RIP: 0010:encrypted_update+0xb0/0x170
RSP: 0018:ffffb15c40773de8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8a467a275b00 RCX: 0000000000000000
RDX: 0000000000000005 RSI: ffff8a467a275b14 RDI: ffffffffb742f303
RBP: ffffb15c40773e20 R08: 0000000000000000 R09: ffff8a467a275b17
R10: 0000000000000020 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8a4677057180 R15: ffff8a467a275b0f
FS: 00007f5d7fb08700(0000) GS:ffff8a467f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000077262005 CR4: 00000000001606f0
Call Trace:
key_create_or_update+0x2bc/0x460
SyS_add_key+0x10c/0x1d0
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7f5d7f211259
RSP: 002b:00007ffed03904c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000f8
RAX: ffffffffffffffda RBX: 000000003b2a7955 RCX: 00007f5d7f211259
RDX: 00000000004009e4 RSI: 00000000004009ff RDI: 0000000000400a04
RBP: 0000000068db8bad R08: 000000003b2a7955 R09: 0000000000000004
R10: 000000000000001a R11: 0000000000000246 R12: 0000000000400868
R13: 00007ffed03905d0 R14: 0000000000000000 R15: 0000000000000000
Code: 77 28 e8 64 34 1f 00 45 31 c0 31 c9 48 8d 55 c8 48 89 df 48 8d 75 d0 e8 ff f9 ff ff 85 c0 41 89 c4 0f 88 84 00 00 00 4c 8b 7d c8 <49> 8b 75 18 4c 89 ff e8 24 f8 ff ff 85 c0 41 89 c4 78 6d 49 8b
RIP: encrypted_update+0xb0/0x170 RSP: ffffb15c40773de8
CR2: 0000000000000018
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 192cabd6a2 upstream.
digsig_verify() requests a user key, then accesses its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
Fixes: 051dbb918c ("crypto: digital signature verification support")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13923d0865 upstream.
A key of type "encrypted" references a "master key" which is used to
encrypt and decrypt the encrypted key's payload. However, when we
accessed the master key's payload, we failed to handle the case where
the master key has been revoked, which sets the payload pointer to NULL.
Note that request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
This was an issue for master keys of type "user" only. Master keys can
also be of type "trusted", but those cannot be revoked.
Fixes: 7e70cb4978 ("keys: add new key-type encrypted")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Safford <safford@us.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8b8b16352 upstream.
In commit 40b368af4b ("rtlwifi: Fix alignment issues"), the read
of REG_DBI_READ was changed from 16 to 8 bits. For unknown reasonsi
this change results in reduced stability for the wireless connection.
This regression was located using bisection.
Fixes: 40b368af4b ("rtlwifi: Fix alignment issues")
Reported-and-tested-by: James Cameron <quozl@laptop.org>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb39a7c035 upstream.
The interrupt handler mfgpt_tick() is not robust versus spurious interrupts
which happen before the clock event device is registered and fully
initialized.
The reason is that the safe guard against spurious interrupts solely checks
for the clockevents shutdown state, but lacks a check for detached
state. If the interrupt hits while the device is in detached state it
passes the safe guard and dereferences the event handler call back which is
NULL.
Add the missing state check.
Fixes: 8f9327cbb6 ("clockevents/drivers/cs5535: Migrate to new 'set-state' interface")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Kozub <zub@linux.fjfi.cvut.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20171020093103.3317F6004D@linux.fjfi.cvut.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbbd96357 upstream.
At least the Armada XP SoC supports 4GB on a single DRAM window. Because
the size register values contain the actual size - 1, the MSB is set in
that case. For example, the SDRAM window's control register's value is
0xffffffe1 for 4GB (bits 31 to 24 contain the size).
The MBUS driver reads back each window's size from registers and
calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which
overflows for 32 bit values, resulting in other miscalculations further
on (a bad RAM window for the CESA crypto engine calculated by
mvebu_mbus_setup_cpu_target_nooverlap() in my case).
This patch changes the type in 'struct mbus_dram_window' from u32 to
u64, which allows us to keep using the same register calculation code in
most MBUS-using drivers (which calculate ->size - 1 again).
Fixes: fddddb52a6 ("bus: introduce an Marvell EBU MBus driver")
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c503dd38f8 upstream.
With KASAN and a couple of other patches applied, this driver is one
of the few remaining ones that actually use more than 2048 bytes of
kernel stack:
broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy_gainctrl':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
Here, I'm reducing the stack size by marking as many local variables as
'static const' as I can without changing the actual code.
This is the first of three patches to improve the stack usage in this
driver. It would be good to have this backported to stabl kernels
to get all drivers in 'allmodconfig' below the 2048 byte limit so
we can turn on the frame warning again globally, but I realize that
the patch is larger than the normal limit for stable backports.
The other two patches do not need to be backported.
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd2349121b upstream.
The length of the data in the received skb is currently passed into
brcmf_fweh_process_event() as packet_len, but this value is not checked.
event_packet should be followed by DATALEN bytes of additional event
data. Ensure that the received packet actually contains at least
DATALEN bytes of additional data, to avoid copying uninitialized memory
into event->data.
Suggested-by: Mattias Nissler <mnissler@chromium.org>
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fe16195f8 upstream.
AMD Family 17h uses the KERNCZ SMBus controller. While its documentation
is not publicly available, it is documented in the BIOS and Kernel
Developer’s Guide for AMD Family 15h Models 60h-6Fh Processors.
On this SMBus controller, the port select register is at PMx register
0x02, bit 4:3 (PMx00 register bit 20:19).
Without this patch, the 4 SMBus channels on AMD Family 17h chips are
mirrored and report the same chips on all channels.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6ebcedbab upstream.
Commit b6c159a9cb ("i2c: ismt: Don't duplicate the receive length for
block reads") broke I2C block reads. It aimed to fix normal SMBus block
read, but changed the correct behavior of I2C block read in the process.
According to Documentation/i2c/smbus-protocol, one vital difference
between normal SMBus block read and I2C block read is that there is no
byte count prefixed in the data sent on the wire:
SMBus Block Read: i2c_smbus_read_block_data()
S Addr Wr [A] Comm [A]
S Addr Rd [A] [Count] A [Data] A [Data] A ... A [Data] NA P
I2C Block Read: i2c_smbus_read_i2c_block_data()
S Addr Wr [A] Comm [A]
S Addr Rd [A] [Data] A [Data] A ... A [Data] NA P
Therefore the two transaction types need to be processed differently in
the driver by copying of the dma_buffer as done previously for the
I2C_SMBUS_I2C_BLOCK_DATA case.
Fixes: b6c159a9cb ("i2c: ismt: Don't duplicate the receive length for block reads")
Signed-off-by: Pontus Andersson <epontan@gmail.com>
Tested-by: Stephen Douthit <stephend@adiengineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 098a0a62c1 upstream.
The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
it hits an invalid register value read:
BUG: unable to handle kernel paging request at ffffad5dc41f3fff
IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
Call Trace:
snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
.....
This happened on a new Intel machine, and we need to check the value
and abort the loop accordingly.
[Note: the fixes tag below indicates only the commit where this patch
can be applied; the original problem was introduced even before that
commit]
Fixes: 6720b38420 ("ALSA: hda - move bus_parse_capabilities to core")
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>