Applications that read EFI variables may see a return
value of -EINTR if they exceed the rate limit and a
signal delivery is attempted while the process is sleeping.
This is quite surprising to the application, which probably
doesn't have code to handle it.
Change the interruptible sleep to a non-interruptible one.
Reported-by: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200528194905.690-3-tony.luck@intel.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object. Previous
commit "b8eb718348b8" fixed a similar problem.
Fixes: 0bb549052d ("efi: Add esrt support")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Link: https://lore.kernel.org/r/20200528183804.4497-1-wu000273@umn.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
With CONFIG_DEBUG_VIRTUAL=y, we can hit a BUG() if we take a hard
lockup watchdog interrupt when in OPAL mode.
This happens in show_instructions() if the kernel takes the watchdog
NMI IPI, or any other interrupt, with MSR_IR == 0. show_instructions()
updates the variable pc in the loop and the second iteration will
result in BUG().
We hit the BUG_ON due the below check in __va()
#define __va(x)
({
VIRTUAL_BUG_ON((unsigned long)(x) >= PAGE_OFFSET);
(void *)(unsigned long)((phys_addr_t)(x) | PAGE_OFFSET);
})
Fix it by moving the check out of the loop. Also update nip so that
the nip == pc check still matches.
Fixes: 4dd7554a64 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Use IS_ENABLED(), massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200524093822.423487-1-aneesh.kumar@linux.ibm.com
It is possible that the first event in the event log is not actually a
log header at all, but rather a normal event. This leads to the cast in
__calc_tpm2_event_size being an invalid conversion, which means that
the values read are effectively garbage. Depending on the first event's
contents, this leads either to apparently normal behaviour, a crash or
a freeze.
While this behaviour of the firmware is not in accordance with the
TCG Client EFI Specification, this happens on a Dell Precision 5510
with the TPM enabled but hidden from the OS ("TPM On" disabled, state
otherwise untouched). The EFI firmware claims that the TPM is present
and active and that it supports the TCG 2.0 event log format.
Fortunately, this can be worked around by simply checking the header
of the first event and the event log header signature itself.
Commit b4f1874c62 ("tpm: check event log version before reading final
events") addressed a similar issue also found on Dell models.
Fixes: 6b03261902 ("efi: Attempt to get the TCG2 event log in the boot stub")
Signed-off-by: Fabian Vogt <fvogt@suse.de>
Link: https://lore.kernel.org/r/1927248.evlx2EsYKh@linux-e202.suse.de
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1165773
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reinitialize IA32_FEAT_CTL on the BSP during wakeup to handle the case
where firmware doesn't initialize or save/restore across S3. This fixes
a bug where IA32_FEAT_CTL is left uninitialized and results in VMXON
taking a #GP due to VMX not being fully enabled, i.e. breaks KVM.
Use init_ia32_feat_ctl() to "restore" IA32_FEAT_CTL as it already deals
with the case where the MSR is locked, and because APs already redo
init_ia32_feat_ctl() during suspend by virtue of the SMP boot flow being
used to reinitialize APs upon wakeup. Do the call in the early wakeup
flow to avoid dependencies in the syscore_ops chain, e.g. simply adding
a resume hook is not guaranteed to work, as KVM does VMXON in its own
resume hook, kvm_resume(), when KVM has active guests.
Fixes: 21bd3467a5 ("KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR")
Reported-by: Brad Campbell <lists2009@fnarfbargle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Cc: stable@vger.kernel.org # v5.6
Link: https://lkml.kernel.org/r/20200608174134.11157-1-sean.j.christopherson@intel.com
TEXT_OFFSET was recently changed to 0x0, in preparation for its removal
at a later stage, and a warning is emitted into the kernel log when the
bootloader appears to have failed to take the TEXT_OFFSET image header
value into account.
Ironically, this warning itself fails to take TEXT_OFFSET into account,
and compares the kernel image's alignment modulo 2M against a hardcoded
value of 0x0, and so the warning will trigger spuriously when TEXT_OFFSET
randomization is enabled.
Given the intent to get rid of TEXT_OFFSET entirely, let's fix this
oversight by just removing support for TEXT_OFFSET randomization.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200615101939.634391-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Explain the rationale for annotating WARN(), even though, strictly
speaking printk() and friends are very much not safe in many of the
places we put them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The UBSAN instrumentation only inserts external CALLs when things go
'BAD', much like WARN(). So treat them similar to WARN()s for noinstr,
that is: allow them, at the risk of taking the machine down, to get
their message out.
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Adds config variable CC_HAS_WORKING_NOSANITIZE_ADDRESS, which will be
true if we have a compiler that does not fail builds due to
no_sanitize_address functions. This does not yet mean they work as
intended, but for automated build-tests, this is the minimum
requirement.
For example, we require that __always_inline functions used from
no_sanitize_address functions do not generate instrumentation. On GCC <=
7 this fails to build entirely, therefore we make the minimum version
GCC 8.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lkml.kernel.org/r/20200602175859.GC2604@hirez.programming.kicks-ass.net
The 'noinstr' function attribute means no-instrumentation, this should
very much include *SAN. Because lots of that is broken at present,
only include KCSAN for now, as that is limited to clang11, which has
sane function attribute behaviour.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
There are no more user of this function attribute, also, with us now
actively supporting '__no_kcsan inline' it doesn't make sense to have
in any case.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Now that KCSAN relies on -tsan-distinguish-volatile we no longer need
the annotation for constant_test_bit(). Remove it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The dt-bindings for the GSWIP describe that the node should be named
"switch". Use the same name in sysctrl.c so the GSWIP driver can
actually find the "gphy0" and "gphy1" clocks.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit
bbf8e8b0fe ("efi/libstub: Optimize for size instead of speed")
changed the optimization level for the EFI stub to -Os from -O2.
Andrey Ignatov reports that this breaks the build with gcc 4.8.5.
Testing on godbolt.org, the combination of -Os,
-fno-asynchronous-unwind-tables, and ms_abi functions doesn't work,
failing with the error:
sorry, unimplemented: ms_abi attribute requires
-maccumulate-outgoing-args or subtarget optimization implying it
This does appear to work with gcc 4.9 onwards.
Add -maccumulate-outgoing-args explicitly to unbreak the build with
pre-4.9 versions of gcc.
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200605150638.1011637-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
gcc-9 gets confused by the code flow in check_dirty_whitelist:
drivers/gpu/drm/i915/gt/selftest_workarounds.c: In function 'check_dirty_whitelist':
drivers/gpu/drm/i915/gt/selftest_workarounds.c:492:17: error: 'rsvd' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I could not figure out a good way to do this in a way that gcc
understands better, so initialize the variable to zero, as last
resort.
Fixes: aee20aaed8 ("drm/i915: Implement read-only support in whitelist selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527140526.1458215-2-arnd@arndb.de
(cherry picked from commit cc649a9eaf)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Conditional spinlocks make it hard for gcc and for lockdep to
follow the code flow. This one causes a warning with at least
gcc-9 and higher:
In file included from include/linux/irq.h:14,
from drivers/gpu/drm/i915/i915_pmu.c:7:
drivers/gpu/drm/i915/i915_pmu.c: In function 'i915_sample':
include/linux/spinlock.h:289:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]
289 | _raw_spin_unlock_irqrestore(lock, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_pmu.c:288:17: note: 'flags' was declared here
288 | unsigned long flags;
| ^~~~~
Split out the part between the locks into a separate function
for readability and to let the compiler figure out what the
logic actually is.
Fixes: d79e1bd676 ("drm/i915/pmu: Only use exclusive mmio access for gen7")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527140526.1458215-1-arnd@arndb.de
(cherry picked from commit 6ec81b8273)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
ttm_bo_add_move_fence() invokes dma_fence_get(), which returns a
reference of the specified dma_fence object to "fence" with increased
refcnt.
When ttm_bo_add_move_fence() returns, local variable "fence" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
ttm_bo_add_move_fence(). When no_wait_gpu flag is equals to true, the
function forgets to decrease the refcnt increased by dma_fence_get(),
causing a refcnt leak.
Fix this issue by calling dma_fence_put() when no_wait_gpu flag is
equals to true.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/370221/
Signed-off-by: Christian König <christian.koenig@amd.com>
ttm_bo_vm_fault_reserved() invokes dma_fence_get(), which returns a
reference of the specified dma_fence object to "moving" with increased
refcnt.
When ttm_bo_vm_fault_reserved() returns, local variable "moving" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in several exception handling paths
of ttm_bo_vm_fault_reserved(). When those error scenarios occur such as
"err" equals to -EBUSY, the function forgets to decrease the refcnt
increased by dma_fence_get(), causing a refcnt leak.
Fix this issue by calling dma_fence_put() when no_wait_gpu flag is
equals to true.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/370219/
Signed-off-by: Christian König <christian.koenig@amd.com>
Smatch reports that:
drivers/crypto/marvell/octeontx/otx_cptvf_algs.c:132 otx_cpt_aead_callback()
warn: variable dereferenced before check 'cpt_info' (see line 121)
This function is called from process_pending_queue() as:
drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.c
599 /*
600 * Call callback after current pending entry has been
601 * processed, we don't do it if the callback pointer is
602 * invalid.
603 */
604 if (callback)
605 callback(res_code, areq, cpt_info);
It does appear to me that "cpt_info" can be NULL so this could lead to
a NULL dereference.
Fixes: 10b4f09491 ("crypto: marvell - add the Virtual Function driver for CPT")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a crypto template needs to be instantiated, CRYPTO_MSG_ALG_REQUEST
is sent to crypto_chain. cryptomgr_schedule_probe() handles this by
starting a thread to instantiate the template, then waiting for this
thread to complete via crypto_larval::completion.
This can deadlock because instantiating the template may require loading
modules, and this (apparently depending on userspace) may need to wait
for the crc-t10dif module (lib/crc-t10dif.c) to be loaded. But
crc-t10dif's module_init function uses crypto_register_notifier() and
therefore takes crypto_chain.rwsem for write. That can't proceed until
the notifier callback has finished, as it holds this semaphore for read.
Fix this by removing the wait on crypto_larval::completion from within
cryptomgr_schedule_probe(). It's actually unnecessary because
crypto_alg_mod_lookup() calls crypto_larval_wait() itself after sending
CRYPTO_MSG_ALG_REQUEST.
This only actually became a problem in v4.20 due to commit b76377543b
("crc-t10dif: Pick better transform if one becomes available"), but the
unnecessary wait was much older.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207159
Reported-by: Mike Gerow <gerow@google.com>
Fixes: 398710379f ("crypto: algapi - Move larval completion into algboss")
Cc: <stable@vger.kernel.org> # v3.6+
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: Kai Lüke <kai@kinvolk.io>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The function hisi_acc_create_sg_pool may allocate a block of
memory of size PAGE_SIZE * 2^(MAX_ORDER - 1). This value may
exceed 2^31 on ia64, which would overflow the u32.
This patch caps it at 2^31.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: d8ac7b8523 ("crypto: hisilicon - fix large sgl memory...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pm_runtime_get_sync() increments the runtime PM usage counter even
the call returns an error code. Thus a pairing decrement is needed
on the error handling path to keep the counter balanced.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Memory bandwidth is calculated reading the monitoring counter
at two intervals and calculating the delta. It is the software’s
responsibility to read the count often enough to avoid having
the count roll over _twice_ between reads.
The current code hardcodes the bandwidth monitoring counter's width
to 24 bits for AMD. This is due to default base counter width which
is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
to adjust the counter width. But, the AMD hardware supports much
wider bandwidth counter with the default width of 44 bits.
Kernel reads these monitoring counters every 1 second and adjusts the
counter value for overflow. With 24 bits and scale value of 64 for AMD,
it can only measure up to 1GB/s without overflowing. For the rates
above 1GB/s this will fail to measure the bandwidth.
Fix the issue setting the default width to 44 bits by adjusting the
offset.
AMD future products will implement CPUID 0xF.[ECX=1]:EAX.
[ bp: Let the line stick out and drop {}-brackets around a single
statement. ]
Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/159129975546.62538.5656031125604254041.stgit@naples-babu.amd.com
fix error "clock source 41 is not valid, cannot use"
[] New USB device found, idVendor=154e, idProduct=1002, bcdDevice= 1.00
[] New USB device strings: Mfr=1, Product=2, SerialNumber=0
[] Product: DCD-1500RE
[] Manufacturer: D & M Holdings Inc.
[]
[] clock source 41 is not valid, cannot use
[] usbcore: registered new interface driver snd-usb-audio
Signed-off-by: Yick W. Tse <y_w_tse@yahoo.com.hk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1373857985.210365.1592048406997@mail.yahoo.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
DMA_REMAP is an unnecessary requirement for AMD SEV, which requires
DMA_COHERENT_POOL, so avoid selecting it when it is otherwise unnecessary.
The only other requirement for DMA coherent pools is DMA_DIRECT_REMAP, so
ensure that properly selects the config option when needed.
Fixes: 82fef0ad81 ("x86/mm: unencrypted non-blocking DMA allocations use coherent pools")
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: David Rientjes <rientjes@google.com>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The crypto algorithms selected by the ESP and AH kconfig options are
out-of-date with the guidance of RFC 8221, which lists the legacy
algorithms MD5 and DES as "MUST NOT" be implemented, and some more
modern algorithms like AES-GCM and HMAC-SHA256 as "MUST" be implemented.
But the options select the legacy algorithms, not the modern ones.
Therefore, modify these options to select the MUST algorithms --
and *only* the MUST algorithms.
Also improve the help text.
Note that other algorithms may still be explicitly enabled in the
kconfig, and the choice of which to actually use is still controlled by
userspace. This change only modifies the list of algorithms for which
kernel support is guaranteed to be present.
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Suggested-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Corentin Labbe <clabbe@baylibre.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit f23efcbcc5 ("crypto: ctr - no longer needs CRYPTO_SEQIV") made
CRYPTO_CTR stop selecting CRYPTO_SEQIV. This breaks IPsec for most
users since GCM and several other encryption algorithms require "seqiv"
-- and RFC 8221 lists AES-GCM as "MUST" be implemented.
Just make XFRM_ESP select CRYPTO_SEQIV.
Fixes: f23efcbcc5 ("crypto: ctr - no longer needs CRYPTO_SEQIV")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Corentin Labbe <clabbe@baylibre.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit
10e68b02c8 ("Makefile: support compressed debug info")
added support for compressed debug sections.
Support is detected by checking
- does the compiler support -gz=zlib
- does the assembler support --compressed-debug-sections=zlib
- does the linker support --compressed-debug-sections=zlib
However, the gcc driver's support for this option is somewhat
convoluted. The driver's builtin specs are set based on the version of
binutils that it was configured with. It reports an error if the
configure-time linker/assembler (i.e., not necessarily the actual
assembler that will be run) do not support the option, but only if the
assembler (or linker) is actually invoked when -gz=zlib is passed.
The cc-option check in scripts/Kconfig.include does not invoke the
assembler, so the gcc driver reports success even if it does not support
the option being passed to the assembler.
Because the as-option check passes the option directly to the assembler
via -Wa,--compressed-debug-sections=zlib, the gcc driver does not see
this option and will never report an error.
Combined with an installed version of binutils that is more recent than
the one the compiler was built with, it is possible for all three tests
to succeed, yet an actual compilation with -gz=zlib to fail.
Moreover, it is unnecessary to explicitly pass
--compressed-debug-sections=zlib to the assembler via -Wa, since the
driver will do that automatically when it supports -gz=zlib.
Convert the as-option to just -gz=zlib, simplifying it as well as
performing a better test of the gcc driver's capabilities.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
scripts/checkpatch.pl reports following warning for patch
("bcache: check and adjust logical block size for backing devices"),
WARNING: quoted string split across lines
#146: FILE: drivers/md/bcache/super.c:896:
+ pr_info("%s: sb/logical block size (%u) greater than page size "
+ "(%lu) falling back to device logical block size (%u)",
There are two things to fix up,
- The kernel message print should be in a single line.
- pr_info() won't automatically add new line since v5.8, a '\n' should
be added.
This patch just does the above cleanup in bcache_device_init().
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch changes the asynchronous registration kworker to a delayed
kworker. There is probability queue_work() queues the async registration
kworker to the same CPU (even though very little), then the process
which writing sysfs interface to reigster bcache device may won't return
immeidately. queue_delayed_work() in this patch will delay 10 jiffies
before insert the kworker to run queue, which makes sure the registering
process may always returns to user space in time.
Fixes: 9e23ccf8f0 ("bcache: asynchronous devices registration")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It's possible for a block driver to set logical block size to
a value greater than page size incorrectly; e.g. bcache takes
the value from the superblock, set by the user w/ make-bcache.
This causes a BUG/NULL pointer dereference in the path:
__blkdev_get()
-> set_init_blocksize() // set i_blkbits based on ...
-> bdev_logical_block_size()
-> queue_logical_block_size() // ... this value
-> bdev_disk_changed()
...
-> blkdev_readpage()
-> block_read_full_page()
-> create_page_buffers() // size = 1 << i_blkbits
-> create_empty_buffers() // give size/take pointer
-> alloc_page_buffers() // return NULL
.. BUG!
Because alloc_page_buffers() is called with size > PAGE_SIZE,
thus it initializes head = NULL, skips the loop, return head;
then create_empty_buffers() gets (and uses) the NULL pointer.
This has been around longer than commit ad6bf88a6c ("block:
fix an integer overflow in logical block size"); however, it
increased the range of values that can trigger the issue.
Previously only 8k/16k/32k (on x86/4k page size) would do it,
as greater values overflow unsigned short to zero, and queue_
logical_block_size() would then use the default of 512.
Now the range with unsigned int is much larger, and users w/
the 512k value, which happened to be zero'ed previously and
work fine, started to hit this issue -- as the zero is gone,
and queue_logical_block_size() does return 512k (>PAGE_SIZE.)
Fix this by checking the bcache device's logical block size,
and if it's greater than page size, fallback to the backing/
cached device's logical page size.
This doesn't affect cache devices as those are still checked
for block/page size in read_super(); only the backing/cached
devices are not.
Apparently it's a regression from commit 2903381fce ("bcache:
Take data offset from the bdev superblock."), moving the check
into BCACHE_SB_VERSION_CDEV only. Now that we have superblocks
of backing devices out there with this larger value, we cannot
refuse to load them (i.e., have a similar check in _BDEV.)
Ideally perhaps bcache should use all values from the backing
device (physical/logical/io_min block size)? But for now just
fix the problematic case.
Test-case:
# IMG=/root/disk.img
# dd if=/dev/zero of=$IMG bs=1 count=0 seek=1G
# DEV=$(losetup --find --show $IMG)
# make-bcache --bdev $DEV --block 8k
< see dmesg >
Before:
# uname -r
5.7.0-rc7
[ 55.944046] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 55.949742] CPU: 3 PID: 610 Comm: bcache-register Not tainted 5.7.0-rc7 #4
...
[ 55.952281] RIP: 0010:create_empty_buffers+0x1a/0x100
...
[ 55.966434] Call Trace:
[ 55.967021] create_page_buffers+0x48/0x50
[ 55.967834] block_read_full_page+0x49/0x380
[ 55.972181] do_read_cache_page+0x494/0x610
[ 55.974780] read_part_sector+0x2d/0xaa
[ 55.975558] read_lba+0x10e/0x1e0
[ 55.977904] efi_partition+0x120/0x5a6
[ 55.980227] blk_add_partitions+0x161/0x390
[ 55.982177] bdev_disk_changed+0x61/0xd0
[ 55.982961] __blkdev_get+0x350/0x490
[ 55.983715] __device_add_disk+0x318/0x480
[ 55.984539] bch_cached_dev_run+0xc5/0x270
[ 55.986010] register_bcache.cold+0x122/0x179
[ 55.987628] kernfs_fop_write+0xbc/0x1a0
[ 55.988416] vfs_write+0xb1/0x1a0
[ 55.989134] ksys_write+0x5a/0xd0
[ 55.989825] do_syscall_64+0x43/0x140
[ 55.990563] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 55.991519] RIP: 0033:0x7f7d60ba3154
...
After:
# uname -r
5.7.0.bcachelbspgsz
[ 31.672460] bcache: bcache_device_init() bcache0: sb/logical block size (8192) greater than page size (4096) falling back to device logical block size (512)
[ 31.675133] bcache: register_bdev() registered backing device loop0
# grep ^ /sys/block/bcache0/queue/*_block_size
/sys/block/bcache0/queue/logical_block_size:512
/sys/block/bcache0/queue/physical_block_size:8192
Reported-by: Ryan Finnie <ryan@finnie.org>
Reported-by: Sebastian Marsching <sebastian@marsching.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
coccicheck reports:
drivers/md//bcache/btree.c:1538:1-7: preceding lock on line 1417
In btree_gc_coalesce func, if the coalescing process fails, we will goto
to out_nocoalesce tag directly without releasing new_nodes[i]->write_lock.
Then, it will cause a deadlock when trying to acquire new_nodes[i]->
write_lock for freeing new_nodes[i] before return.
btree_gc_coalesce func details as follows:
if alloc new_nodes[i] fails:
goto out_nocoalesce;
// obtain new_nodes[i]->write_lock
mutex_lock(&new_nodes[i]->write_lock)
// main coalescing process
for (i = nodes - 1; i > 0; --i)
[snipped]
if coalescing process fails:
// Here, directly goto out_nocoalesce
// tag will cause a deadlock
goto out_nocoalesce;
[snipped]
// release new_nodes[i]->write_lock
mutex_unlock(&new_nodes[i]->write_lock)
// coalesing succ, return
return;
out_nocoalesce:
btree_node_free(new_nodes[i]) // free new_nodes[i]
// obtain new_nodes[i]->write_lock
mutex_lock(&new_nodes[i]->write_lock);
// set flag for reuse
clear_bit(BTREE_NODE_dirty, &ew_nodes[i]->flags);
// release new_nodes[i]->write_lock
mutex_unlock(&new_nodes[i]->write_lock);
To fix the problem, we add a new tag 'out_unlock_nocoalesce' for
releasing new_nodes[i]->write_lock before out_nocoalesce tag. If
coalescing process fails, we will go to out_unlock_nocoalesce tag
for releasing new_nodes[i]->write_lock before free new_nodes[i] in
out_nocoalesce tag.
(Coly Li helps to clean up commit log format.)
Fixes: 2a285686c1 ("bcache: btree locking rework")
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The NSP SoC includes an SP804 timer so should be enabled here.
Fixes: a0efb0d28b ("ARM: dts: NSP: Add SP804 Support to DT")
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Pull SafeSetID update from Micah Morton:
"Add additional LSM hooks for SafeSetID
SafeSetID is capable of making allow/deny decisions for set*uid calls
on a system, and we want to add similar functionality for set*gid
calls.
The work to do that is not yet complete, so probably won't make it in
for v5.8, but we are looking to get this simple patch in for v5.8
since we have it ready.
We are planning on the rest of the work for extending the SafeSetID
LSM being merged during the v5.9 merge window"
* tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux:
security: Add LSM hooks to set*gid syscalls
The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analagous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.
Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>