linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 04:10:18 +09:00

Author	SHA1	Message	Date
Filipe Manana	78748f2491	Btrfs: fix infinite loop during fsync after rename operations commit `b5e4ff9d46` upstream. Recently fsstress (from fstests) sporadically started to trigger an infinite loop during fsync operations. This turned out to be because support for the rename exchange and whiteout operations was added to fsstress in fstests. These operations, unlike any others in fsstress, cause file names to be reused, whence triggering this issue. However it's not necessary to use rename exchange and rename whiteout operations trigger this issue, simple rename operations and file creations are enough to trigger the issue. The issue boils down to when we are logging inodes that conflict (that had the name of any inode we need to log during the fsync operation), we keep logging them even if they were already logged before, and after that we check if there's any other inode that conflicts with them and then add it again to the list of inodes to log. Skipping already logged inodes fixes the issue. Consider the following example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/testdir # inode 257 $ touch /mnt/testdir/zz # inode 258 $ ln /mnt/testdir/zz /mnt/testdir/zz_link $ touch /mnt/testdir/a # inode 259 $ sync # The following 3 renames achieve the same result as a rename exchange # operation (<rename_exchange> /mnt/testdir/zz_link to /mnt/testdir/a). $ mv /mnt/testdir/a /mnt/testdir/a/tmp $ mv /mnt/testdir/zz_link /mnt/testdir/a $ mv /mnt/testdir/a/tmp /mnt/testdir/zz_link # The following rename and file creation give the same result as a # rename whiteout operation (<rename_whiteout> zz to a2). $ mv /mnt/testdir/zz /mnt/testdir/a2 $ touch /mnt/testdir/zz # inode 260 $ xfs_io -c fsync /mnt/testdir/zz --> results in the infinite loop The following steps happen: 1) When logging inode 260, we find that its reference named "zz" was used by inode 258 in the previous transaction (through the commit root), so inode 258 is added to the list of conflicting indoes that need to be logged; 2) After logging inode 258, we find that its reference named "a" was used by inode 259 in the previous transaction, and therefore we add inode 259 to the list of conflicting inodes to be logged; 3) After logging inode 259, we find that its reference named "zz_link" was used by inode 258 in the previous transaction - we add inode 258 to the list of conflicting inodes to log, again - we had already logged it before at step 3. After logging it again, we find again that inode 259 conflicts with him, and we add again 259 to the list, etc - we end up repeating all the previous steps. So fix this by skipping logging of conflicting inodes that were already logged. Fixes: `6b5fc433a7` ("Btrfs: fix fsync after succession of renames of different files") CC: stable@vger.kernel.org # 5.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:33 -08:00
Filipe Manana	79a29dee90	Btrfs: make deduplication with range including the last block work commit `831d2fa25a` upstream. Since btrfs was migrated to use the generic VFS helpers for clone and deduplication, it stopped allowing for the last block of a file to be deduplicated when the source file size is not sector size aligned (when eof is somewhere in the middle of the last block). There are two reasons for that: 1) The generic code always rounds down, to a multiple of the block size, the range's length for deduplications. This means we end up never deduplicating the last block when the eof is not block size aligned, even for the safe case where the destination range's end offset matches the destination file's size. That rounding down operation is done at generic_remap_check_len(); 2) Because of that, the btrfs specific code does not expect anymore any non-aligned range length's for deduplication and therefore does not work if such nona-aligned length is given. This patch addresses that second part, and it depends on a patch that fixes generic_remap_check_len(), in the VFS, which was submitted ealier and has the following subject: "fs: allow deduplication of eof block into the end of the destination file" These two patches address reports from users that started seeing lower deduplication rates due to the last block never being deduplicated when the file size is not aligned to the filesystem's block size. Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/ CC: stable@vger.kernel.org # 5.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:33 -08:00
Filipe Manana	ddb36ab79b	Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES commit `0e56315ca1` upstream. When using the NO_HOLES feature, if we punch a hole into a file and then fsync it, there are cases where a subsequent fsync will miss the fact that a hole was punched, resulting in the holes not existing after replaying the log tree. Essentially these cases all imply that, tree-log.c:copy_items(), is not invoked for the leafs that delimit holes, because nothing changed those leafs in the current transaction. And it's precisely copy_items() where we currenly detect and log holes, which works as long as the holes are between file extent items in the input leaf or between the beginning of input leaf and the previous leaf or between the last item in the leaf and the next leaf. First example where we miss a hole: ) The extent items of the inode span multiple leafs; ) The punched hole covers a range that affects only the extent items of the first leaf; ) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime flags). That results in the hole not existing after replaying the log tree. For example, if the fs/subvolume tree has the following layout for a particular inode: Leaf N, generation 10: [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ] Leaf N + 1, generation 10: [ EXTENT_ITEM (128K 64K) ... ] If at transaction 11 we punch a hole coverting the range [0, 128K[, we end up dropping the two extent items from leaf N, but we don't touch the other leaf, so we end up in the following state: Leaf N, generation 11: [ ... INODE_ITEM INODE_REF ] Leaf N + 1, generation 10: [ EXTENT_ITEM (128K 64K) ... ] A full fsync after punching the hole will only process leaf N because it was modified in the current transaction, but not leaf N + 1, since it was not modified in the current transaction (generation 10 and not 11). As a result the fsync will not log any holes, because it didn't process any leaf with extent items. Second example where we will miss a hole: ) An inode as its items spanning 5 (or more) leafs; *) A hole is punched and it covers only the extents items of the 3rd leaf. This resulsts in deleting the entire leaf and not touching any of the other leafs. So the only leaf that is modified in the current transaction, when punching the hole, is the first leaf, which contains the inode item. During the full fsync, the only leaf that is passed to copy_items() is that first leaf, and that's not enough for the hole detection code in copy_items() to determine there's a hole between the last file extent item in the 2nd leaf and the first file extent item in the 3rd leaf (which was the 4th leaf before punching the hole). Fix this by scanning all leafs and punch holes as necessary when doing a full fsync (less common than a non-full fsync) when the NO_HOLES feature is enabled. The lack of explicit file extent items to mark holes makes it necessary to scan existing extents to determine if holes exist. A test case for fstests follows soon. Fixes: `16e7549f04` ("Btrfs: incompatible format change to remove hole extents") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:33 -08:00
Eric Biggers	f0edd3abee	ext4: fix race conditions in ->d_compare() and ->d_hash() commit `ec772f0130` upstream. Since ->d_compare() and ->d_hash() can be called in RCU-walk mode, ->d_parent and ->d_inode can be concurrently modified, and in particular, ->d_inode may be changed to NULL. For ext4_d_hash() this resulted in a reproducible NULL dereference if a lookup is done in a directory being deleted, e.g. with: int main() { if (fork()) { for (;;) { mkdir("subdir", 0700); rmdir("subdir"); } } else { for (;;) access("subdir/file", 0); } } ... or by running the 't_encrypted_d_revalidate' program from xfstests. Both repros work in any directory on a filesystem with the encoding feature, even if the directory doesn't actually have the casefold flag. I couldn't reproduce a crash in ext4_d_compare(), but it appears that a similar crash is possible there. Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and falling back to the case sensitive behavior if the inode is NULL. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `b886ee3e77` ("ext4: Support case-insensitive file name lookups") Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200124041234.159740-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Eric Biggers	d44fa04f08	ext4: fix deadlock allocating crypto bounce page from mempool commit `547c556f4d` upstream. ext4_writepages() on an encrypted file has to encrypt the data, but it can't modify the pagecache pages in-place, so it encrypts the data into bounce pages and writes those instead. All bounce pages are allocated from a mempool using GFP_NOFS. This is not correct use of a mempool, and it can deadlock. This is because GFP_NOFS includes __GFP_DIRECT_RECLAIM, which enables the "never fail" mode for mempool_alloc() where a failed allocation will fall back to waiting for one of the preallocated elements in the pool. But since this mode is used for all a bio's pages and not just the first, it can deadlock waiting for pages already in the bio to be freed. This deadlock can be reproduced by patching mempool_alloc() to pretend that pool->alloc() always fails (so that it always falls back to the preallocations), and then creating an encrypted file of size > 128 KiB. Fix it by only using GFP_NOFS for the first page in the bio. For subsequent pages just use GFP_NOWAIT, and if any of those fail, just submit the bio and start a new one. This will need to be fixed in f2fs too, but that's less straightforward. Fixes: `c9af28fdd4` ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231181149.47619-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Vasily Averin	b19f130269	jbd2_seq_info_next should increase position index commit `1a8e9cf40c` upstream. if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Script below generates endless output $ q=;while read -r r;do echo "$((++q)) $r";done </proc/fs/jbd2/DEV/info https://bugzilla.kernel.org/show_bug.cgi?id=206283 Fixes: `1f4aace60b` ("fs/seq_file.c: simplify seq_file iteration code and interface") Cc: stable@kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Trond Myklebust	6282102dbc	nfsd: fix filecache lookup commit `28c7d86bb6` upstream. If the lookup keeps finding a nfsd_file with an unhashed open file, then retry once only. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org Fixes: `65294c1f2c` "nfsd: add a new struct file caching facility to nfsd" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Trond Myklebust	4544a69124	NFS: Directory page cache pages need to be locked when read commit `114de38225` upstream. When a NFS directory page cache page is removed from the page cache, its contents are freed through a call to nfs_readdir_clear_array(). To prevent the removal of the page cache entry until after we've finished reading it, we must take the page lock. Fixes: `11de3b11e0` ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Trond Myklebust	293cdcd89b	NFS: Fix memory leaks and corruption in readdir commit `4b310319c6` upstream. nfs_readdir_xdr_to_array() must not exit without having initialised the array, so that the page cache deletion routines can safely call nfs_readdir_clear_array(). Furthermore, we should ensure that if we exit nfs_readdir_filler() with an error, we free up any page contents to prevent a leak if we try to fill the page again. Fixes: `11de3b11e0` ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Arun Easi	8d313c04b4	scsi: qla2xxx: Fix unbound NVME response length commit `00fe717ee1` upstream. On certain cases when response length is less than 32, NVME response data is supplied inline in IOCB. This is indicated by some combination of state flags. There was an instance when a high, and incorrect, response length was indicated causing driver to overrun buffers. Fix this by checking and limiting the response payload length. Fixes: `7401bc18d1` ("scsi: qla2xxx: Add FC-NVMe command handling") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200124045014.23554-1-hmadhani@marvell.com Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:32 -08:00
Michael Ellerman	246a54895a	powerpc/futex: Fix incorrect user access blocking commit `9dc086f1e9` upstream. The early versions of our kernel user access prevention (KUAP) were written by Russell and Christophe, and didn't have separate read/write access. At some point I picked up the series and added the read/write access, but I failed to update the usages in futex.h to correctly allow read and write. However we didn't notice because of another bug which was causing the low-level code to always enable read and write. That bug was fixed recently in commit `1d8f739b07` ("powerpc/kuap: Fix set direction in allow/prevent_user_access()"). futex_atomic_cmpxchg_inatomic() is passed the user address as %3 and does: 1: lwarx %1, 0, %3 cmpw 0, %1, %4 bne- 3f 2: stwcx. %5, 0, %3 Which clearly loads and stores from/to %3. The logic in arch_futex_atomic_op_inuser() is similar, so fix both of them to use allow_read_write_user(). Without this fix, and with PPC_KUAP_DEBUG=y, we see eg: Bug: Read fault blocked by AMR! WARNING: CPU: 94 PID: 149215 at arch/powerpc/include/asm/book3s/64/kup-radix.h:126 __do_page_fault+0x600/0xf30 CPU: 94 PID: 149215 Comm: futex_requeue_p Tainted: G W 5.5.0-rc7-gcc9x-g4c25df5640ae #1 ... NIP [c000000000070680] __do_page_fault+0x600/0xf30 LR [c00000000007067c] __do_page_fault+0x5fc/0xf30 Call Trace: [c00020138e5637e0] [c00000000007067c] __do_page_fault+0x5fc/0xf30 (unreliable) [c00020138e5638c0] [c00000000000ada8] handle_page_fault+0x10/0x30 --- interrupt: 301 at cmpxchg_futex_value_locked+0x68/0xd0 LR = futex_lock_pi_atomic+0xe0/0x1f0 [c00020138e563bc0] [c000000000217b50] futex_lock_pi_atomic+0x80/0x1f0 (unreliable) [c00020138e563c30] [c00000000021b668] futex_requeue+0x438/0xb60 [c00020138e563d60] [c00000000021c6cc] do_futex+0x1ec/0x2b0 [c00020138e563d90] [c00000000021c8b8] sys_futex+0x128/0x200 [c00020138e563e20] [c00000000000b7ac] system_call+0x5c/0x68 Fixes: `de78a9c42a` ("powerpc: Add a framework for Kernel Userspace Access Protection") Cc: stable@vger.kernel.org # v5.2+ Reported-by: syzbot+e808452bad7c375cbee6@syzkaller-ppc64.appspotmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200207122145.11928-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Chuhong Yuan	eee7a67c03	crypto: picoxcell - adjust the position of tasklet_init and fix missed tasklet_kill commit `7f8c36fe9b` upstream. Since tasklet is needed to be initialized before registering IRQ handler, adjust the position of tasklet_init to fix the wrong order. Besides, to fix the missed tasklet_kill, this patch adds a helper function and uses devm_add_action to kill the tasklet automatically. Fixes: `ce92136843` ("crypto: picoxcell - add support for the picoxcell crypto engines") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Herbert Xu	e057d64f86	crypto: api - Fix race condition in crypto_spawn_alg commit `73669cc556` upstream. The function crypto_spawn_alg is racy because it drops the lock before shooting the dying algorithm. The algorithm could disappear altogether before we shoot it. This patch fixes it by moving the shooting into the locked section. Fixes: `6bfd48096f` ("[CRYPTO] api: Added spawns") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Tudor Ambarus	12a15e1c54	crypto: atmel-aes - Fix counter overflow in CTR mode commit `781a08d974` upstream. 32 bit counter is not supported by neither of our AES IPs, all implement a 16 bit block counter. Drop the 32 bit block counter logic. Fixes: `fcac83656a` ("crypto: atmel-aes - fix the counter overflow in CTR mode") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Herbert Xu	2c4d8203ff	crypto: pcrypt - Do not clear MAY_SLEEP flag in original request commit `e8d998264b` upstream. We should not be modifying the original request's MAY_SLEEP flag upon completion. It makes no sense to do so anyway. Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: `5068c7a883` ("crypto: pcrypt - Add pcrypt crypto...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Ard Biesheuvel	ded7c73a2b	crypto: arm64/ghash-neon - bump priority to 150 commit `5441c6507b` upstream. The SIMD based GHASH implementation for arm64 is typically much faster than the generic one, and doesn't use any lookup tables, so it is clearly preferred when available. So bump the priority to reflect that. Fixes: `5a22b198cd` ("crypto: arm64/ghash - register PMULL variants ...") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:31 -08:00
Ard Biesheuvel	3a35871603	crypto: ccp - set max RSA modulus size for v3 platform devices as well commit `11548f5a57` upstream. AMD Seattle incorporates a non-PCI version of the v3 CCP crypto accelerator, and this version was left behind when the maximum RSA modulus size was parameterized in order to support v5 hardware which supports larger moduli than v3 hardware does. Due to this oversight, RSA acceleration no longer works at all on these systems. Fix this by setting the .rsamax property to the appropriate value for v3 platform hardware. Fixes: `e28c190db6` ("csrypto: ccp - Expand RSA support for a v5 ccp") Cc: Gary R Hook <gary.hook@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Jonathan Cameron	58d8f2dec6	crypto: hisilicon - Use the offset fields in sqe to avoid need to split scatterlists commit `484a897ffa` upstream. We can configure sgl offset fields in ZIP sqe to let ZIP engine read/write sgl data with skipped data. Hence no need to splite the sgl. Fixes: `62c455ca85` (crypto: hisilicon - add HiSilicon ZIP accelerator support) Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Herbert Xu	a791fc62a5	crypto: api - fix unexpectedly getting generic implementation commit `2bbb3375d9` upstream. When CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y, the first lookup of an algorithm that needs to be instantiated using a template will always get the generic implementation, even when an accelerated one is available. This happens because the extra self-tests for the accelerated implementation allocate the generic implementation for comparison purposes, and then crypto_alg_tested() for the generic implementation "fulfills" the original request (i.e. sets crypto_larval::adult). This patch fixes this by only fulfilling the original request if we are currently the best outstanding larval as judged by the priority. If we're not the best then we will ask all waiters on that larval request to retry the lookup. Note that this patch introduces a behaviour change when the module providing the new algorithm is unregistered during the process. Previously we would have failed with ENOENT, after the patch we will instead redo the lookup. Fixes: `9a8a6b3f09` ("crypto: testmgr - fuzz hashes against...") Fixes: `d435e10e67` ("crypto: testmgr - fuzz skciphers against...") Fixes: `40153b10d9` ("crypto: testmgr - fuzz AEADs against...") Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Lorenz Bauer	1f5f3f65f9	selftests: bpf: Ignore FIN packets for reuseport tests commit `8bec4f665e` upstream. The reuseport tests currently suffer from a race condition: FIN packets count towards DROP_ERR_SKB_DATA, since they don't contain a valid struct cmd. Tests will spuriously fail depending on whether check_results is called before or after the FIN is processed. Exit the BPF program early if FIN is set. Fixes: `91134d849a` ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-3-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Lorenz Bauer	44a522bf5e	selftests: bpf: Use a temporary file in test_sockmap commit `c31dbb1e41` upstream. Use a proper temporary file for sendpage tests. This means that running the tests doesn't clutter the working directory, and allows running the test on read-only filesystems. Fixes: `16962b2404` ("bpf: sockmap, add selftests") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-2-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Hangbin Liu	da43712a72	selftests/bpf: Skip perf hw events test if the setup disabled it commit `f1c3656c6d` upstream. The same with commit `4e59afbbed` ("selftests/bpf: skip nmi test when perf hw events are disabled"), it would make more sense to skip the test_stacktrace_build_id_nmi test if the setup (e.g. virtual machines) has disabled hardware perf events. Fixes: `13790d1cc7` ("bpf: add selftest for stackmap with build_id in NMI context") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200117100656.10359-1-liuhangbin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:30 -08:00
Alexei Starovoitov	49437ecf9f	selftests/bpf: Fix test_attach_probe commit `580205dd4f` upstream. Fix two issues in test_attach_probe: 1. it was not able to parse /proc/self/maps beyond the first line, since %s means parse string until white space. 2. offset has to be accounted for otherwise uprobed address is incorrect. Fixes: `1e8611bbdf` ("selftests/bpf: add kprobe/uprobe selftests") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191219020442.1922617-1-ast@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Jesper Dangaard Brouer	c0ada6ad3e	samples/bpf: Xdp_redirect_cpu fix missing tracepoint attach commit `f9e6bfdbaf` upstream. When sample xdp_redirect_cpu was converted to use libbpf, the tracepoints used by this sample were not getting attached automatically like with bpf_load.c. The BPF-maps was still getting loaded, thus nobody notice that the tracepoints were not updating these maps. This fix doesn't use the new skeleton code, as this bug was introduced in v5.1 and stable might want to backport this. E.g. Red Hat QA uses this sample as part of their testing. Fixes: `bbaf6029c4` ("samples/bpf: Convert XDP samples to libbpf usage") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157685877642.26195.2798780195186786841.stgit@firesoul Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Toke Høiland-Jørgensen	a69af866bd	samples/bpf: Don't try to remove user's homedir on clean commit `b2e5e93ae8` upstream. The 'clean' rule in the samples/bpf Makefile tries to remove backup files (ending in ~). However, if no such files exist, it will instead try to remove the user's home directory. While the attempt is mostly harmless, it does lead to a somewhat scary warning like this: rm: cannot remove '~': Is a directory Fix this by using find instead of shell expansion to locate any actual backup files that need to be removed. Fixes: `b62a796c10` ("samples/bpf: allow make to be run from samples/bpf/ directory") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/157952560126.1683545.7273054725976032511.stgit@toke.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Davide Caratti	fbee8f6174	tc-testing: fix eBPF tests failure on linux fresh clones commit `7145fcfffe` upstream. when the following command is done on a fresh clone of the kernel tree, [root@f31 tc-testing]# ./tdc.py -c bpf test cases that need to build the eBPF sample program fail systematically, because 'buildebpfPlugin' is unable to install the kernel headers (i.e, the 'khdr' target fails). Pass the correct environment to 'make', in place of ENVIR, to allow running these tests. Fixes: `4c2d39bd40` ("tc-testing: use a plugin to build eBPF program") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Andrii Nakryiko	f7a2ccc00a	libbpf: Fix realloc usage in bpf_core_find_cands commit `35b9211c0a` upstream. Fix bug requesting invalid size of reallocated array when constructing CO-RE relocation candidate list. This can cause problems if there are many potential candidates and a very fine-grained memory allocator bucket sizes are used. Fixes: `ddc7c30426` ("libbpf: implement BPF CO-RE offset relocation algorithm") Reported-by: William Smith <williampsmith@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200124201847.212528-1-andriin@fb.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Amol Grover	ab48c14a44	bpf, devmap: Pass lockdep expression to RCU lists commit `485ec2ea9c` upstream. head is traversed using hlist_for_each_entry_rcu outside an RCU read-side critical section but under the protection of dtab->index_lock. Hence, add corresponding lockdep expression to silence false-positive lockdep warnings, and harden RCU lists. Fixes: `6f9d451ab1` ("xdp: Add devmap_hash map type for looking up devices by hashed index") Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200123120437.26506-1-frextrite@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Andrii Nakryiko	77bb53cb09	selftests/bpf: Fix perf_buffer test on systems w/ offline CPUs commit `91cbdf740a` upstream. Fix up perf_buffer.c selftest to take into account offline/missing CPUs. Fixes: `ee5cf82ce0` ("selftests/bpf: test perf buffer API") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212013621.1691858-1-andriin@fb.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:29 -08:00
Björn Töpel	5b0e9b563c	riscv, bpf: Fix broken BPF tail calls commit `f1003b787c` upstream. The BPF JIT incorrectly clobbered the a0 register, and did not flag usage of s5 register when BPF stack was being used. Fixes: `2353ecc6f9` ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-2-bjorn.topel@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:28 -08:00
Nikolay Borisov	f3107a3c9b	btrfs: Handle another split brain scenario with metadata uuid feature commit `0584071014` upstream. There is one more cases which isn't handled by the original metadata uuid work. Namely, when a filesystem has METADATA_UUID incompat bit and the user decides to change the FSID to the original one e.g. have metadata_uuid and fsid match. In case of power failure while this operation is in progress we could end up in a situation where some of the disks have the incompat bit removed and the other half have both METADATA_UUID_INCOMPAT and FSID_CHANGING_IN_PROGRESS flags. This patch handles the case where a disk that has successfully changed its FSID such that it equals METADATA_UUID is scanned first. Subsequently when a disk with both METADATA_UUID_INCOMPAT/FSID_CHANGING_IN_PROGRESS flags is scanned find_fsid_changed won't be able to find an appropriate btrfs_fs_devices. This is done by extending find_fsid_changed to correctly find btrfs_fs_devices whose metadata_uuid/fsid are the same and they match the metadata_uuid of the currently scanned device. Fixes: `cc5de4e702` ("btrfs: Handle final split-brain possibility during fsid change") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reported-by: Su Yue <Damenly_Su@gmx.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:28 -08:00
Josef Bacik	dd9837259d	btrfs: fix improper setting of scanned for range cyclic write cache pages commit `556755a8a9` upstream. We noticed that we were having regular CG OOM kills in cases where there was still enough dirty pages to avoid OOM'ing. It turned out there's this corner case in btrfs's handling of range_cyclic where files that were being redirtied were not getting fully written out because of how we do range_cyclic writeback. We unconditionally were setting scanned = 1; the first time we found any pages in the inode. This isn't actually what we want, we want it to be set if we've scanned the entire file. For range_cyclic we could be starting in the middle or towards the end of the file, so we could write one page and then not write any of the other dirty pages in the file because we set scanned = 1. Fix this by not setting scanned = 1 if we find pages. The rules for setting scanned should be 1) !range_cyclic. In this case we have a specified range to write out. 2) range_cyclic && index == 0. In this case we've started at the beginning and there is no need to loop around a second time. 3) range_cyclic && we started at index > 0 and we've reached the end of the file without satisfying our nr_to_write. This patch fixes both of our writepages implementations to make sure these rules hold true. This fixed our over zealous CG OOMs in production. Fixes: `d1310b2e0c` ("Btrfs: Split the extent_map code into two parts") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:28 -08:00
Herbert Xu	b4c8ed0bf9	crypto: pcrypt - Avoid deadlock by using per-instance padata queues commit `bbefa1dd6a` upstream. If the pcrypt template is used multiple times in an algorithm, then a deadlock occurs because all pcrypt instances share the same padata_instance, which completes requests in the order submitted. That is, the inner pcrypt request waits for the outer pcrypt request while the outer request is already waiting for the inner. This patch fixes this by allocating a set of queues for each pcrypt instance instead of using two global queues. In order to maintain the existing user-space interface, the pinst structure remains global so any sysfs modifications will apply to every pcrypt instance. Note that when an update occurs we have to allocate memory for every pcrypt instance. Should one of the allocations fail we will abort the update without rolling back changes already made. The new per-instance data structure is called padata_shell and is essentially a wrapper around parallel_data. Reproducer: #include <linux/if_alg.h> #include <sys/socket.h> #include <unistd.h> int main() { struct sockaddr_alg addr = { .salg_type = "aead", .salg_name = "pcrypt(pcrypt(rfc4106-gcm-aesni))" }; int algfd, reqfd; char buf[32] = { 0 }; algfd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(algfd, (void *)&addr, sizeof(addr)); setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, 20); reqfd = accept(algfd, 0, 0); write(reqfd, buf, 32); read(reqfd, buf, 16); } Reported-by: syzbot+56c7151cad94eec37c521f0e47d2eee53f9361c4@syzkaller.appspotmail.com Fixes: `5068c7a883` ("crypto: pcrypt - Add pcrypt crypto parallelization wrapper") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:28 -08:00
Steven Rostedt (VMware)	c8e9dafe66	ftrace: Protect ftrace_graph_hash with ftrace_sync [ Upstream commit `54a16ff6f2` ] As function_graph tracer can run when RCU is not "watching", it can not be protected by synchronize_rcu() it requires running a task on each CPU before it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used. Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72 Cc: stable@vger.kernel.org Fixes: `b9b0c831be` ("ftrace: Convert graph filter to use hash tables") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:28 -08:00
Steven Rostedt (VMware)	6a652ed941	ftrace: Add comment to why rcu_dereference_sched() is open coded [ Upstream commit `16052dd5bd` ] Because the function graph tracer can execute in sections where RCU is not "watching", the rcu_dereference_sched() for the has needs to be open coded. This is fine because the RCU "flavor" of the ftrace hash is protected by its own RCU handling (it does its own little synchronization on every CPU and does not rely on RCU sched). Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:28 -08:00
Amol Grover	c9dc142b39	tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu [ Upstream commit `fd0e6852c4` ] Fix following instances of sparse error kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison Use rcu_dereference_protected to dereference the newly annotated pointer. Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:27 -08:00
Amol Grover	024537c754	tracing: Annotate ftrace_graph_hash pointer with __rcu [ Upstream commit `24a9729f83` ] Fix following instances of sparse error kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison Use rcu_dereference_protected to access the __rcu annotated pointer. Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:27 -08:00
Pierre-Louis Bossart	df57920d6e	ASoC: SOF: core: release resources on errors in probe_continue [ Upstream commit `410e5e55c9` ] The initial intent of releasing resources in the .remove does not work well with HDaudio codecs. If the probe_continue() fails in a work queue, e.g. due to missing firmware or authentication issues, we don't release any resources, and as a result the kernel oopses during suspend operations. The suggested fix is to release all resources during errors in probe_continue(), and use fw_state to track resource allocation state, so that .remove does not attempt to release the same hardware resources twice. PM operations are also modified so that no action is done if DSP resources have been freed due to an error at probe. Reported-by: Takashi Iwai <tiwai@suse.de> Co-developed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=1161246 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20200124213625.30186-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:27 -08:00
Ranjani Sridharan	3145862d8f	ASoC: SOF: Introduce state machine for FW boot [ Upstream commit `6ca5cecbd1` ] Add a state machine for FW boot to track the different stages of FW boot and replace the boot_complete field with fw_state field in struct snd_sof_dev. This will be used to determine the actions to be performed during system suspend. One of the main motivations for adding this change is the fact that errors during the top-level SOF device probe cannot be propagated and therefore suspending the SOF device normally during system suspend could potentially run into errors. For example, with the current flow, if the FW boot failed for some reason and the system suspends, the SOF device suspend could fail because the CTX_SAVE IPC would be attempted even though the FW never really booted successfully causing it to time out. Another scenario that the state machine fixes is when the runtime suspend for the SOF device fails and the DSP is powered down nevertheless, the CTX_SAVE IPC during system suspend would timeout because the DSP is already powered down. Reviewed-by: Curtis Malainey <cujomalainey@chromium.org> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191218002616.7652-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:27 -08:00
Quinn Tran	0b84591fdd	scsi: qla2xxx: Fix stuck login session using prli_pend_timer [ Upstream commit `8aaac2d7da` ] Session is stuck if driver sees FW has received a PRLI. Driver allows FW to finish with processing of PRLI by checking back with FW at a later time to see if the PRLI has finished. Instead, driver failed to push forward after re-checking PRLI completion. Fixes: `ce0ba496dc` ("scsi: qla2xxx: Fix stuck login session") Cc: stable@vger.kernel.org # 5.3 Link: https://lore.kernel.org/r/20191217220617.28084-9-hmadhani@marvell.com Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-02-11 04:35:27 -08:00
Mike Snitzer	78cbd2c397	dm: fix potential for q->make_request_fn NULL pointer commit `47ace7e012` upstream. Move blk_queue_make_request() to dm.c:alloc_dev() so that q->make_request_fn is never NULL during the lifetime of a DM device (even one that is created without a DM table). Otherwise generic_make_request() will crash simply by doing: dmsetup create -n test mount /dev/dm-N /mnt While at it, move ->congested_data initialization out of dm.c:alloc_dev() and into the bio-based specific init method. Reported-by: Stefan Bader <stefan.bader@canonical.com> BugLink: https://bugs.launchpad.net/bugs/1860231 Fixes: `ff36ab3458` ("dm: remove request-based logic from make_request_fn wrapper") Depends-on: `c12c9a3c38` ("dm: various cleanups to md->queue initialization code") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:27 -08:00
Mike Snitzer	1426201af0	dm thin metadata: use pool locking at end of dm_pool_metadata_close commit `44d8ebf436` upstream. Ensure that the pool is locked during calls to __commit_transaction and __destroy_persistent_data_objects. Just being consistent with locking, but reality is dm_pool_metadata_close is called once pool is being destroyed so access to pool shouldn't be contended. Also, use pmd_write_lock_in_core rather than __pmd_write_lock in dm_pool_commit_metadata and rename __pmd_write_lock to pmd_write_lock_in_core -- there was no need for the alias. In addition, verify that the pool is locked in __commit_transaction(). Fixes: `873f258bec` ("dm thin metadata: do not write metadata if no changes occurred") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Milan Broz	40d3d8d6eb	dm crypt: fix benbi IV constructor crash if used in authenticated mode commit `4ea9471fbd` upstream. If benbi IV is used in AEAD construction, for example: cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256 the constructor uses wrong skcipher function and crashes: BUG: kernel NULL pointer dereference, address: 00000014 ... EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt] Call Trace: ? crypt_subkey_size+0x20/0x20 [dm_crypt] crypt_ctr+0x567/0xfc0 [dm_crypt] dm_table_add_target+0x15f/0x340 [dm_mod] Fix this by properly using crypt_aead_blocksize() in this case. Fixes: `ef43aa3806` ("dm crypt: add cryptographic data integrity protection (authenticated encryption)") Cc: stable@vger.kernel.org # v4.12+ Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051 Reported-by: Jerad Simpson <jbsimpson@gmail.com> Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Mikulas Patocka	b805ec7d08	dm crypt: fix GFP flags passed to skcipher_request_alloc() commit `9402e95901` upstream. GFP_KERNEL is not supposed to be or'd with GFP_NOFS (the result is equivalent to GFP_KERNEL). Also, we use GFP_NOIO instead of GFP_NOFS because we don't want any I/O being submitted in the direct reclaim path. Fixes: `39d13a1ac4` ("dm crypt: reuse eboiv skcipher for IV generation") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Mikulas Patocka	1781fa54a4	dm writecache: fix incorrect flush sequence when doing SSD mode commit commit `aa9509209c` upstream. When committing state, the function writecache_flush does the following: 1. write metadata (writecache_commit_flushed) 2. flush disk cache (writecache_commit_flushed) 3. wait for data writes to complete (writecache_wait_for_ios) 4. increase superblock seq_count 5. write the superblock 6. flush disk cache It may happen that at step 3, when we wait for some write to finish, the disk may report the write as finished, but the write only hit the disk cache and it is not yet stored in persistent storage. At step 5 we write the superblock - it may happen that the superblock is written before the write that we waited for in step 3. If the machine crashes, it may result in incorrect data being returned after reboot. In order to fix the bug, we must swap steps 2 and 3 in the above sequence, so that we first wait for writes to complete and then flush the disk cache. Fixes: `48debafe4f` ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Joe Thornber	a8d99d6301	dm space map common: fix to ensure new block isn't already in use commit `4feaef830d` upstream. The space-maps track the reference counts for disk blocks allocated by both the thin-provisioning and cache targets. There are variants for tracking metadata blocks and data blocks. Transactionality is implemented by never touching blocks from the previous transaction, so we can rollback in the event of a crash. When allocating a new block we need to ensure the block is free (has reference count of 0) in both the current and previous transaction. Prior to this fix we were doing this by searching for a free block in the previous transaction, and relying on a 'begin' counter to track where the last allocation in the current transaction was. This 'begin' field was not being updated in all code paths (eg, increment of a data block reference count due to breaking sharing of a neighbour block in the same btree leaf). This fix keeps the 'begin' field, but now it's just a hint to speed up the search. Instead the current transaction is searched for a free block, and then the old transaction is double checked to ensure it's free. Much simpler. This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when DM thin-provisioning's snapshots are heavily used. Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Dmitry Fomichev	188f9b710f	dm zoned: support zone sizes smaller than 128MiB commit `b399629503` upstream. dm-zoned is observed to log failed kernel assertions and not work correctly when operating against a device with a zone size smaller than 128MiB (e.g. 32768 bits per 4K block). The reason is that the bitmap size per zone is calculated as zero with such a small zone size. Fix this problem and also make the code related to zone bitmap management be able to handle per zone bitmaps smaller than a single block. A dm-zoned-tools patch is required to properly format dm-zoned devices with zone sizes smaller than 128MiB. Fixes: `3b1a94c88b` ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:26 -08:00
Chen-Yu Tsai	ad7c38abe8	ARM: dma-api: fix max_pfn off-by-one error in __dma_supported() commit `f3cc4e1d44` upstream. max_pfn, as set in arch/arm/mm/init.c: static void __init find_limits(unsigned long min, unsigned long max_low, unsigned long max_high) { max_low = PFN_DOWN(memblock_get_current_limit()); min = PFN_UP(memblock_start_of_DRAM()); max_high = PFN_DOWN(memblock_end_of_DRAM()); } with memblock_end_of_DRAM() pointing to the next byte after DRAM. As such, max_pfn points to the PFN after the end of DRAM. Thus when using max_pfn to check DMA masks, we should subtract one when checking DMA ranges against it. Commit `8bf1268f48` ("ARM: dma-api: fix off-by-one error in __dma_supported()") fixed the same issue, but missed this spot. This issue was found while working on the sun4i-csi v4l2 driver on the Allwinner R40 SoC. On Allwinner SoCs, DRAM is offset at 0x40000000, and we are starting to use of_dma_configure() with the "dma-ranges" property in the device tree to have the DMA API handle the offset. In this particular instance, dma-ranges was set to the same range as the actual available (2 GiB) DRAM. The following error appeared when the driver attempted to allocate a buffer: sun4i-csi 1c09000.csi: Coherent DMA mask 0x7fffffff (pfn 0x40000-0xc0000) covers a smaller range of system memory than the DMA zone pfn 0x0-0xc0001 sun4i-csi 1c09000.csi: dma_alloc_coherent of size 307200 failed Fixing the off-by-one error makes things work. Link: http://lkml.kernel.org/r/20191224030239.5656-1-wens@kernel.org Fixes: `11a5aa3256` ("ARM: dma-mapping: check DMA mask against available memory") Fixes: `9f28cde0bc` ("ARM: another fix for the DMA mapping checks") Fixes: `ab746573c4` ("ARM: dma-mapping: allow larger DMA mask than supported") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:25 -08:00
Michael Ellerman	bae74e7ac8	of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc commit `dabf6b36b8` upstream. There's an OF helper called of_dma_is_coherent(), which checks if a device has a "dma-coherent" property to see if the device is coherent for DMA. But on some platforms devices are coherent by default, and on some platforms it's not possible to update existing device trees to add the "dma-coherent" property. So add a Kconfig symbol to allow arch code to tell of_dma_is_coherent() that devices are coherent by default, regardless of the presence of the property. Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. when the system has a coherent cache. Fixes: `92ea637ede` ("of: introduce of_dma_is_coherent() helper") Cc: stable@vger.kernel.org # v3.16+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:25 -08:00
Rafael J. Wysocki	f5f68d165d	cpufreq: Avoid creating excessively large stack frames commit `1e4f63aecb` upstream. In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: `3000ce3c52` ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-02-11 04:35:25 -08:00

1 2 3 4 5 ...

875999 Commits