commit 95dd77580c upstream.
On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem. The subsets can be from
disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no
way to find the common root of all directory trees exported form the
server with the same filesystem identifier.
The practical result is that in struct super s_root for nfs s_root is
not necessarily the root of the filesystem. The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.
This effects the dcache invalidation code in generic_shutdown_super
currently called shrunk_dcache_for_umount and that code for years
has gone through an additional list of dentries that might be dentry
trees that need to be freed to accomodate nfs.
When I wrote path_connected I did not realize nfs was so special, and
it's hueristic for avoiding calling is_subdir can fail.
The practical case where this fails is when there is a move of a
directory from the subtree exposed by one nfs mount to the subtree
exposed by another nfs mount. This move can happen either locally or
remotely. With the remote case requiring that the move directory be cached
before the move and that after the move someone walks the path
to where the move directory now exists and in so doing causes the
already cached directory to be moved in the dcache through the magic
of d_splice_alias.
If someone whose working directory is in the move directory or a
subdirectory and now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check. As s_root really is not the root
of the nfs filesystem this heuristic is wrong, and the path may
actually not be connected and path_connected can fail.
The is_subdir function might be cheap enough that we can call it
unconditionally. Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to. So I am avoiding that for now.
Filesystems with snapshots such as nilfs and btrfs do something
similar. But as the directory tree of the snapshots are disjoint
from one another and from the main directory tree rename won't move
things between them and this problem will not occur.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 397d425dc2 ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c08aaf873 ]
ISL9305_MAX_REGULATOR is the last index used to access the init_data[]
array, so we need to add one to this last index to obtain the necessary
array size.
This fixes the following smatch error:
drivers/regulator/isl9305.c:160 isl9305_i2c_probe() error: buffer overflow 'pdata->init_data' 3 <= 3
Fixes: dec38b5ce6 ("regulator: isl9305: Add Intersil ISL9305/H driver")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fd08604555 ]
Commit 26988efe11 ("regulator: core: Allow to get voltage count and
list from parent") introduces the propagation of the parent voltage
count and list for regulators that don't provide this information
themselves. The goal is to support simple switch regulators, however as
a side effect normal continuous regulators can leak details of their
supplies and provide consumers with inconsistent information.
Limit the propagation of the voltage count and list to switch
regulators.
Fixes: 26988efe11 ("regulator: core: Allow to get voltage count and
list from parent")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c48367427a ]
Because sysctl_tcp_adv_win_scale could be changed any time, so there
is one race in tcp_win_from_space.
For example,
1.sysctl_tcp_adv_win_scale<=0 (sysctl_tcp_adv_win_scale is negative now)
2.space>>(-sysctl_tcp_adv_win_scale) (sysctl_tcp_adv_win_scale is postive now)
As a result, tcp_win_from_space returns 0. It is unexpected.
Certainly if the compiler put the sysctl_tcp_adv_win_scale into one
register firstly, then use the register directly, it would be ok.
But we could not depend on the compiler behavior.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6f051e4a68 ]
[resend due to me forgetting to cc: linux-api the first time around I
posted these back on Feb 23]
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When userspace tries to use these defines, it complains that it needs to
be an unsigned 1 that is shifted, so libc implementations have to create
their own version. Fix this by defining it properly so that libcs can
just use the kernel uapi header.
Reported-by: Elliott Hughes <enh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb88a05887 upstream.
Corsair Strafe RGB keyboard does not respond to usb control messages
sometimes and hence generates timeouts.
Commit de3af5bf25 ("usb: quirks: add delay init quirk for Corsair
Strafe RGB keyboard") tried to fix those timeouts by adding
USB_QUIRK_DELAY_INIT.
Unfortunately, even with this quirk timeouts of usb_control_msg()
can still be seen, but with a lower frequency (approx. 1 out of 15):
[ 29.103520] usb 1-8: string descriptor 0 read error: -110
[ 34.363097] usb 1-8: can't set config #1, error -110
Adding further delays to different locations where usb control
messages are issued just moves the timeouts to other locations,
e.g.:
[ 35.400533] usbhid 1-8:1.0: can't add hid device: -110
[ 35.401014] usbhid: probe of 1-8:1.0 failed with error -110
The only way to reliably avoid those issues is having a pause after
each usb control message. In approx. 200 boot cycles no more timeouts
were seen.
Addionaly, keep USB_QUIRK_DELAY_INIT as it turned out to be necessary
to have the delay in hub_port_connect() after hub_port_init().
The overall boot time seems not to be influenced by these additional
delays, even on fast machines and lightweight distributions.
Fixes: de3af5bf25 ("usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard")
Cc: stable@vger.kernel.org
Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae0ac0ed6f upstream.
instead of allocating each xt_counter individually, allocate 4k chunks
and then use these for counter allocation requests.
This should speed up rule evaluation by increasing data locality,
also speeds up ruleset loading because we reduce calls to the percpu
allocator.
As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
arches with 64k page size.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d31eef517 upstream.
On SMP we overload the packet counter (unsigned long) to contain
percpu offset. Hide this from callers and pass xt_counters address
instead.
Preparation patch to allocate the percpu counters in page-sized batch
chunks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25c058ccaf upstream.
Introduce a helper to determine if the current task is an output poll
worker.
This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for the output poll worker
to finish and the worker in turn calls a ->detect callback which waits
for runtime suspend to finish. The ->detect callback is invoked from
multiple call sites and waiting for runtime suspend to finish is the
correct thing to do except if it's executing in the context of the
worker.
v2: Expand kerneldoc to specifically mention deadlock between
output poll worker and autosuspend worker as use case. (Lyude)
Cc: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/3549ce32e7f1467102e70d3e9cbf70c46bfe108e.1518593424.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15f35d49c9 ]
Since UDP-Lite is always using checksum, the following path is
triggered when calculating pseudo header for it:
udp4_csum_init() or udp6_csum_init()
skb_checksum_init_zero_check()
__skb_checksum_validate_complete()
The problem can appear if skb->len is less than CHECKSUM_BREAK. In
this particular case __skb_checksum_validate_complete() also invokes
__skb_checksum_complete(skb). If UDP-Lite is using partial checksum
that covers only part of a packet, the function will return bad
checksum and the packet will be dropped.
It can be fixed if we skip skb_checksum_init_zero_check() and only
set the required pseudo header checksum for UDP-Lite with partial
checksum before udp4_csum_init()/udp6_csum_init() functions return.
Fixes: ed70fcfcee ("net: Call skb_checksum_init in IPv4")
Fixes: e4f45b7f40 ("net: Call skb_checksum_init in IPv6")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0bace7984 ]
libc-compat.h aims to prevent symbol collisions between uapi and libc
headers for each supported libc. This requires continuous coordination
between them.
The goal of this commit is to improve the situation for libcs (such as
musl) which are not yet supported and/or do not wish to be explicitly
supported, while not affecting supported libcs. More precisely, with
this commit, unsupported libcs can request the suppression of any
specific uapi definition by defining the correspondings _UAPI_DEF_*
macro as 0. This can fix symbol collisions for them, as long as the
libc headers are included before the uapi headers. Inclusion in the
other order is outside the scope of this commit.
All infrastructure in order to enable this fallback for unsupported
libcs is already in place, except that libc-compat.h unconditionally
defines all _UAPI_DEF_* macros to 1 for all unsupported libcs so that
any previous definitions are ignored. In order to fix this, this commit
merely makes these definitions conditional.
This commit together with the musl libc commit
http://git.musl-libc.org/cgit/musl/commit/?id=04983f2272382af92eb8f8838964ff944fbb8258
fixes for example the following compiler errors when <linux/in6.h> is
included after musl's <netinet/in.h>:
./linux/in6.h:32:8: error: redefinition of 'struct in6_addr'
./linux/in6.h:49:8: error: redefinition of 'struct sockaddr_in6'
./linux/in6.h:59:8: error: redefinition of 'struct ipv6_mreq'
The comments referencing glibc are still correct, but this file is not
only used for glibc any more.
Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41fce90f26 upstream.
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Cc: <stable@vger.kernel.org>
Fixes: 315c562536 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bb6d28370 upstream.
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).
[akpm@linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de5bbdd01c upstream.
pci_host_common_probe() is defined when CONFIG_PCI_HOST_COMMON=y;
therefore the function declaration should match that.
drivers/pci/host/pcie-tango.c:300:9: error:
implicit declaration of function 'pci_host_common_probe'
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 975b820b68 ]
In some cases the clock parent would be set NULL when doing re-parent,
it will cause a NULL pointer accessing if clk_set trace event is
enabled.
This patch sets the parent as "none" if the input parameter is NULL.
Fixes: dfc202ead3 (clk: Add tracepoints for hardware operations)
Signed-off-by: Cai Li <cai.li@spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fa68f6200 upstream.
Currently, almost none of the keyed hash algorithms check whether a key
has been set before proceeding. Some algorithms are okay with this and
will effectively just use a key of all 0's or some other bogus default.
However, others will severely break, as demonstrated using
"hmac(sha3-512-generic)", the unkeyed use of which causes a kernel crash
via a (potentially exploitable) stack buffer overflow.
A while ago, this problem was solved for AF_ALG by pairing each hash
transform with a 'has_key' bool. However, there are still other places
in the kernel where userspace can specify an arbitrary hash algorithm by
name, and the kernel uses it as unkeyed hash without checking whether it
is really unkeyed. Examples of this include:
- KEYCTL_DH_COMPUTE, via the KDF extension
- dm-verity
- dm-crypt, via the ESSIV support
- dm-integrity, via the "internal hash" mode with no key given
- drbd (Distributed Replicated Block Device)
This bug is especially bad for KEYCTL_DH_COMPUTE as that requires no
privileges to call.
Fix the bug for all users by adding a flag CRYPTO_TFM_NEED_KEY to the
->crt_flags of each hash transform that indicates whether the transform
still needs to be keyed or not. Then, make the hash init, import, and
digest functions return -ENOKEY if the key is still needed.
The new flag also replaces the 'has_key' bool which algif_hash was
previously using, thereby simplifying the algif_hash implementation.
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a208fa8f33 upstream.
We need to consistently enforce that keyed hashes cannot be used without
setting the key. To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not. AF_ALG currently does this by
checking for the presence of a ->setkey() method. However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key. (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)
Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called. Then set it on all the CRC-32 algorithms.
The same also applies to the Adler-32 implementation in Lustre.
Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f351574172 upstream.
Provide a function, kmemdup_nul(), that will create a NUL-terminated string
from an unterminated character array where the length is known in advance.
This is better than kstrndup() in situations where we already know the
string length as the strnlen() in kstrndup() is superfluous.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ecca8f88da upstream.
Now in sctp_setsockopt_maxseg user_frag or frag_point can be set with
val >= 8 and val <= SCTP_MAX_CHUNK_LEN. But both checks are incorrect.
val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT.
Then in sctp_datamsg_from_user(), when it's value is greater than cookie
echo len and trying to bundle with cookie echo chunk, the first_len will
overflow.
The worse case is when it's value is equal as cookie echo len, first_len
becomes 0, it will go into a dead loop for fragment later on. In Hangbin
syzkaller testing env, oom was even triggered due to consecutive memory
allocation in that loop.
Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk, it should
deduct the data header for frag_point or user_frag check.
This patch does a proper check with SCTP_DEFAULT_MINSEGMENT subtracting
the sctphdr and datahdr, SCTP_MAX_CHUNK_LEN subtracting datahdr when
setting frag_point via sockopt. It also improves sctp_setsockopt_maxseg
codes.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I ran into a 4.9 build warning in randconfig testing, starting with the
KAISER patches:
arch/x86/kernel/ldt.c: In function 'alloc_ldt_struct':
arch/x86/include/asm/pgtable_types.h:208:24: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
#define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX)
^
arch/x86/kernel/ldt.c:81:6: note: in expansion of macro '__PAGE_KERNEL'
__PAGE_KERNEL);
^~~~~~~~~~~~~
I originally ran into this last year when the patches were part of linux-next,
and tried to work around it by using the proper 'pteval_t' types consistently,
but that caused additional problems.
This takes a much simpler approach, and makes the argument type of the dummy
helper always 64-bit, which is wide enough for any page table layout and
won't hurt since this call is just an empty stub anyway.
Fixes: 8f0baadf2b ("kaiser: merged update")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f69120ce6c upstream.
Sphinx emits various (26) warnings when building make target 'htmldocs'.
Currently struct definitions contain duplicate documentation, some as
kernel-docs and some as standard c89 comments. We can reduce
duplication while cleaning up the kernel docs.
Move all kernel-docs to right above each struct member. Use the set of
all existing comments (kernel-doc and c89). Add documentation for
missing struct members and function arguments.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a16e772e66 upstream.
Since Poly1305 requires a nonce per invocation, the Linux kernel
implementations of Poly1305 don't use the crypto API's keying mechanism
and instead expect the key and nonce as the first 32 bytes of the data.
But ->setkey() is still defined as a stub returning an error code. This
prevents Poly1305 from being used through AF_ALG and will also break it
completely once we start enforcing that all crypto API users (not just
AF_ALG) call ->setkey() if present.
Fix it by removing crypto_poly1305_setkey(), leaving ->setkey as NULL.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd6ed77ad5 upstream.
Templates that use an shash spawn can use crypto_shash_alg_has_setkey()
to determine whether the underlying algorithm requires a key or not.
But there was no corresponding function for ahash spawns. Add it.
Note that the new function actually has to support both shash and ahash
algorithms, since the ahash API can be used with either.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e343e87d2 upstream.
The map_word_() functions, dating back to linux-2.6.8, try to perform
bitwise operations on a 'map_word' structure. This may have worked
with compilers that were current then (gcc-3.4 or earlier), but end
up being rather inefficient on any version I could try now (gcc-4.4 or
higher). Specifically we hit a problem analyzed in gcc PR81715 where we
fail to reuse the stack space for local variables.
This can be seen immediately in the stack consumption for
cfi_staa_erase_varsize() and other functions that (with CONFIG_KASAN)
can be up to 2200 bytes. Changing the inline functions into macros brings
this down to 1280 bytes. Without KASAN, the same problem exists, but
the stack consumption is lower to start with, my patch shrinks it from
920 to 496 bytes on with arm-linux-gnueabi-gcc-5.4, and saves around
1KB in .text size for cfi_cmdset_0020.c, as it avoids copying map_word
structures for each call to one of these helpers.
With the latest gcc-8 snapshot, the problem is fixed in upstream gcc,
but nobody uses that yet, so we should still work around it in mainline
kernels and probably backport the workaround to stable kernels as well.
We had a couple of other functions that suffered from the same gcc bug,
and all of those had a simpler workaround involving dummy variables
in the inline function. Unfortunately that did not work here, the
macro hack was the best I could come up with.
It would also be helpful to have someone to a little performance testing
on the patch, to see how much it helps in terms of CPU utilitzation.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit caf7501a1b)
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.
To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.
If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b027e2298b upstream.
There can be a race, if receive_buf call comes before
tty initialization completes in n_tty_open and tty->disc_data
may be NULL.
CPU0 CPU1
---- ----
000|n_tty_receive_buf_common() n_tty_open()
-001|n_tty_receive_buf2() tty_ldisc_open.isra.3()
-002|tty_ldisc_receive_buf(inline) tty_ldisc_setup()
Using ldisc semaphore lock in tty_init_dev till disc_data
initializes completely.
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1410a90ae4 upstream.
HW can implement UMR wqe re-transmission in various ways.
Thus, add HCA cap to distinguish the needed fence for UMR to make
sure that the wqe wouldn't fail on mkey checks.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Marta Rybczynska <mrybczyn@kalray.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit be95a845cc ]
In addition to commit b2157399cc ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.
pahole dump after change:
struct bpf_map {
const struct bpf_map_ops * ops; /* 0 8 */
struct bpf_map * inner_map_meta; /* 8 8 */
void * security; /* 16 8 */
enum bpf_map_type map_type; /* 24 4 */
u32 key_size; /* 28 4 */
u32 value_size; /* 32 4 */
u32 max_entries; /* 36 4 */
u32 map_flags; /* 40 4 */
u32 pages; /* 44 4 */
u32 id; /* 48 4 */
int numa_node; /* 52 4 */
bool unpriv_array; /* 56 1 */
/* XXX 7 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
struct user_struct * user; /* 64 8 */
atomic_t refcnt; /* 72 4 */
atomic_t usercnt; /* 76 4 */
struct work_struct work; /* 80 32 */
char name[16]; /* 112 16 */
/* --- cacheline 2 boundary (128 bytes) --- */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 121, holes: 1, sum holes: 7 */
};
Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was
tested, bouncing modified cache lines between cores is slow,
apparently because the MESI protocol is used for cache coherence
[8]. Changing the reference counter of an eBPF array on one
physical CPU core causes the cache line containing the reference
counter to be bounced over to that CPU core, making reads of the
reference counter on all other CPU cores slow until the changed
reference counter has been written back to memory. Because the
length and the reference counter of an eBPF array are stored in
the same cache line, this also means that changing the reference
counter on one physical CPU core causes reads of the eBPF array's
length to be slow on other physical CPU cores (intentional false
sharing).
While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.
Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.
[1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd9ff4de01 ]
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b30936
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbe (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4ee806d511 ]
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>