Commit Graph

802537 Commits

Author SHA1 Message Date
Jason A. Donenfeld
4e804a1896 UPSTREAM: crypto: curve25519 - generic C library implementations
This contains two formally verified C implementations of the Curve25519
scalar multiplication function, one for 32-bit systems, and one for
64-bit systems whose compiler supports efficient 128-bit integer types.
Not only are these implementations formally verified, but they are also
the fastest available C implementations. They have been modified to be
friendly to kernel space and to be generally less horrendous looking,
but still an effort has been made to retain their formally verified
characteristic, and so the C might look slightly unidiomatic.

The 64-bit version comes from HACL*: https://github.com/project-everest/hacl-star
The 32-bit version comes from Fiat: https://github.com/mit-plv/fiat-crypto

Information: https://cr.yp.to/ecdh.html

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
[ardb: - move from lib/zinc to lib/crypto
       - replace .c #includes with Kconfig based object selection
       - drop simd handling and simplify support for per-arch versions ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 0ed42a6f43)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib60138ed7973ed6c75fe92f8666ca8adce3e17ca
2020-10-24 22:42:43 +02:00
Jason A. Donenfeld
db2e6ec625 UPSTREAM: crypto: blake2s - x86_64 SIMD implementation
These implementations from Samuel Neves support AVX and AVX-512VL.
Originally this used AVX-512F, but Skylake thermal throttling made
AVX-512VL more attractive and possible to do with negligable difference.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
[ardb: move to arch/x86/crypto, wire into lib/crypto framework]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ed0356eda1)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I26150abb48c1508d922727d7b9d788e0251f411c
2020-10-24 22:42:43 +02:00
Ard Biesheuvel
537dbe51f3 UPSTREAM: crypto: blake2s - implement generic shash driver
Wire up our newly added Blake2s implementation via the shash API.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7f9b088092)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2cc105bb4fd251cfea616323f352730832f07aaa
2020-10-24 22:42:43 +02:00
Ard Biesheuvel
8e4def702e UPSTREAM: crypto: testmgr - add test cases for Blake2s
As suggested by Eric for the Blake2b implementation contributed by
David, introduce a set of test vectors for Blake2s covering different
digest and key sizes.

          blake2s-128  blake2s-160  blake2s-224  blake2s-256
         ---------------------------------------------------
len=0   | klen=0       klen=1       klen=16      klen=32
len=1   | klen=16      klen=32      klen=0       klen=1
len=7   | klen=32      klen=0       klen=1       klen=16
len=15  | klen=1       klen=16      klen=32      klen=0
len=64  | klen=0       klen=1       klen=16      klen=32
len=247 | klen=16      klen=32      klen=0       klen=1
len=256 | klen=32      klen=0       klen=1       klen=16

Cc: David Sterba <dsterba@suse.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 17e1df6702)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If7b4154623ead41346837eefacbcee5354ff7765
2020-10-24 22:42:43 +02:00
Jason A. Donenfeld
dc97e1c5ba UPSTREAM: crypto: blake2s - generic C library implementation and selftest
The C implementation was originally based on Samuel Neves' public
domain reference implementation but has since been heavily modified
for the kernel. We're able to do compile-time optimizations by moving
some scaffolding around the final function into the header file.

Information: https://blake2.net/

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
[ardb: - move from lib/zinc to lib/crypto
       - remove simd handling
       - rewrote selftest for better coverage
       - use fixed digest length for blake2s_hmac() and rename to
         blake2s256_hmac() ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 66d7fb94e4)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia9f421fda204693adf8726c6b68f9eebdb4eb789
2020-10-24 22:42:43 +02:00
Ard Biesheuvel
688b66e083 UPSTREAM: crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for
MIPS authored by Andy Polyakov, a prior 64-bit only version of which has been
contributed by him to the OpenSSL project. The file 'poly1305-mips.pl' is taken
straight from this upstream GitHub repository [0] at commit
d22ade312a7af958ec955620b0d241cf42c37feb, and already contains all the changes
required to build it as part of a Linux kernel module.

[0] https://github.com/dot-asm/cryptogams

Co-developed-by: Andy Polyakov <appro@cryptogams.org>
Signed-off-by: Andy Polyakov <appro@cryptogams.org>
Co-developed-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a11d055e7a)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3f970f4d675b2a2f3e30ee6f28f32824e125e6e8
2020-10-24 22:42:43 +02:00
Ard Biesheuvel
2c07f92a4b UPSTREAM: crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation
for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL
project. The file 'poly1305-armv4.pl' is taken straight from this upstream
GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f,
and already contains all the changes required to build it as part of a
Linux kernel module.

[0] https://github.com/dot-asm/cryptogams

Co-developed-by: Andy Polyakov <appro@cryptogams.org>
Signed-off-by: Andy Polyakov <appro@cryptogams.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a6b803b3dd)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ied190f22b3e93f1f88d0568b10a294ab83e11bfe
2020-10-24 22:42:43 +02:00
Ard Biesheuvel
36a372fd77 UPSTREAM: crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation
for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL
project. The file 'poly1305-armv8.pl' is taken straight from this upstream
GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f,
and already contains all the changes required to build it as part of a
Linux kernel module.

[0] https://github.com/dot-asm/cryptogams

Co-developed-by: Andy Polyakov <appro@cryptogams.org>
Signed-off-by: Andy Polyakov <appro@cryptogams.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f569ca1647)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I28b5962751813edf2ce5fa15f2c9a228e8df5907
2020-10-24 17:02:15 +02:00
Ard Biesheuvel
4e42c8cac2 UPSTREAM: crypto: x86/poly1305 - expose existing driver as poly1305 library
Implement the arch init/update/final Poly1305 library routines in the
accelerated SIMD driver for x86 so they are accessible to users of
the Poly1305 library interface as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f0e89bcfbb)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib675528205383a4816e2a01f68e89eba2b0abb4d
2020-10-24 14:43:43 +02:00
Ard Biesheuvel
48c9cbd789 UPSTREAM: crypto: x86/poly1305 - depend on generic library not generic shash
Remove the dependency on the generic Poly1305 driver. Instead, depend
on the generic library so that we only reuse code without pulling in
the generic skcipher implementation as well.

While at it, remove the logic that prefers the non-SIMD path for short
inputs - this is no longer necessary after recent FPU handling changes
on x86.

Since this removes the last remaining user of the routines exported
by the generic shash driver, unexport them and make them static.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 1b2c6a5120)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1003433816e6f3f458d7ccbcd0d4790666cdf8d1
2020-10-24 14:43:43 +02:00
Ard Biesheuvel
d1fa3dd17d UPSTREAM: crypto: poly1305 - expose init/update/final library interface
Expose the existing generic Poly1305 code via a init/update/final
library interface so that callers are not required to go through
the crypto API's shash abstraction to access it. At the same time,
make some preparations so that the library implementation can be
superseded by an accelerated arch-specific version in the future.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a1d9306409)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibd2539acddd9b8de1100d9fab832e58006736607
2020-10-24 14:43:42 +02:00
Ard Biesheuvel
c99c627218 UPSTREAM: crypto: x86/poly1305 - unify Poly1305 state struct with generic code
In preparation of exposing a Poly1305 library interface directly from
the accelerated x86 driver, align the state descriptor of the x86 code
with the one used by the generic driver. This is needed to make the
library interface unified between all implementations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit ad8f5b8838)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4425d0b514eec8e1d3e88e9972f270cc17bf8064
2020-10-24 14:43:41 +02:00
Ard Biesheuvel
4afb116cc1 UPSTREAM: crypto: poly1305 - move core routines into a separate library
Move the core Poly1305 routines shared between the generic Poly1305
shash driver and the Adiantum and NHPoly1305 drivers into a separate
library so that using just this pieces does not pull in the crypto
API pieces of the generic Poly1305 routine.

In a subsequent patch, we will augment this generic library with
init/update/final routines so that Poyl1305 algorithm can be used
directly without the need for using the crypto API's shash abstraction.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 48ea8c6ebc)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1d1ebae722acdb3a908822b8a5b126689e2147c3
2020-10-24 14:43:40 +02:00
Ard Biesheuvel
aff2622b02 UPSTREAM: crypto: chacha - unexport chacha_generic routines
Now that all users of generic ChaCha code have moved to the core library,
there is no longer a need for the generic ChaCha skcpiher driver to
export parts of it implementation for reuse by other drivers. So drop
the exports, and make the symbols static.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 22cf705360)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie7a711756d34f1578c32c59cd5e1e8e3ef82368f
2020-10-24 14:43:39 +02:00
Ard Biesheuvel
b449e889ac UPSTREAM: crypto: mips/chacha - wire up accelerated 32r2 code from Zinc
This integrates the accelerated MIPS 32r2 implementation of ChaCha
into both the API and library interfaces of the kernel crypto stack.

The significance of this is that, in addition to becoming available
as an accelerated library implementation, it can also be used by
existing crypto API code such as Adiantum (for block encryption on
ultra low performance cores) or IPsec using chacha20poly1305. These
are use cases that have already opted into using the abstract crypto
API. In order to support Adiantum, the core assembler routine has
been adapted to take the round count as a function argument rather
than hardcoding it to 20.

Co-developed-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 3a2f58f3ba)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2ae4452b68026e40d3fe637e9ae28c65ada08501
2020-10-24 14:43:39 +02:00
Jason A. Donenfeld
c683297716 UPSTREAM: crypto: mips/chacha - import 32r2 ChaCha code from Zinc
This imports the accelerated MIPS 32r2 ChaCha20 implementation from the
Zinc patch set.

Co-developed-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 49aa7c00ed)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I106009737016069039353147237e25ac2d007251
2020-10-24 14:43:38 +02:00
Ard Biesheuvel
1567532217 UPSTREAM: crypto: arm/chacha - expose ARM ChaCha routine as library function
Expose the accelerated NEON ChaCha routine directly as a symbol
export so that users of the ChaCha library API can use it directly.

Given that calls into the library API will always go through the
routines in this module if it is enabled, switch to static keys
to select the optimal implementation available (which may be none
at all, in which case we defer to the generic implementation for
all invocations).

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a44a3430d7)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If192626fa151d4f36d53923e5cecb62d16cdb8d7
2020-10-24 14:43:37 +02:00
Ard Biesheuvel
221466ef1b UPSTREAM: crypto: arm/chacha - remove dependency on generic ChaCha driver
Instead of falling back to the generic ChaCha skcipher driver for
non-SIMD cases, use a fast scalar implementation for ARM authored
by Eric Biggers. This removes the module dependency on chacha-generic
altogether, which also simplifies things when we expose the ChaCha
library interface from this module.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b36d8c09e7)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2614ce768f4b92613a38e5ea72691b8aed26807e
2020-10-24 14:43:36 +02:00
Ard Biesheuvel
1a95e0ba81 UPSTREAM: crypto: arm/chacha - import Eric Biggers's scalar accelerated ChaCha code
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 29621d099f)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idd585180e5d6c1dff60bd36f8b23bd42feef740b
2020-10-24 14:43:35 +02:00
Ard Biesheuvel
b8a786ecda UPSTREAM: crypto: arm64/chacha - expose arm64 ChaCha routine as library function
Expose the accelerated NEON ChaCha routine directly as a symbol
export so that users of the ChaCha library API can use it directly.

Given that calls into the library API will always go through the
routines in this module if it is enabled, switch to static keys
to select the optimal implementation available (which may be none
at all, in which case we defer to the generic implementation for
all invocations).

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit b3aad5bad2)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic282eca4a7a2b24c15bad2447d4093a89db96f4c
2020-10-24 14:43:34 +02:00
Ard Biesheuvel
9bb7accce8 UPSTREAM: crypto: arm64/chacha - depend on generic chacha library instead of crypto driver
Depend on the generic ChaCha library routines instead of pulling in the
generic ChaCha skcipher driver, which is more than we need, and makes
managing the dependencies between the generic library, generic driver,
accelerated library and driver more complicated.

While at it, drop the logic to prefer the scalar code on short inputs.
Turning the NEON on and off is cheap these days, and one major use case
for ChaCha20 is ChaCha20-Poly1305, which is guaranteed to hit the scalar
path upon every invocation  (when doing the Poly1305 nonce generation)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit c77da4867c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icbd6ed2261003f188eb17295966ee1a60cc63f0f
2020-10-24 14:43:34 +02:00
Ard Biesheuvel
9603ada811 UPSTREAM: crypto: arm64/chacha - use combined SIMD/ALU routine for more speed
To some degree, most known AArch64 micro-architectures appear to be
able to issue ALU instructions in parellel to SIMD instructions
without affecting the SIMD throughput. This means we can use the ALU
to process a fifth ChaCha block while the SIMD is processing four
blocks in parallel.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 2fe55987b2)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic04a32abdd8718f2ba46ed9581fcd30f84f8a2a8
2020-10-24 14:43:33 +02:00
Ard Biesheuvel
e9da63a53e UPSTREAM: crypto: arm64/chacha - optimize for arbitrary length inputs
Update the 4-way NEON ChaCha routine so it can handle input of any
length >64 bytes in its entirety, rather than having to call into
the 1-way routine and/or memcpy()s via temp buffers to handle the
tail of a ChaCha invocation that is not a multiple of 256 bytes.

On inputs that are a multiple of 256 bytes (and thus in tcrypt
benchmarks), performance drops by around 1% on Cortex-A57, while
performance for inputs drawn randomly from the range [64, 1024)
increases by around 30%.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f2ca1cbd0f)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I92f642e9bb0bfde9650c88c41d212ac9e01f5285
2020-10-24 14:43:32 +02:00
Ard Biesheuvel
0a524ae566 UPSTREAM: crypto: x86/chacha - expose SIMD ChaCha routine as library function
Wire the existing x86 SIMD ChaCha code into the new ChaCha library
interface, so that users of the library interface will get the
accelerated version when available.

Given that calls into the library API will always go through the
routines in this module if it is enabled, switch to static keys
to select the optimal implementation available (which may be none
at all, in which case we defer to the generic implementation for
all invocations).

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 84e03fa39f)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I888d7c807c2c1227195a924895ec68c0377b1771
2020-10-24 14:43:31 +02:00
Ard Biesheuvel
c2674ee0d9 UPSTREAM: crypto: x86/chacha - depend on generic chacha library instead of crypto driver
In preparation of extending the x86 ChaCha driver to also expose the ChaCha
library interface, drop the dependency on the chacha_generic crypto driver
as a non-SIMD fallback, and depend on the generic ChaCha library directly.
This way, we only pull in the code we actually need, without registering
a set of ChaCha skciphers that we will never use.

Since turning the FPU on and off is cheap these days, simplify the SIMD
routine by dropping the per-page yield, which makes for a cleaner switch
to the library API as well. This also allows use to invoke the skcipher
walk routines in non-atomic mode.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 28e8d89b1c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I404df79078972b7623230d3afceaf1f34bafc048
2020-10-24 14:43:30 +02:00
Ard Biesheuvel
2aa92dfe28 UPSTREAM: crypto: chacha - move existing library code into lib/crypto
Currently, our generic ChaCha implementation consists of a permute
function in lib/chacha.c that operates on the 64-byte ChaCha state
directly [and which is always included into the core kernel since it
is used by the /dev/random driver], and the crypto API plumbing to
expose it as a skcipher.

In order to support in-kernel users that need the ChaCha streamcipher
but have no need [or tolerance] for going through the abstractions of
the crypto API, let's expose the streamcipher bits via a library API
as well, in a way that permits the implementation to be superseded by
an architecture specific one if provided.

So move the streamcipher code into a separate module in lib/crypto,
and expose the init() and crypt() routines to users of the library.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5fb8ef2580)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7fe321d1fcbaea1dc3f9f65dec74a6f40da2d489
2020-10-24 14:43:29 +02:00
Ard Biesheuvel
c099f33354 UPSTREAM: crypto: lib - tidy up lib/crypto Kconfig and Makefile
In preparation of introducing a set of crypto library interfaces, tidy
up the Makefile and split off the Kconfig symbols into a separate file.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[Jason: fold in parts of dc51f257]
(cherry picked from commit 746b2e024c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6b877d863ba62199745d9d573dda35caee42cc79
2020-10-24 14:43:29 +02:00
Eric Biggers
f383e974de UPSTREAM: crypto: chacha - constify ctx and iv arguments
Constify the ctx and iv arguments to crypto_chacha_init() and the
various chacha*_stream_xor() functions.  This makes it clear that they
are not modified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 860ab2e502)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6dcb54a381e0867979e13558217c0ca74cf5da9e
2020-10-24 14:43:27 +02:00
Tommi Hirvola
0227da548c UPSTREAM: crypto: x86/poly1305 - Clear key material from stack in SSE2 variant
1-block SSE2 variant of poly1305 stores variables s1..s4 containing key
material on the stack. This commit adds missing zeroing of the stack
memory. Benchmarks show negligible performance hit (tested on i7-3770).

Signed-off-by: Tommi Hirvola <tommi@hirvola.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7748168c66)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iada3921d6f93921385c4f1229973511c163d5aab
2020-10-24 14:43:26 +02:00
Eric Biggers
2be733e152 UPSTREAM: crypto: xchacha20 - fix comments for test vectors
The kernel's ChaCha20 uses the RFC7539 convention of the nonce being 12
bytes rather than 8, so actually I only appended 12 random bytes (not
16) to its test vectors to form 24-byte nonces for the XChaCha20 test
vectors.  The other 4 bytes were just from zero-padding the stream
position to 8 bytes.  Fix the comments above the test vectors.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 282c14852d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1037d3e19792fd7dbf0d9623fe530ce711a0e8ad
2020-10-24 11:31:51 +02:00
Eric Biggers
b6685e597a UPSTREAM: crypto: xchacha - add test vector from XChaCha20 draft RFC
There is a draft specification for XChaCha20 being worked on.  Add the
XChaCha20 test vector from the appendix so that we can be extra sure the
kernel's implementation is compatible.

I also recomputed the ciphertext with XChaCha12 and added it there too,
to keep the tests for XChaCha20 and XChaCha12 in sync.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 5569e8c074)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib086679152e2499b0d99ffb2caa90a9ea83b359d
2020-10-24 11:31:50 +02:00
Eric Biggers
182b5fabaa UPSTREAM: crypto: arm64/chacha - add XChaCha12 support
Now that the ARM64 NEON implementation of ChaCha20 and XChaCha20 has
been refactored to support varying the number of rounds, add support for
XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.  This can be used by Adiantum.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 19c11c97c3)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia513fe9470bafed81232a8111485fbb157c160a9
2020-10-24 11:31:49 +02:00
Eric Biggers
4b74fb2e42 UPSTREAM: crypto: arm64/chacha20 - refactor to allow varying number of rounds
In preparation for adding XChaCha12 support, rename/refactor the ARM64
NEON implementation of ChaCha20 to support different numbers of rounds.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 95a34b779e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I50c3fb5fb6f9faf0a3964c254e85063987653b54
2020-10-24 11:31:48 +02:00
Eric Biggers
158dbe88f9 UPSTREAM: crypto: arm64/chacha20 - add XChaCha20 support
Add an XChaCha20 implementation that is hooked up to the ARM64 NEON
implementation of ChaCha20.  This can be used by Adiantum.

A NEON implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha20 permutation into its own function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cc7cf991e9)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6e0b5b01cfc5c86005fad911f399ad1068d0a511
2020-10-24 11:31:47 +02:00
Eric Biggers
2087809999 UPSTREAM: crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
Passing atomic=true to skcipher_walk_virt() only makes the later
skcipher_walk_done() calls use atomic memory allocations, not
skcipher_walk_virt() itself.  Thus, we have to move it outside of the
preemption-disabled region (kernel_fpu_begin()/kernel_fpu_end()).

(skcipher_walk_virt() only allocates memory for certain layouts of the
input scatterlist, hence why I didn't notice this earlier...)

Reported-by: syzbot+9bf843c33f782d73ae7d@syzkaller.appspotmail.com
Fixes: 4af7826187 ("crypto: x86/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit f9c9bdb513)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ief3c90cc32e472011eb06e0c031cce3e3ce560ea
2020-10-24 11:31:46 +02:00
Eric Biggers
f03b06c3b9 UPSTREAM: crypto: x86/chacha - yield the FPU occasionally
To improve responsiveness, yield the FPU (temporarily re-enabling
preemption) every 4 KiB encrypted/decrypted, rather than keeping
preemption disabled during the entire encryption/decryption operation.

Alternatively we could do this for every skcipher_walk step, but steps
may be small in some cases, and yielding the FPU is expensive on x86.

Suggested-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a033aed5a8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic6910b498e9cfec0deecf00ef787a9453fec4616
2020-10-24 11:31:45 +02:00
Eric Biggers
c4d866c3ae UPSTREAM: crypto: x86/chacha - add XChaCha12 support
Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have
been refactored to support varying the number of rounds, add support for
XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.  This can be used by Adiantum.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7a507d6225)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic146122b11b8bf5a0962a914aa8c8f631f3d047a
2020-10-24 11:31:44 +02:00
Eric Biggers
3fdd3e79ce UPSTREAM: crypto: x86/chacha20 - refactor to allow varying number of rounds
In preparation for adding XChaCha12 support, rename/refactor the x86_64
SIMD implementations of ChaCha20 to support different numbers of rounds.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 8b65f34c58)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3627b6a888b774193d4d0d8547dcedb96355275e
2020-10-24 11:31:43 +02:00
Eric Biggers
75fe50acb0 UPSTREAM: crypto: x86/chacha20 - add XChaCha20 support
Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD
implementations of ChaCha20.  This can be used by Adiantum.

An SSSE3 implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha permutation into its own function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 4af7826187)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7118fd7438c63813e07d345655a72c42ef9e89b2
2020-10-24 11:31:42 +02:00
Martin Willi
7e63c8cebc UPSTREAM: crypto: x86/chacha20 - Add a 4-block AVX-512VL variant
This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 180def6c4a)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifb9882aa5adaf2d6fdc027a23d79f74b569a95f6
2020-10-24 11:31:42 +02:00
Martin Willi
69d4a8650d UPSTREAM: crypto: x86/chacha20 - Add a 2-block AVX-512VL variant
This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single block SSSE3 version
to process a single block. Hence we engage that function for (partial)
single block lengths as well.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 29a47b54e0)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia1e32ee145c434a23b6c342783f0d4fa52a636fd
2020-10-24 11:31:41 +02:00
Martin Willi
4eb8f27c9d UPSTREAM: crypto: x86/chacha20 - Add a 8-block AVX-512VL variant
This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit cee7a36ecb)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idf319fa56656cbc859ad13a3452c5f84562ed0a1
2020-10-24 11:31:40 +02:00
Martin Willi
3720fd6b76 UPSTREAM: crypto: x86/chacha20 - Add a 4-block AVX2 variant
This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.

Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.

The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 8a5a79d555)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic9b38bd55dbc11966ea2e660ec8b82a7cd863afe
2020-10-24 11:31:39 +02:00
Martin Willi
3928787aea UPSTREAM: crypto: x86/chacha20 - Add a 2-block AVX2 variant
This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.

This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a5dd97f862)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie0011f26fd59257ace84299f7a65a6c11da13b47
2020-10-24 11:31:38 +02:00
Martin Willi
26b2d7b5d4 UPSTREAM: crypto: x86/chacha20 - Use larger block functions more aggressively
Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 9b17608f15)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic093ecab923675f28bedc3989292ac70f53338d7
2020-10-24 11:31:38 +02:00
Martin Willi
5057483342 UPSTREAM: crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.

To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit c3b734dd32)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1d8281bd6ee15255f3c3f0e0d80ce662403b2d7f
2020-10-24 11:31:37 +02:00
Martin Willi
bb5aadc704 UPSTREAM: crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant
Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.

As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit db8e15a249)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5e0bda127b2e109e16b7ebe83d563be1bc875b9e
2020-10-24 11:31:36 +02:00
Martin Willi
bca1a50e80 UPSTREAM: crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant
Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.

The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit e4e72063d3)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I259d9e04291bf8439259641982384334b8e132b0
2020-10-24 11:31:35 +02:00
Alistair Delva
176784c645 ANDROID: GKI: Enable CONFIG_USB_ANNOUNCE_NEW_DEVICES
Print out informational messages when a user connects a USB device.

Bug: 170928377
Signed-off-by: Alistair Delva <adelva@google.com>
Change-Id: I9838d5aff51fe9c6cfa3b5e7c46cbe84ee878090
(cherry picked from commit d75c93b6ee)
2020-10-22 16:41:19 +00:00
Abdulla Kamar
b00febaf47 ANDROID: GKI: Enable CONFIG_X86_X2APIC
Enable support for CONFIG_X86_X2APIC, if available. This speeds up
operation in a hypervisor, as APIC is handled via MSRs.

Bug: 171287650
Change-Id: Ie3cf57d07b8dde08ec609dfb031cfb0029218501
Signed-off-by: Abdulla Kamar <abdulla@google.com>
2020-10-20 16:03:53 +00:00