Commit Graph

973782 Commits

Author SHA1 Message Date
Eric Biggers
14d98d667c ANDROID: mmc: cqhci: set blk_keyslot_manager::features
This is needed due to "ANDROID: block: add hardware-wrapped key
support", which isn't upstream yet.

Bug: 160883801
Change-Id: I053a1de8d8c600f8eaba60cd338e07b778d017b3
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
b489f0324e UPSTREAM: mmc: sdhci-msm: add Inline Crypto Engine support
Add support for Qualcomm Inline Crypto Engine (ICE) to sdhci-msm.

The standard-compliant parts, such as querying the crypto capabilities
and enabling crypto for individual MMC requests, are already handled by
cqhci-crypto.c, which itself is wired into the blk-crypto framework.
However, ICE requires vendor-specific init, enable, and resume logic,
and it requires that keys be programmed and evicted by vendor-specific
SMC calls.  Make the sdhci-msm driver handle these details.

This is heavily inspired by the similar changes made for UFS, since the
UFS and eMMC ICE instances are very similar.  See commit df4ec2fa7a
("scsi: ufs-qcom: Add Inline Crypto Engine support").

I tested this on a Sony Xperia 10, which uses the Snapdragon 630 SoC,
which has basic upstream support.  Mainly, I used android-xfstests
(https://github.com/tytso/xfstests-bld/blob/master/Documentation/android-xfstests.md)
to run the ext4 and f2fs encryption tests in a Debian chroot:

	android-xfstests -c ext4,f2fs -g encrypt -m inlinecrypt

These tests included tests which verify that the on-disk ciphertext is
identical to that produced by a software implementation.  I also
verified that ICE was actually being used.

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-9-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit c93767cf64)
Bug: 161256007
Change-Id: I4cdd2eca67419940181d55c690b7308ae7433096
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
23957f08ca UPSTREAM: dt-bindings: mmc: sdhci-msm: add ICE registers and clock
Document the bindings for the registers and clock for the MMC instance
of the Inline Crypto Engine (ICE) on Snapdragon SoCs.  These bindings
are needed in order for sdhci-msm to support inline encryption.

Reviewed-by: Satya Tangirala <satyat@google.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-8-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 5cc046eb13)
Bug: 161256007
Change-Id: Id64d6d66046ea4316cc4caca1fb9553ed71b6722
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
05a22bd588 UPSTREAM: firmware: qcom_scm: update comment for ICE-related functions
The SCM calls QCOM_SCM_ES_INVALIDATE_ICE_KEY and
QCOM_SCM_ES_CONFIG_SET_ICE_KEY are also needed for eMMC inline
encryption support, not just for UFS.  Update the comments accordingly.

Reviewed-by: Satya Tangirala <satyat@google.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210125183810.198008-7-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 433611ea8d)
Bug: 161256007
Change-Id: Ifd479558a27393f398e5097f5a7dfbbbc38919db
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
d43a24006d UPSTREAM: mmc: cqhci: add cqhci_host_ops::program_key
On Snapdragon SoCs, the Linux kernel isn't permitted to directly access
the standard CQHCI crypto configuration registers.  Instead, programming
and evicting keys must be done through vendor-specific SMC calls.

To support this hardware, add a ->program_key() method to
'struct cqhci_host_ops'.  This allows overriding the standard CQHCI
crypto key programming / eviction procedure.

This is inspired by the corresponding UFS crypto support, which uses
these same SMC calls.  See commit 1bc726e26e ("scsi: ufs: Add
program_key() variant op").

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Reviewed-and-tested-by: Peng Zhou <peng.zhou@mediatek.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-6-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 0a0c866f37)
Bug: 161256007
Change-Id: Id0a988c769f1ec71c192a634635fdddd5fa66896
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
6a3217913c UPSTREAM: mmc: cqhci: add support for inline encryption
Add support for eMMC inline encryption using the blk-crypto framework
(Documentation/block/inline-encryption.rst).

eMMC inline encryption support is specified by the upcoming JEDEC eMMC
v5.2 specification.  It is only specified for the CQ interface, not the
non-CQ interface.  Although the eMMC v5.2 specification hasn't been
officially released yet, the crypto support was already agreed on
several years ago, and it was already implemented by at least two major
hardware vendors.  Lots of hardware in the field already supports and
uses it, e.g. Snapdragon 630 to give one example.

eMMC inline encryption support is very similar to the UFS inline
encryption support which was standardized in the UFS v2.1 specification
and was already upstreamed.  The only major difference is that eMMC
limits data unit numbers to 32 bits, unlike UFS's 64 bits.

Like we did with UFS, make the crypto support opt-in by individual
drivers; don't enable it automatically whenever the hardware declares
crypto support.  This is necessary because in every case we've seen,
some extra vendor-specific logic is needed to use the crypto support.

Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Reviewed-and-tested-by: Peng Zhou <peng.zhou@mediatek.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210125183810.198008-5-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 1e80709bdb)
Bug: 161256007
Change-Id: I37e62b68e601a50cc4b577ca7386d5a2badf96e4
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
a41cb40e4d UPSTREAM: mmc: cqhci: initialize upper 64 bits of 128-bit task descriptors
Move the task descriptor initialization into cqhci_prep_task_desc().
In addition, make it explicitly initialize all 128 bits of the task
descriptor if the host controller is using 128-bit task descriptors,
rather than relying on the implicit zeroing from dmam_alloc_coherent().

This is needed to prepare for CQHCI inline encryption support, which
requires 128-bit task descriptors and uses the upper 64 bits.

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Reviewed-and-tested-by: Peng Zhou <peng.zhou@mediatek.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-4-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit ee49d0321f)
Bug: 161256007
Change-Id: I86b162281c1565dabcf57526c3d7c3f43806be52
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:57 +01:00
Eric Biggers
879803f951 UPSTREAM: mmc: cqhci: rename cqhci.c to cqhci-core.c
Rename cqhci.c to cqhci-core.c so that another source file can be added
to the cqhci module without having to rename the module.

Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-and-tested-by: Peng Zhou <peng.zhou@mediatek.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-3-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 0653300224)
Bug: 161256007
Change-Id: I22e8eacaf1194272eb3e6124f8e2becf3cc535a9
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:56 +01:00
Eric Biggers
cf3293b65b UPSTREAM: mmc: core: Add basic support for inline encryption
In preparation for adding CQHCI crypto engine (inline encryption)
support, add the code required to make mmc_core and mmc_block aware of
inline encryption.  Specifically:

- Add a capability flag MMC_CAP2_CRYPTO to struct mmc_host.  Drivers
  will set this if the host and driver support inline encryption.

- Embed a blk_keyslot_manager in struct mmc_host.  Drivers will
  initialize this (as a device-managed resource) if the host and driver
  support inline encryption.  mmc_block registers this keyslot manager
  with the request_queue of any MMC card attached to the host.

- Make mmc_block copy the crypto keyslot and crypto data unit number
  from struct request to struct mmc_request, so that drivers will have
  access to them.

- If the MMC host is reset, reprogram all the keyslots to ensure that
  the software state stays in sync with the hardware state.

Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Reviewed-and-tested-by: Peng Zhou <peng.zhou@mediatek.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210126001456.382989-2-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 93f1c150cb)
Bug: 161256007
Change-Id: I30095dcaab9e5e929657190ac5744b4f3676a990
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:56 +01:00
Eric Biggers
d205aa02f6 UPSTREAM: scsi: ufs: use devm_blk_ksm_init()
Use the new resource-managed variant of blk_ksm_init() so that the UFS
driver doesn't have to manually call blk_ksm_destroy().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/r/20210121082155.111333-3-ebiggers@kernel.org
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit d76d9d7d10)
Bug: 161256007
Change-Id: If9c5ba188d8fa45becd8393822f104017a61637d
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:56 +01:00
Eric Biggers
25b3bc709c UPSTREAM: block/keyslot-manager: introduce devm_blk_ksm_init()
Add a resource-managed variant of blk_ksm_init() so that drivers don't
have to worry about calling blk_ksm_destroy().

Note that the implementation uses a custom devres action to call
blk_ksm_destroy() rather than switching the two allocations to be
directly devres-managed, e.g. with devm_kmalloc().  This is because we
need to keep zeroing the memory containing the keyslots when it is
freed, and also because we want to continue using kvmalloc() (and there
is no devm_kvmalloc()).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20210121082155.111333-2-ebiggers@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

(cherry picked from commit 5851d3b042)
Bug: 161256007
Change-Id: I3cda7fc5f7509f8d967882a88ca30ddc6f6a0426
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:10:56 +01:00
Eric Biggers
ffdad4351e ANDROID: gki_defconfig: enable BLAKE2b support
Bug: 178411248
Change-Id: Iec497954d29adcf7193da9ca4b27d61eac7615d9
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:20 +01:00
Eric Biggers
15023db7dc UPSTREAM: crypto: arm/blake2b - add NEON-accelerated BLAKE2b
Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1.  It is also almost three times
as fast as the generic implementation of BLAKE2b:

	Algorithm            Cycles per byte (on 4096-byte messages)
	===================  =======================================
	blake2b-256-neon     14.0
	sha1-neon            16.3
	blake2s-256-arm      18.8
	sha1-asm             20.8
	blake2s-256-generic  26.0
	sha256-neon	     28.9
	sha256-asm	     32.0
	blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP).  It does
only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it.  The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet.  However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 1862eb0073)
Bug: 178411248
Change-Id: I01557f0a63db0eb21778d0cc7b582ad89898d44a
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:20 +01:00
Eric Biggers
53f83e3aae UPSTREAM: crypto: blake2b - update file comment
The file comment for blake2b_generic.c makes it sound like it's the
reference implementation of BLAKE2b with only minor changes.  But it's
actually been changed a lot.  Update the comment to make this clearer.

Reviewed-by: David Sterba <dsterba@suse.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 0cdc438e6e)
Bug: 178411248
Change-Id: I8b58cd0e2892a873866ead52fa4dba3edabdeeb3
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:20 +01:00
Eric Biggers
8bde3e29cb UPSTREAM: crypto: blake2b - sync with blake2s implementation
Sync the BLAKE2b code with the BLAKE2s code as much as possible:

- Move a lot of code into new headers <crypto/blake2b.h> and
  <crypto/internal/blake2b.h>, and adjust it to be like the
  corresponding BLAKE2s code, i.e. like <crypto/blake2s.h> and
  <crypto/internal/blake2s.h>.

- Rename constants, e.g. BLAKE2B_*_DIGEST_SIZE => BLAKE2B_*_HASH_SIZE.

- Use a macro BLAKE2B_ALG() to define the shash_alg structs.

- Export blake2b_compress_generic() for use as a fallback.

This makes it much easier to add optimized implementations of BLAKE2b,
as optimized implementations can use the helper functions
crypto_blake2b_{setkey,init,update,final}() and
blake2b_compress_generic().  The ARM implementation will use these.

But this change is also helpful because it eliminates unnecessary
differences between the BLAKE2b and BLAKE2s code, so that the same
improvements can easily be made to both.  (The two algorithms are
basically identical, except for the word size and constants.)  It also
makes it straightforward to add a library API for BLAKE2b in the future
if/when it's needed.

This change does make the BLAKE2b code slightly more complicated than it
needs to be, as it doesn't actually provide a library API yet.  For
example, __blake2b_update() doesn't really need to exist yet; it could
just be inlined into crypto_blake2b_update().  But I believe this is
outweighed by the benefits of keeping the code in sync.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 28dcca4cc0)
Bug: 178411248
Change-Id: I755c64fefe1209b54d1417d91e8a4c207382a620
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:20 +01:00
Eric Biggers
f1cc7b47d8 UPSTREAM: wireguard: Kconfig: select CRYPTO_BLAKE2S_ARM
When available, select the new implementation of BLAKE2s for 32-bit ARM.
This is faster than the generic C implementation.

Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit a64bfe7ad4)
Bug: 152722841
Bug: 178411248
Change-Id: I27b0d049b689c8ef5d44e01151aa8c78af1bdd0e
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:20 +01:00
Eric Biggers
8cc1fdb8ac UPSTREAM: crypto: arm/blake2s - add ARM scalar optimized BLAKE2s
Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help.  Each NEON instruction would depend on the
previous one, resulting in poor performance.

With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation get runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API as well as
optionally register the algorithms with the shash API.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 5172d322d3)
Bug: 152722841
Bug: 178411248
Change-Id: I0ba7a0ea53518d512ccbe54a56970d6bbe2a32e0
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
290d0e42bc UPSTREAM: crypto: blake2s - include <linux/bug.h> instead of <asm/bug.h>
Address the following checkpatch warning:

	WARNING: Use #include <linux/bug.h> instead of <asm/bug.h>

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit bbda6e0f13)
Bug: 152722841
Bug: 178411248
Change-Id: Ib12103873c97387ba59ff897c44a18e4f1e6da3d
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
f006426a4f UPSTREAM: crypto: blake2s - adjust include guard naming
Use the full path in the include guards for the BLAKE2s headers to avoid
ambiguity and to match the convention for most files in include/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 8786841bc2)
Bug: 152722841
Bug: 178411248
Change-Id: Iea57edbe14c9083ce3d7a77b2c3bd63cd6650e56
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
c42fac6d71 UPSTREAM: crypto: blake2s - add comment for blake2s_state fields
The first three fields of 'struct blake2s_state' are used in assembly
code, which isn't immediately obvious, so add a comment to this effect.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 7d87131fad)
Bug: 152722841
Bug: 178411248
Change-Id: Iab85ddd350a2db917b247ba8f6e224eba905b747
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
46c4d1ccb0 UPSTREAM: crypto: blake2s - optimize blake2s initialization
If no key was provided, then don't waste time initializing the block
buffer, as its initial contents won't be used.

Also, make crypto_blake2s_init() and blake2s() call a single internal
function __blake2s_init() which treats the key as optional, rather than
conditionally calling blake2s_init() or blake2s_init_key().  This
reduces the compiled code size, as previously both blake2s_init() and
blake2s_init_key() were being inlined into these two callers, except
when the key size passed to blake2s() was a compile-time constant.

These optimizations aren't that significant for BLAKE2s.  However, the
equivalent optimizations will be more significant for BLAKE2b, as
everything is twice as big in BLAKE2b.  And it's good to keep things
consistent rather than making optimizations for BLAKE2b but not BLAKE2s.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 42ad8cf821)
Bug: 152722841
Bug: 178411248
Change-Id: Id057e0c8a5f0e432f227dc3deaa8997a7cbc0577
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
669d2197fc UPSTREAM: crypto: blake2s - share the "shash" API boilerplate code
Add helper functions for shash implementations of BLAKE2s to
include/crypto/internal/blake2s.h, taking advantage of
__blake2s_update() and __blake2s_final() that were added by the previous
patch to share more code between the library and shash implementations.

crypto_blake2s_setkey() and crypto_blake2s_init() are usable as
shash_alg::setkey and shash_alg::init directly, while
crypto_blake2s_update() and crypto_blake2s_final() take an extra
'blake2s_compress_t' function pointer parameter.  This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.

The new functions are inline functions (similar to those in sha1_base.h,
sha256_base.h, and sm3_base.h) because this avoids needing to add a new
module blake2s_helpers.ko, they aren't *too* long, and this avoids
indirect calls which are expensive these days.  Note that they can't go
in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S
from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.

Finally, use these new helper functions in the x86 implementation of
BLAKE2s.  (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 8c4a93a127)
Bug: 152722841
Bug: 178411248
Change-Id: I9d23b1897cee81fb1704b6eea14d3dc9b1c80fe0
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
f411897ead UPSTREAM: crypto: blake2s - move update and final logic to internal/blake2s.h
Move most of blake2s_update() and blake2s_final() into new inline
functions __blake2s_update() and __blake2s_final() in
include/crypto/internal/blake2s.h so that this logic can be shared by
the shash helper functions.  This will avoid duplicating this logic
between the library and shash implementations.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 057edc9c8b)
Bug: 152722841
Bug: 178411248
Change-Id: Idedf8147914839d5280e5d8eec9013264fbb45f9
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
27aee704d0 UPSTREAM: crypto: blake2s - remove unneeded includes
It doesn't make sense for the generic implementation of BLAKE2s to
include <crypto/internal/simd.h> and <linux/jump_label.h>, as these are
things that would only be useful in an architecture-specific
implementation.  Remove these unnecessary includes.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit df412e7efd)
Bug: 152722841
Bug: 178411248
Change-Id: I8bccb9506d235c2b184b9899ee375fdc12d6b10f
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
6c2952c3cd UPSTREAM: crypto: x86/blake2s - define shash_alg structs using macros
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 1aa90f4cf0)
Bug: 152722841
Bug: 178411248
Change-Id: Ibba710056182a609defca0a53133387153f0f63d
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Eric Biggers
2ccec4b7bf UPSTREAM: crypto: blake2s - define shash_alg structs using macros
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 0d396058f9)
Bug: 152722841
Bug: 178411248
Change-Id: I32b3a6c035b18f063ad0e03c43171091ca1490a8
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Herbert Xu
d0a59f9532 UPSTREAM: crypto: lib/blake2s - Move selftest prototype into header file
This patch fixes a missing prototype warning on blake2s_selftest.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit ce0d5d63e8)
Bug: 152722841
Change-Id: Ieb19048d60f6c9dc36cefda8b09b3d5c95e9118e
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-23 08:06:19 +01:00
Suren Baghdasaryan
aa8f690d71 ANDROID: vmscan: Fix sparse warnings for kswapd_threads
Fix the following warning by declaring kswapd_threads as static:
mm/vmscan.c:175:5: sparse: symbol 'kswapd_threads' was not declared.

Fixes: 0d61a651e4 ("ANDROID: vmscan: Support multiple kswapd threads per node")
Bug: 171351667
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6ccb1e0e0f597fda27ed6893d254a77341139084
2021-02-22 18:13:47 -08:00
Suren Baghdasaryan
6a0a705473 ANDROID: mm: hide get_each_object_track declaration when CONFIG_SLUB=n
struct track and enum track_item are undefined when CONFIG_SLUB=n.
get_each_object_track which uses these types should not be compiled
in this configuration. Add missing ifdefs to prevent compilation errors.

Fixes: ee8d2c7884 ("ANDROID: mm: add get_each_object_track function")
Bug: 177377077
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9ad15e6ef1572ba8f69b746ab837f051614c017c
2021-02-22 17:40:01 -08:00
Vlastimil Babka
10aaa1d5c7 FROMGIT: mm, compaction: make fast_isolate_freepages() stay within zone
Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages.  Pageblock boundaries are
intersected with zone boundaries to be safe in case zone starts or ends in
the middle of pageblock.  The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.

The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operate outside
of zone boundaries:

- in fast_isolate_freepages() if we get a pfn from the first pageblock
  of a zone that starts in the middle of that pageblock, 'highest' can be
  a pfn outside of the zone.  If we fail to isolate anything in this
  function, we may then call fast_isolate_around() on a pfn outside of the
  zone and there effectively do a set_pageblock_skip(page_to_pfn(highest))
  which may currently hit a VM_BUG_ON() in some configurations

- fast_isolate_around() checks only the zone end boundary and not
  beginning, nor that the pageblock is contiguous (with
  pageblock_pfn_to_page()) so it's possible that we end up calling
  isolate_freepages_block() on a range of pfn's from two different zones
  and end up e.g.  isolating freepages under the wrong zone's lock.

This patch should fix the above issues.

Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Bug: 180548389
Change-Id: I97c31374640255219de3b852097501e3d935c8ce
(cherry picked from https://lore.kernel.org/mm-commits/20210217191436.shTJB%25akpm@linux-foundation.org/)
2021-02-22 22:51:57 +00:00
Quentin Perret
d498075b65 ANDROID: sched: time: Export symbols needed for schedutil module
Export symbols needed to allow building a schedutil-based vendor module
with GKI.

This is a small price to pay to give vendors the flexibility they need,
and avoids littering cpufreq_schedutil.c with many vendor hooks.

Bug: 170511085
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8ff8bdb32df5d47124236819efba881c1a2a538d
(cherry picked from commit 34cd6916744b8b2d2107d2d5f10cbacb181e4f6c)
(cherry picked from commit 7587bc9dcf)
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-22 11:58:23 -08:00
Charan Teja Reddy
0d61a651e4 ANDROID: vmscan: Support multiple kswapd threads per node
Page replacement is handled in the Linux Kernel in one of two ways:

1) Asynchronously via kswapd
2) Synchronously, via direct reclaim

At page allocation time the allocating task is immediately given a page
from the zone free list allowing it to go right back to work doing
whatever it was doing; Probably directly or indirectly executing business
logic.

Just prior to satisfying the allocation, free pages is checked to see if
it has reached the zone low watermark and if so, kswapd is awakened.
Kswapd will start scanning pages looking for inactive pages to evict to
make room for new page allocations. The work of kswapd allows tasks to
continue allocating memory from their respective zone free list without
incurring any delay.

When the demand for free pages exceeds the rate that kswapd tasks can
supply them, page allocation works differently. Once the allocating task
finds that the number of free pages is at or below the zone min watermark,
the task will no longer pull pages from the free list. Instead, the task
will run the same CPU-bound routines as kswapd to satisfy its own
allocation by scanning and evicting pages. This is called a direct reclaim.

The time spent performing a direct reclaim can be substantial, often
taking tens to hundreds of milliseconds for small order0 allocations to
half a second or more for order9 huge-page allocations. In fact, kswapd is
not actually required on a linux system. It exists for the sole purpose of
optimizing performance by preventing direct reclaims.

When memory shortfall is sufficient to trigger direct reclaims, they can
occur in any task that is running on the system. A single aggressive
memory allocating task can set the stage for collateral damage to occur in
small tasks that rarely allocate additional memory. Consider the impact of
injecting an additional 100ms of latency when nscd allocates memory to
facilitate caching of a DNS query.

The presence of direct reclaims 10 years ago was a fairly reliable
indicator that too much was being asked of a Linux system. Kswapd was
likely wasting time scanning pages that were ineligible for eviction.
Adding RAM or reducing the working set size would usually make the problem
go away. Since then hardware has evolved to bring a new struggle for
kswapd. Storage speeds have increased by orders of magnitude while CPU
clock speeds stayed the same or even slowed down in exchange for more
cores per package. This presents a throughput problem for a single
threaded kswapd that will get worse with each generation of new hardware.

Test Details

NOTE: The tests below were run with shadow entries disabled. See the
associated patch and cover letter for details

The tests below were designed with the assumption that a kswapd bottleneck
is best demonstrated using filesystem reads. This way, the inactive list
will be full of clean pages, simplifying the analysis and allowing kswapd
to achieve the highest possible steal rate. Maximum steal rates for kswapd
are likely to be the same or lower for any other mix of page types on the
system.

Tests were run on a 2U Oracle X7-2L with 52 Intel Xeon Skylake 2GHz cores,
756GB of RAM and 8 x 3.6 TB NVMe Solid State Disk drives. Each drive has
an XFS file system mounted separately as /d0 through /d7. SSD drives
require multiple concurrent streams to show their potential, so I created
eleven 250GB zero-filled files on each drive so that I could test with
parallel reads.

The test script runs in multiple stages. At each stage, the number of dd
tasks run concurrently is increased by 2. I did not include all of the
test output for brevity.

During each stage dd tasks are launched to read from each drive in a round
robin fashion until the specified number of tasks for the stage has been
reached. Then iostat, vmstat and top are started in the background with 10
second intervals. After five minutes, all of the dd tasks are killed and
the iostat, vmstat and top output is parsed in order to report the
following:

CPU consumption
- sy - aggregate kernel mode CPU consumption from vmstat output. The value
       doesn't tend to fluctuate much so I just grab the highest value.
       Each sample is averaged over 10 seconds
- dd_cpu - for all of the dd tasks averaged across the top samples since
           there is a lot of variation.

Throughput
- in Kbytes
- Command is iostat -x -d 10 -g total

This first test performs reads using O_DIRECT in order to show the maximum
throughput that can be obtained using these drives. It also demonstrates
how rapidly throughput scales as the number of dd tasks are increased.

The dd command for this test looks like this:

Command Used: dd iflag=direct if=/d${i}/$n of=/dev/null bs=4M

Test #1: Direct IO
dd sy dd_cpu throughput
6  0  2.33   14726026.40
10 1  2.95   19954974.80
16 1  2.63   24419689.30
22 1  2.63   25430303.20
28 1  2.91   26026513.20
34 1  2.53   26178618.00
40 1  2.18   26239229.20
46 1  1.91   26250550.40
52 1  1.69   26251845.60
58 1  1.54   26253205.60
64 1  1.43   26253780.80
70 1  1.31   26254154.80
76 1  1.21   26253660.80
82 1  1.12   26254214.80
88 1  1.07   26253770.00
90 1  1.04   26252406.40

Throughput was close to peak with only 22 dd tasks. Very little system CPU
was consumed as expected as the drives DMA directly into the user address
space when using direct IO.

In this next test, the iflag=direct option is removed and we only run the
test until the pgscan_kswapd from /proc/vmstat starts to increment. At
that point metrics are parsed and reported and the pagecache contents are
dropped prior to the next test. Lather, rinse, repeat.

Test #2: standard file system IO, no page replacement
dd sy dd_cpu throughput
6  2  28.78  5134316.40
10 3  31.40  8051218.40
16 5  34.73  11438106.80
22 7  33.65  14140596.40
28 8  31.24  16393455.20
34 10 29.88  18219463.60
40 11 28.33  19644159.60
46 11 25.05  20802497.60
52 13 26.92  22092370.00
58 13 23.29  22884881.20
64 14 23.12  23452248.80
70 15 22.40  23916468.00
76 16 22.06  24328737.20
82 17 20.97  24718693.20
88 16 18.57  25149404.40
90 16 18.31  25245565.60

Each read has to pause after the buffer in kernel space is populated while
those pages are added to the pagecache and copied into the user address
space. For this reason, more parallel streams are required to achieve peak
throughput. The copy operation consumes substantially more CPU than direct
IO as expected.

The next test measures throughput after kswapd starts running. This is the
same test only we wait for kswapd to wake up before we start collecting
metrics. The script actually keeps track of a few things that were not
mentioned earlier. It tracks direct reclaims and page scans by watching
the metrics in /proc/vmstat. CPU consumption for kswapd is tracked the
same way it is tracked for dd.

Since the test is 100% reads, you can assume that the page steal rate for
kswapd and direct reclaims is almost identical to the scan rate.

Test #3: 1 kswapd thread per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 4  26.07  28.56   27.03   7355924.40  0     459316976     0
16 7  34.94  69.33   69.66   10867895.20 0     872661643     0
22 10 36.03  93.99   99.33   13130613.60 489   1037654473    11268334
28 10 30.34  95.90   98.60   14601509.60 671   1182591373    15429142
34 14 34.77  97.50   99.23   16468012.00 10850 1069005644    249839515
40 17 36.32  91.49   97.11   17335987.60 18903 975417728     434467710
46 19 38.40  90.54   91.61   17705394.40 25369 855737040     582427973
52 22 40.88  83.97   83.70   17607680.40 31250 709532935     724282458
58 25 40.89  82.19   80.14   17976905.60 35060 657796473     804117540
64 28 41.77  73.49   75.20   18001910.00 39073 561813658     895289337
70 33 45.51  63.78   64.39   17061897.20 44523 379465571     1020726436
76 36 46.95  57.96   60.32   16964459.60 47717 291299464     1093172384
82 39 47.16  55.43   56.16   16949956.00 49479 247071062     1134163008
88 42 47.41  53.75   47.62   16930911.20 51521 195449924     1180442208
90 43 47.18  51.40   50.59   16864428.00 51618 190758156     1183203901

In the previous test where kswapd was not involved, the system-wide kernel
mode CPU consumption with 90 dd tasks was 16%. In this test CPU consumption
with 90 tasks is at 43%. With 52 cores, and two kswapd tasks (one per NUMA
node), kswapd can only be responsible for a little over 4% of the increase.
The rest is likely caused by 51,618 direct reclaims that scanned 1.2
billion pages over the five minute time period of the test.

Same test, more kswapd tasks:

Test #4: 4 kswapd threads per node
dd sy dd_cpu kswapd0 kswapd1 throughput  dr    pgscan_kswapd pgscan_direct
10 5  27.09  16.65   14.17   7842605.60  0     459105291     0
16 10 37.12  26.02   24.85   11352920.40 15    920527796     358515
22 11 36.94  37.13   35.82   13771869.60 0     1132169011     0
28 13 35.23  48.43   46.86   16089746.00 0     1312902070     0
34 15 33.37  53.02   55.69   18314856.40 0     1476169080     0
40 19 35.90  69.60   64.41   19836126.80 0     1629999149     0
46 22 36.82  88.55   57.20   20740216.40 0     1708478106     0
52 24 34.38  93.76   68.34   21758352.00 0     1794055559     0
58 24 30.51  79.20   82.33   22735594.00 0     1872794397     0
64 26 30.21  97.12   76.73   23302203.60 176   1916593721     4206821
70 33 32.92  92.91   92.87   23776588.00 3575  1817685086     85574159
76 37 31.62  91.20   89.83   24308196.80 4752  1812262569     113981763
82 29 25.53  93.23   92.33   24802791.20 306   2032093122     7350704
88 43 37.12  76.18   77.01   25145694.40 20310 1253204719     487048202
90 42 38.56  73.90   74.57   22516787.60 22774 1193637495     545463615

By increasing the number of kswapd threads, throughput increased by ~50%
while kernel mode CPU utilization decreased or stayed the same, likely due
to a decrease in the number of parallel tasks at any given time doing page
replacement.

Signed-off-by: Buddy Lumpkin <buddy.lumpkin@oracle.com>
Bug: 171351667
Link: https://lore.kernel.org/lkml/1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com
[charante@codeaurora.org]: Changes made to select number of kswapds through uapi
Change-Id: I8425cab7f40cbeaf65af0ea118c1a9ac7da0930e
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2021-02-22 19:47:44 +00:00
Vijayanand Jitta
ee8d2c7884 ANDROID: mm: add get_each_object_track function
Add and export get_each_object_track which helps in
looping through all the slab objects of a page
and gets the track structure of each object, also
make track_item and track structure public, these
will be used by the minidump module to get slab
owner info.

Bug: 177377077
Change-Id: Ic207fd26a122a5f1b014f4929760d064f7af0225
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
2021-02-22 19:26:26 +00:00
Chiawei Wang
db158b4ae0 ANDROID: mm: Add vendor hook in pagecache_get_page()
Add a vendor hook for pagecache hit/miss and other
vendor specific functions.

Bug: 174088128
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: Ie9f14a69a86b8ed81de766e44e30f2eba1d9bd84
2021-02-19 17:59:52 +00:00
Chiawei Wang
369de37804 ANDROID: mm: Add vendor hook in rmqueue()
Add a vendor hook for costly order page counting
and other vendor specific functions.

Bug: 174521902
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: I89206727a462548cc3500b695d85c83ff003eec7
2021-02-19 17:59:30 +00:00
Alistair Delva
ea15862d66 ANDROID: GKI: Build in VIRTIO_FS
This allows us to keep filesystem symbols non-exported, without bringing
much into GKI.

Bug: 180710829
Change-Id: If3a4a87448be94826390fa23b0e241e2239d9550
Signed-off-by: Alistair Delva <adelva@google.com>
2021-02-19 15:34:47 +00:00
Eric Biggers
537d3bb974 ANDROID: dm: sync inline crypto support with patches going upstream
Replace the following patches with upstream versions
(well, almost upstream; as of 2021-02-12 they are queued for 5.12 at
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next):

	ANDROID-dm-add-support-for-passing-through-inline-crypto-support.patch
	ANDROID-dm-enable-may_passthrough_inline_crypto-on-some-targets.patch
	ANDROID-block-Introduce-passthrough-keyslot-manager.patch

Also, resolve conflicts with the following non-upstream patches for
hardware-wrapped key support.  Notably, we need to handle the field
blk_keyslot_manager::features in a few places:

	ANDROID-block-add-hardware-wrapped-key-support.patch
	ANDROID-dm-add-support-for-passing-through-derive_raw_secret.patch

Finally, update non-upstream device-mapper targets (dm-bow and
dm-default-key) to use the new way of specifying inline crypto
passthrough support (DM_TARGET_PASSES_CRYPTO) rather than the old way
(may_passthrough_inline_crypto).  These changes should be folded into:

	ANDROID-dm-bow-Add-dm-bow-feature.patch
	ANDROID-dm-add-dm-default-key-target-for-metadata-encryption.patch

Test: tested on db845c; verified that inline crypto support gets passed
      through over dm-linear.
Bug: 162257830
Change-Id: I5e3dea1aa09fc1215c90857b5b51d9e3720ef7db
Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-02-19 10:48:51 +00:00
Pavankumar Kondeti
a56f081c5b ANDROID: sched: Add restricted vendor hooks in CFS scheduler
Add restricted vendor hooks in CFS scheduler class to allow
customizations in vendor modules.

Bug: 180668820
Change-Id: I69bd90e11220d7607b075a3aa687059deaa60439
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
2021-02-19 12:22:57 +05:30
John Stultz
cc048ffd7e ANDROID: drm: kirin: Remove dead code that was causing build failures
Somehow in forward porting commit 34ebaf13be ("ANDROID: drm:
kirin: Introduce kirin960"), which moves a bunch of code around
to be shared, a chunk of code removal was dropped (likely due
to some minor collision).

This caused build failures w allmodconfig that I missed.

This patch removes the dead code causing the trouble.

Bug: 180655267
Fixes: 34ebaf13be ("ANDROID: drm: kirin: Introduce kirin960")
Reported-by: Todd Kjos <tkjos@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: I79a727115756f213502de0a4406cf3384c393ead
2021-02-19 01:32:16 +00:00
John Stultz
8adfa9950c ANDROID: adv7511: Add poweron delay to allow for EDID probing to work
For some reason on HiKey960 the edid probing doesn't work
properly unless we delay a bit at poweron.

Bug: 146450171
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: Id8bcf9158d3060e065a6a9ec06bbe0323b73dc8e
2021-02-19 00:02:01 +00:00
John Stultz
795028f7e7 ANDROID: Add hikey960 build infrastructure file
Adds build.config.hikey960 and android/abi_gki_aarch64_hikey960 files

Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ice445cf09780b16059e5e4ef624ac30e300c6500
2021-02-18 23:59:34 +00:00
John Stultz
baae9497e8 ANDROID: Add hikey960 GKI config fragment
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: Ie6fe68f692ed57fd28cfca9f339d6392c33e9ff8
2021-02-18 23:52:29 +00:00
Youlin Wang
538e9699eb ANDROID: arm64: dts: hi3660-hikey960: Add i2s & sound device
Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I746999c06398779a5380c043a1fc7349f373694c
2021-02-18 23:49:43 +00:00
Youlin Wang
8da70a67f3 ANDROID: ASoC: add hikey960-i2s DT bindings
Adds DT bindings documentation for
the hikey960-i2s driver and audio card device.

Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: If9f8ce65c78e1e1e6b2e0439b590e7558763d631
2021-02-18 23:44:26 +00:00
Youlin Wang
b4b11198ed ANDROID: sound: Add hikey960 i2s audio driver
Add i2s driver for hisi3660 soc found on the hikey960 board.
Add conpile line in make file.
Technical support by Guangke Ji.

Signed-off-by: Youlin Wang <wangyoulin1@hisilicon.com>
Signed-off-by: Tanglei Han <hantanglei@huawei.com>
Signed-off-by: Guangke Ji <jiguangke@huawei.com>
Signed-off-by: Feng Chen <puck.chen@hisilicon.com>
Signed-off-by: Kaihua Zhong <zhongkaihua@huawei.com>
Signed-off-by: Jun Chen <chenjun14@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Yiping Xu <xuyiping@hisilicon.com>
Signed-off-by: Pengcheng Li <lipengcheng8@huawei.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I90e66ff47ebd2cd42fa233a4c4dbac6e74cd01d3
2021-02-18 23:41:44 +00:00
Shihui Zhao
121e60b7b1 ANDROID: arm64: dts: hi3660: enable gpu
enable gpu in the hi3660/hikey960 dts

Bug: 146450171
Signed-off-by: Shihui Zhao <zhaoshihui3@huawei.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Change-Id: I5e72a7dc14e760f442db5b9e55b0df26597cfbcd
2021-02-18 23:39:07 +00:00
cailiwei
ea6449d48a ANDROID: arm64: dts: hi3660: add display driver dts
Signed-off-by: Liwei Cai <cailiwei@hisilicon.com>
Signed-off-by: Xiubin Zhang <zhangxiubin1@huawei.com>
[jstultz: Cleanup unused hisifb bits]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: If9df111ad81475caec22fe75168544b010477e04
2021-02-18 23:36:49 +00:00
John Stultz
99ae6d076a ANDROID: arm64: dts: hikey960: Add CMA entry for DMA-BUF Heap/framebuffers
Add CMA entry, as the DRM driver requires phys contig
memory for framebuffers. These are normally allocated
out of DMA-BUF Heaps by gralloc for the graphics
framebuffer.

NOTE: On the 4gb boards, if we don't specify the address,
the cma region we use for the framebuffer and ion heap
may be placed in memory mapped above 32bits.

Due to addresses being masked to 32bits for DMA, this
resulted in crashes on newer 4gb boards.

Thus this patch forces CMA address to be lower in memory.

I've picked an address just past a previous CMA reserved
chunk, so we shouldn't be adding to any fragmentation.

Extra thanks to Ryan Grachek <ryan@edited.us> who
helped isolate the issue with the CMA buffers being over
the 32bit barrier.

Bug: 146450171
Change-Id: Ibb7ff1d85a9b93e41c440ffacf6a1ccf6aecb1ca
Signed-off-by: John Stultz <john.stultz@linaro.org>
2021-02-18 23:34:41 +00:00
John Stultz
a002be6ff0 ANDROID: drm: kirin960: Remove one mode-line that seems to be causing trouble
Alessio Balsini reported issues with his GeeekPi 7" monitor
after we added the wider mode support.

While this mode may work ok on HiKey, I suspect its just a bit
too far off for HiKey960.

So lets remove it for now.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I54221c44f2ee6af16d3772831a24c2872a6f738c
2021-02-18 23:32:31 +00:00
Vincent Donnefort
d586305741 ANDROID: drm: kirin: remove wait for VACTIVE IRQ
For each display cycle, the Kirin960 display IP will generate a VACTIVE
interrupt followed by a VBLANK. During a FBIOPAN ioctl, the driver will then
wait for the first one to then wait for the second one. This is an issue when
the CPU load is too low: the wait_event() function might trigger a transition
to a deep sleep state and then, waking up from that state will take too much
time to catch the VBLANK interrupt on time, the difference between those two
interrupts being only 60 us.

  * Ideal case:                   ACT                VBL
                                   +                  +
                                   v                  v
                    ---> wait(ACT) +------> wait(VBL) +-->

  * Our case:                     ACT VBL        ACT VBL
                                   +   +          +   +
                                   v   v          v   v
                    ---> wait(ACT) +------> wait(VBL) +-->

The wait for VACTIVE IRQ can safely be removed: there is no hardware access
performed between the VACTIVE and the VBLANK IRQs.

This behavior has been introduced from 4.11 with the following patch:

    a3fbb53f4 drm/atomic: Wait for vblank whenever a plane is added to state.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Bug: 146450171
Change-Id: I66e276c08f04257135c3d05483ce70c58d5070b6
2021-02-18 23:30:28 +00:00