Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors. This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE gives up precision
in exchange for performance. The main motivation behind KFENCE's design
is that, with enough total uptime, KFENCE will detect bugs in code paths
not typically exercised by non-production test workloads. One way to
quickly achieve a large enough total uptime is to deploy the tool across
a large fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
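The 2 MiB figure can be checked with a short calculation. The sketch
below assumes the pool holds two pages per object (one data page and one
guard page) plus one extra pair of pages; the helper name
kfence_pool_bytes() is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel function.
 * Assumption: two pages per object (data page + guard page), plus one
 * extra pair of pages for the pool as a whole.
 */
size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
	assert(page_size > 0);
	return (num_objects + 1) * 2 * page_size;
}
```

With 255 objects and 4 KiB pages this gives 256 * 2 * 4096 bytes, which
is 2 MiB.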
We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.
KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].
For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:
https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst
[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence
This patch (of 9):
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE gives up precision
in exchange for performance. The main motivation behind KFENCE's design
is that, with enough total uptime, KFENCE will detect bugs in code paths
not typically exercised by non-production test workloads. One way to
quickly achieve a large enough total uptime is to deploy the tool across
a large fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:
    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---
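As a rough userspace illustration of the object page in the figure, the
sketch below places an object at the left or right edge of a page, fills
the rest of the page with a redzone pattern, and later checks that the
pattern is intact. All names, the 4 KiB page size, and the 0xAA pattern
byte are assumptions made for this example; alignment requirements are
ignored and the object is assumed to fit within one page. It is not the
KFENCE implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_REDZONE_BYTE 0xAAu	/* hypothetical pattern byte */

/* Offset of the object: 0 if left-aligned, else it ends at the page end. */
size_t sketch_object_offset(size_t size, bool right)
{
	return right ? SKETCH_PAGE_SIZE - size : 0;
}

/* Fill every byte of the page outside [off, off + size) with the pattern. */
void sketch_fill_redzone(uint8_t *page, size_t off, size_t size)
{
	memset(page, SKETCH_REDZONE_BYTE, off);
	memset(page + off + size, SKETCH_REDZONE_BYTE,
	       SKETCH_PAGE_SIZE - off - size);
}

/* Return true if no redzone byte was overwritten. */
bool sketch_redzone_intact(const uint8_t *page, size_t off, size_t size)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
		if (i >= off && i < off + size)
			continue;	/* inside the object itself */
		if (page[i] != SKETCH_REDZONE_BYTE)
			return false;
	}
	return true;
}
```

An access past the page boundary would hit a guard page and fault
immediately, whereas a write that stays within the object's page is only
noticed when the pattern is checked, for example when the object is
freed.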
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.
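The sampling scheme described above can be modeled in userspace with a
one-shot gate: expiry of the interval opens the gate, and the next
allocation that sees it open closes it again and is the one that gets
guarded. This sketch uses a C11 atomic flag in place of the kernel's
timer and static keys; the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: true when the next allocation should be guarded. */
static atomic_bool sketch_gate_open;

/* Stand-in for the timer callback that fires when the interval expires. */
void sketch_interval_expired(void)
{
	atomic_store(&sketch_gate_open, true);
}

/*
 * Called on the allocation path. Atomically closes the gate and reports
 * whether it was open, so at most one allocation per interval is guarded.
 */
bool sketch_should_guard(void)
{
	return atomic_exchange(&sketch_gate_open, false);
}
```

Unlike this sketch, which reads an atomic variable on every allocation,
a static branch keeps the common "gate closed" path free of such a load.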
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.
For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).
Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[glider: resolved minor conflict in init/main.c]
Bug: 177201466
(cherry picked from commit 2a8dede73c3496bbd917644657f3735a4f508cb9
https://github.com/hnaz/linux-mm v5.11-rc4-mmots-2021-01-21-20-10)
Test: CONFIG_KFENCE_KUNIT_TEST=y passes on Cuttlefish
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I6b474675cc9732c31118df53fa06c3997f577218
If the system doesn't have enough memory when fuse_passthrough_read_iter
is requested in asynchronous IO, an error is directly returned without
restoring the caller's credentials.
Fix by always ensuring credentials are restored.
Fixes: aa29f32988 ("FROMLIST: fuse: Use daemon creds in passthrough mode")
Link: https://lore.kernel.org/lkml/YB0qPHVORq7bJy6G@google.com/
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I4aff43f5dd8ddab2cc8871cd9f81438963ead5b6
This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.
Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit b537900f15)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib2973ca3650a8defe15eded13294a3fb25356b9d
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:

    type uffd_t;
    type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
    allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 29cd6591ab)
Conflicts:
security/selinux/include/classmap.h
(1. Removed 'lockdown' mapping to be in sync with d9cb255af3)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iaa9f236f43bf225f089f00ead17e64326adbb328
This change adds a new function, anon_inode_getfd_secure(), that creates
an anonymous-inode file with an individual non-S_PRIVATE inode to which
security modules can apply policy. Existing callers continue using the
original singleton-inode kind of anonymous-inode file. We can transition
anonymous inode users to the new kind of anonymous inode in individual
patches for the sake of bisection and review.
The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.
Signed-off-by: Daniel Colascione <dancol@google.com>
[LG: Delete obsolete comments to alloc_anon_inode()]
[LG: Add context_inode description in comments to anon_inode_getfd_secure()]
[LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
[LG: Make __anon_inode_getfile() static]
[LG: Use correct error cast in __anon_inode_getfile()]
[LG: Fix error handling in __anon_inode_getfile()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit e7e832ce6f)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I3061c599f2951368914a2ca9f56ea60387d42a1d
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows or denies
creation of the inode and assigns a security context to it.
The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
(cherry picked from commit 215b674b84)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I2bbbb7a5c2371103c5b632b791c5c397ae228e0b
Drivers supporting 4096-QAM rates as a vendor extension in HE mode need
to update the correct rate info to userspace while using 4096-QAM (MCS12
and MCS13) in HE mode. Add support to calculate bitrates of HE-MCS12 and
HE-MCS13 which represent the 4096-QAM modulation schemes. The MCS12 and
MCS13 bitrates are defined in IEEE P802.11be/D0.1.
In addition, scale up the bitrates by 3*2048 in order to accommodate
calculations for the new MCS12 and MCS13 rates without losing fraction
values.
Signed-off-by: Vamsi Krishna <vamsin@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20201029183457.7005-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bug: 179454829
Change-Id: I0fed84d281031313e318402b3c985d2192c45434
(cherry picked from commit 9c97c88d2f)
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Add support to configure SAE PWE preference from userspace to drivers in
both AP and STA modes. This is needed for cases where the driver takes
care of Authentication frame processing (SME in the driver) so that
correct enforcement of the acceptable PWE derivation mechanism can be
performed.
The userspace applications can pass the sae_pwe value using the
NL80211_ATTR_SAE_PWE attribute in the NL80211_CMD_CONNECT and
NL80211_CMD_START_AP commands to the driver. This allows selection
between the hunting-and-pecking loop and hash-to-element options for PWE
derivation. For backwards compatibility, this new attribute is optional
and if not included, the driver is notified of the value being
unspecified.
Signed-off-by: Rohan Dutta <drohan@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20201027100910.22283-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bug: 179454829
Change-Id: I6604da2ef738f49fc693b81009958b76043bc513
(cherry picked from commit 9f0ffa4184)
Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org>
Changes in 5.10.13
iwlwifi: provide gso_type to GSO packets
nbd: freeze the queue while we're adding connections
tty: avoid using vfs_iocb_iter_write() for redirected console writes
ACPI: sysfs: Prefer "compatible" modalias
ACPI: thermal: Do not call acpi_thermal_check() directly
kernel: kexec: remove the lock operation of system_transition_mutex
ALSA: hda/realtek: Enable headset of ASUS B1400CEPE with ALC256
ALSA: hda/via: Apply the workaround generically for Clevo machines
parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
media: cec: add stm32 driver
media: cedrus: Fix H264 decoding
media: hantro: Fix reset_raw_fmt initialization
media: rc: fix timeout handling after switch to microsecond durations
media: rc: ite-cir: fix min_timeout calculation
media: rc: ensure that uevent can be read directly after rc device register
ARM: dts: tbs2910: rename MMC node aliases
ARM: dts: ux500: Reserve memory carveouts
ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled
ASoC: AMD Renoir - refine DMI entries for some Lenovo products
Revert "drm/amdgpu/swsmu: drop set_fan_speed_percent (v2)"
drm/nouveau/kms/gk104-gp1xx: Fix > 64x64 cursors
drm/i915: Always flush the active worker before returning from the wait
drm/i915/gt: Always try to reserve GGTT address 0x0
drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
net: usb: qmi_wwan: added support for Thales Cinterion PLSx3 modem family
s390: uv: Fix sysfs max number of VCPUs reporting
s390/vfio-ap: No need to disable IRQ after queue reset
PM: hibernate: flush swap writer after marking
x86/entry: Emit a symbol for register restoring thunk
efi/apple-properties: Reinstate support for boolean properties
crypto: marvel/cesa - Fix tdma descriptor on 64-bit
drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs
drivers: soc: atmel: add null entry at the end of at91_soc_allowed_list[]
btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
btrfs: fix possible free space tree corruption with online conversion
KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
KVM: arm64: Filter out v8.1+ events on v8.0 HW
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
KVM: x86: get smi pending status correctly
KVM: Forbid the use of tagged userspace addresses for memslots
io_uring: fix wqe->lock/completion_lock deadlock
xen: Fix XenStore initialisation for XS_LOCAL
leds: trigger: fix potential deadlock with libata
arm64: dts: broadcom: Fix USB DMA address translation for Stingray
mt7601u: fix kernel crash unplugging the device
mt76: mt7663s: fix rx buffer refcounting
mt7601u: fix rx buffer refcounting
iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
drm/i915/gt: Clear CACHE_MODE prior to clearing residuals
drm/i915/pmu: Don't grab wakeref when enabling events
net/mlx5e: Fix IPSEC stats
ARM: dts: imx6qdl-kontron-samx6i: fix pwms for lcd-backlight
drm/nouveau/svm: fail NOUVEAU_SVM_INIT ioctl on unsupported devices
drm/vc4: Correct lbm size and calculation
drm/vc4: Correct POS1_SCL for hvs5
drm/nouveau/dispnv50: Restore pushing of all data.
drm/i915: Check for all subplatform bits
drm/i915/selftest: Fix potential memory leak
uapi: fix big endian definition of ipv6_rpl_sr_hdr
KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
tee: optee: replace might_sleep with cond_resched
xen-blkfront: allow discard-* nodes to be optional
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
clk: imx: fix Kconfig warning for i.MX SCU clk
clk: mmp2: fix build without CONFIG_PM
clk: qcom: gcc-sm250: Use floor ops for sdcc clks
ARM: imx: build suspend-imx6.S with arm instruction set
ARM: zImage: atags_to_fdt: Fix node names on added root nodes
netfilter: nft_dynset: add timeout extension to template
Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
Revert "block: simplify set_init_blocksize" to regain lost performance
xfrm: Fix oops in xfrm_replay_advance_bmp
xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
selftests: xfrm: fix test return value override issue in xfrm_policy.sh
xfrm: Fix wraparound in xfrm_policy_addr_delta()
arm64: dts: ls1028a: fix the offset of the reset register
ARM: imx: fix imx8m dependencies
ARM: dts: imx6qdl-kontron-samx6i: fix i2c_lcd/cam default status
ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms
arm64: dts: imx8mp: Correct the gpio ranges of gpio3
firmware: imx: select SOC_BUS to fix firmware build
RDMA/cxgb4: Fix the reported max_recv_sge value
ASoC: dt-bindings: lpass: Fix and common up lpass dai ids
ASoC: qcom: Fix incorrect volatile registers
ASoC: qcom: Fix broken support to MI2S TERTIARY and QUATERNARY
ASoC: qcom: lpass-ipq806x: fix bitwidth regmap field
spi: altera: Fix memory leak on error path
ASoC: Intel: Skylake: skl-topology: Fix OOPs ib skl_tplg_complete
powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
ASoC: SOF: Intel: soundwire: fix select/depend unmet dependencies
ASoC: qcom: lpass: Fix out-of-bounds DAI ID lookup
iwlwifi: pcie: avoid potential PNVM leaks
iwlwifi: pnvm: don't skip everything when not reloading
iwlwifi: pnvm: don't try to load after failures
iwlwifi: pcie: set LTR on more devices
iwlwifi: pcie: use jiffies for memory read spin time limit
iwlwifi: pcie: reschedule in long-running memory reads
mac80211: pause TX while changing interface type
ice: fix FDir IPv6 flexbyte
ice: Implement flow for IPv6 next header (extension header)
ice: update dev_addr in ice_set_mac_address even if HW filter exists
ice: Don't allow more channels than LAN MSI-X available
ice: Fix MSI-X vector fallback logic
i40e: acquire VSI pointer only after VF is initialized
igc: fix link speed advertising
net/mlx5: Fix memory leak on flow table creation error flow
net/mlx5e: E-switch, Fix rate calculation for overflow
net/mlx5e: free page before return
net/mlx5e: Reduce tc unsupported key print level
net/mlx5: Maintain separate page trees for ECPF and PF functions
net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
net/mlx5e: Fix CT rule + encap slow path offload and deletion
net/mlx5e: Correctly handle changing the number of queues when the interface is down
net/mlx5e: Revert parameters on errors when changing trust state without reset
net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
can: dev: prevent potential information leak in can_fill_info()
ACPI/IORT: Do not blindly trust DMA masks from firmware
of/device: Update dma_range_map only when dev has valid dma-ranges
iommu/amd: Use IVHD EFR for early initialization of IOMMU features
iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()
nvme-multipath: Early exit if no path is available
selftests: forwarding: Specify interface when invoking mausezahn
rxrpc: Fix memory leak in rxrpc_lookup_local
NFC: fix resource leak when target index is invalid
NFC: fix possible resource leak
ASoC: mediatek: mt8183-da7219: ignore TDM DAI link by default
ASoC: mediatek: mt8183-mt6358: ignore TDM DAI link by default
ASoC: topology: Properly unregister DAI on removal
ASoC: topology: Fix memory corruption in soc_tplg_denum_create_values()
scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit
team: protect features update by RCU to avoid deadlock
tcp: make TCP_USER_TIMEOUT accurate for zero window probes
tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN
vsock: fix the race conditions in multi-transport support
Linux 5.10.13
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I75f419b25f24da559e446d62f75ce6bb9b0a5396
The scheduler now knows enough about these braindead systems to place
32-bit tasks accordingly, so throw out the safety checks and allow the
ret-to-user path to avoid do_notify_resume() if there is nothing to do.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-16-will@kernel.org/
[will: Fixed trivial conflict with vendor hook in __switch_to()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I1258f5a95c2c4fc0548103810677b4b0a74320b4
If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.
On detecting a mismatch, prevent offlining of the mismatching CPU if it
is 32-bit capable; otherwise, find the first active 32-bit capable CPU
and prevent offlining of that one instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-14-will@kernel.org/
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I330859dfd7b10082e1a3dd5341d76f2a90b1f124
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.
Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in the
task's affinity mask.
In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
force_compatible_cpus_allowed_ptr(), which restricts the affinity mask
for a task to contain only compatible CPUs.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-11-will@kernel.org/
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ief3e47f6aa8179eadf2e009f207cc0161b76f466
Reject explicit requests to change the affinity mask of a task via
set_cpus_allowed_ptr() if the requested mask is not a subset of the
mask returned by task_cpu_possible_mask(). This ensures that the
'cpus_mask' for a given task cannot contain CPUs which are incapable of
executing it, except in cases where the affinity is forced.
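The subset test amounts to a single bitmask operation. The sketch below
models a cpumask as a 64-bit integer with one bit per CPU; the type,
the function name, and the choice of -EINVAL as the error code are
assumptions for illustration and are not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cpumask model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t sketch_cpumask_t;

/*
 * Reject a requested affinity mask that contains any CPU outside the set
 * of CPUs able to run the task, unless the change is forced.
 */
int sketch_check_affinity(sketch_cpumask_t requested,
			  sketch_cpumask_t possible, int forced)
{
	if (!forced && (requested & ~possible))
		return -EINVAL;	/* assumed error code */
	return 0;
}
```

For example, with CPUs 0-3 able to run the task (mask 0xF), a request
for CPUs 0-1 (0x3) passes, while a request for CPUs 4-5 (0x30) is
rejected unless it is forced.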
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-10-will@kernel.org/
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Iadedd637c253cccfeb5fa4098afb2048bbfa6cc3
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.
Modify guarantee_online_cpus() to take task_cpu_possible_mask() into
account when trying to find a suitable set of online CPUs for a given
task. This will avoid passing an invalid mask to set_cpus_allowed_ptr()
during ->attach() and will subsequently allow the cpuset hierarchy to be
taken into account when forcefully overriding the affinity mask for a
task which requires migration to a compatible CPU.
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-9-will@kernel.org/
[will: Fixed conflict due to active_mask being used instead of online_mask]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5b4a50e7a257af928dccd87f1dbd961ea26ff834
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.
On such a system, we must take care not to migrate a task to an
unsupported CPU when forcefully moving tasks in select_fallback_rq()
in response to a CPU hot-unplug operation.
Introduce a task_cpu_possible_mask() hook which, given a task argument,
allows an architecture to return a cpumask of CPUs that are capable of
executing that task. The default implementation returns the
cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
limitations that affect scheduling. The new mask is used when selecting
the fallback runqueue as a last resort before forcing a migration to the
first active CPU.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-7-will@kernel.org/
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I75985976c196cee7b84043e1a03fcc62f8b6d1c4
Scheduling a 32-bit application on a 64-bit-only CPU is a bad idea.
Ensure that 32-bit applications always take the slow-path when returning
to userspace on a system with mismatched support at EL0, so that we can
avoid trying to run on a 64-bit-only CPU and force a SIGKILL instead.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-5-will@kernel.org/
[will: Fixed trivial conflict with vendor hook in __switch_to()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I5ae90f3fb63499d7016f93d13e32693e26890f92
When confronted with a mixture of CPUs, some of which support 32-bit
applications and others which don't, we quite sensibly treat the system
as 64-bit only for userspace and prevent execve() of 32-bit binaries.
Unfortunately, some crazy folks have decided to build systems like this
with the intention of running 32-bit applications, so relax our
sanitisation logic to continue to advertise 32-bit support to userspace
on these systems and track the real 32-bit capable cores in a cpumask
instead. For now, the default behaviour remains but will be tied to
a command-line option in a later patch.
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-3-will@kernel.org/
[will: Fix conflict in cpucaps definition, as ARM64_HARDEN_EL2_VECTORS renamed]
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ib8d2b8df2f4ce370a518685ee81789be9b2dd6f5
CONFIG_ASYMMETRIC_AARCH32 is about to go away, so remove its entry from
gki_defconfig.
Bug: 178507149
Signed-off-by: Will Deacon <will@kernel.org>
Change-Id: I67dbbd7c31637a4d1ec2aab8bd86d9fac515b9d3
Signed-off-by: Will Deacon <willdeacon@google.com>
The PELT half-life is currently hard-coded to 32ms.
Create a cmdline arg to enable switching to a PELT half-life of 8ms.
Bug: 177593580
Change-Id: I9f8cfc3d9554a500eec0f6a1b161f4155c296b4d
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
With CONFIG_CFI_CLANG, the compiler replaces function pointers with
jump table addresses, which results in __pa_symbol returning the
physical address of the jump table entry. As the jump table contains
an immediate jump to an EL1 virtual address, this typically won't
work as intended. Use __pa_function instead to get the address of
cpu_resume.
Bug: 145210207
Change-Id: Iebcb0950b074c0ed0ddc6ec6cd8c4ff539f00e7c
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
With CONFIG_CFI_CLANG, the compiler replaces function pointers with
jump table addresses, which results in __pa_symbol returning the
physical address of the jump table entry. As the jump table contains
an immediate jump to an EL1 virtual address, this typically won't
work as intended. Use __pa_function instead to get the address of
secondary_entry.
Bug: 178005287
Change-Id: I90aea4cacd66ac224aae5c1a577decda1d922c22
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Add hooks to gather data on kernel faults and summarize it with
other information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I527eddf08be22fa842680bee850f1ef1f5a2c0ed
Add hooks to gather data on bad scheduling and summarize it with
other information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I08a7097b60dd8eebc5c0205b31c463a36f576121
Add hooks to gather data on unfrozen tasks and summarize it
with other information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I6f3ed7320e828a8dd1e7ae5d4449420085a75b17
Add a hook to gather data on softlockups and summarize it with
other information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I42b906f17ad689176f0cc5a1a46acd0b5971d6c5
Patch 'ANDROID: mm, oom: Avoid killing tasks with negative ADJ scores'
does not handle a special case when oom_evaluate_task is aborted and
sets oc->chosen to -1. Check for this condition to avoid invalid memory
access.
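A minimal sketch of the kind of check described, using stand-in types:
the chosen pointer is treated as a valid victim only if it is neither
NULL nor the -1 sentinel left behind by an aborted scan. The struct and
function names here are hypothetical and do not mirror the kernel's
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only. */
struct sketch_task {
	int pid;
};

struct sketch_oom_control {
	struct sketch_task *chosen;
};

/* Sentinel meaning "scan aborted": the pointer value -1. */
#define SKETCH_ABORTED ((struct sketch_task *)-1L)

/* True only when chosen points at a real task that may be dereferenced. */
bool sketch_has_valid_victim(const struct sketch_oom_control *oc)
{
	return oc->chosen != NULL && oc->chosen != SKETCH_ABORTED;
}
```

Checking only for NULL would let the sentinel through, and the first
dereference of it would be an invalid memory access.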
Bug: 179177151
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Id9a3f1b824c6a81d157782b8cb18115b3c577a50
The speculative page fault path does not sync the
rss in task_struct to mm_struct leading to large
variance in the RSS values observed by userspace
tools and also in the OOM task dump.
Change-Id: Id45f1b9b0a51a9afffbaf8e65f5ef747d409d0d7
Bug: 179217427
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 1 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
1 function with some sub-type change:
[C] 'function dma_heap* dma_heap_add(const dma_heap_export_info*)' at dma-heap.c:283:1 has some sub-type changes:
CRC (modversions) changed from 0xeb9fba5f to 0x7708cda
'struct dma_heap_ops at dma-heap.h:23:1' changed:
type size changed from 64 to 128 (in bits)
1 data member insertion:
'long int (dma_heap*)* dma_heap_ops::get_pool_size', at offset 64 (in bits) at dma-heap.h:29:1
5 impacted interfaces
Bug: 167709539
Change-Id: Ie1669843bdf3ae48e31bf30ef61df33ee54c19b7
Signed-off-by: Hridya Valsaraju <hridya@google.com>
A number of systems need the dummy USB host controller driver for
testing, so enable it in the kernel to remove the need to support a
bunch of exported symbols just for that driver.
Bug: 157965270
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I74b3aa819984dd894cccd3a5239d210b9e0d43a5
Leaf changes summary: 76 artifacts changed
Changed leaf types summary: 2 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 74 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
74 functions with some sub-type change:
[C] 'function int __ion_device_add_heap(ion_heap*, module*)' at ion.c:312:1 has some sub-type changes:
CRC (modversions) changed from 0x1eddf3a5 to 0x7f958fe0
[C] 'function dma_buf_attachment* dma_buf_attach(dma_buf*, device*)' at dma-buf.h:585:1 has some sub-type changes:
CRC (modversions) changed from 0x1e0bba0e to 0x338ae462
[C] 'function int dma_buf_begin_cpu_access(dma_buf*, dma_data_direction)' at dma-buf.c:1125:1 has some sub-type changes:
CRC (modversions) changed from 0xe447ea92 to 0xca6c466d
... 71 omitted; 74 symbols have only CRC changes
'struct dma_buf at dma-buf.h:394:1' changed:
type size changed from 2048 to 2112 (in bits)
1 data member insertion:
'dma_buf_sysfs_entry* dma_buf::sysfs_entry', at offset 2048 (in bits) at dma-buf.h:426:1
93 impacted interfaces
'struct dma_buf_attachment at dma-buf.h:490:1' changed:
type size changed from 640 to 704 (in bits)
1 data member insertion:
'dma_buf_attach_sysfs_entry* dma_buf_attachment::sysfs_entry', at offset 640 (in bits) at dma-buf.h:506:1
93 impacted interfaces
Bug: 167709539
Change-Id: I3297a07ef29e63a0c2fda81b2a02cbf95fd3f372
Signed-off-by: Hridya Valsaraju <hridya@google.com>
This patch turns on CONFIG_DMABUF_SYSFS_STATS to enable the DMA-BUF
sysfs statistics.
Bug: 167709539
Change-Id: Idc4cb231edfedcdf672474119238e5d7e545002d
Signed-off-by: Hridya Valsaraju <hridya@google.com>