There are four different callback functions that are used for the
rproc_handle_resource_t callback that all have different second
parameter types.
rproc_handle_vdev -> struct fw_rsc_vdev
rproc_handle_trace -> struct fw_rsc_trace
rproc_handle_devmem -> struct fw_rsc_devmem
rproc_handle_carveout -> struct fw_rsc_carveout
These callbacks are cast to rproc_handle_resource_t so that there is no
error about incompatible pointer types. Unfortunately, this violates
Clang's Control-Flow Integrity (CFI) checking, which verifies that a
callback function's type matches the prototype exactly before jumping.
[ 7.275750] Kernel panic - not syncing: CFI failure (target: rproc_handle_vdev+0x0/0x4)
[ 7.283763] CPU: 2 PID: 1 Comm: init Tainted: G C O 5.4.70-03301-g527af2c96672 #17
[ 7.292463] Hardware name: NXP i.MX8MPlus EVK board (DT)
[ 7.297779] Call trace:
[ 7.300232] dump_backtrace.cfi_jt+0x0/0x4
[ 7.304337] show_stack+0x18/0x24
[ 7.307660] dump_stack+0xb8/0x114
[ 7.311069] panic+0x164/0x3d4
[ 7.314130] __ubsan_handle_cfi_check_fail_abort+0x0/0x14
[ 7.319533] perf_proc_update_handler+0x0/0xcc
[ 7.323983] __cfi_check+0x63278/0x6a290
[ 7.327913] rproc_boot+0x3f8/0x738
[ 7.331404] rproc_add+0x68/0x110
[ 7.334738] imx_rproc_probe+0x5e4/0x708 [imx_rproc]
[ 7.339711] platform_drv_probe+0xac/0xf0
[ 7.343726] really_probe+0x260/0x65c
[ 7.347393] driver_probe_device+0x64/0x100
[ 7.351580] device_driver_attach+0x6c/0xac
[ 7.355766] __driver_attach+0xdc/0x184
[ 7.359609] bus_for_each_dev+0x98/0x104
[ 7.363537] driver_attach+0x24/0x30
[ 7.367117] bus_add_driver+0x100/0x1e0
[ 7.370958] driver_register+0x78/0x114
[ 7.374800] __platform_driver_register+0x44/0x50
[ 7.379514] init_module+0x20/0xfe8 [imx_rproc]
[ 7.384049] do_one_initcall+0x190/0x348
[ 7.387979] do_init_module+0x5c/0x210
[ 7.391731] load_module+0x2fbc/0x3590
[ 7.395485] __arm64_sys_finit_module+0xb8/0xec
[ 7.400025] el0_svc_common+0xb4/0x19c
[ 7.403777] el0_svc_handler+0x74/0x98
[ 7.407531] el0_svc+0x8/0xc
[ 7.410419] SMP: stopping secondary CPUs
[ 7.414648] Kernel Offset: disabled
[ 7.418142] CPU features: 0x00010002,2000200c
[ 7.422501] Memory Limit: none
To fix this, change the second parameter of all functions to void * and
use a local variable with the correct type so that everything works
properly. With this, we can remove casting to rproc_handle_resource_t
for these functions.
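A minimal user-space sketch of the pattern described above. The struct
layouts, handler names and the dispatch() helper are simplified,
hypothetical stand-ins, not the remoteproc code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the firmware resources. */
struct fw_rsc_vdev  { uint32_t id; };
struct fw_rsc_trace { uint32_t da; };

/* Every handler has exactly this prototype, so no function cast is needed. */
typedef int (*handle_resource_t)(void *rsc, int avail);

static int handle_vdev(void *ptr, int avail)
{
	struct fw_rsc_vdev *rsc = ptr;	/* recover the real type locally */

	assert(rsc != NULL);
	return avail >= (int)sizeof(*rsc) ? (int)rsc->id : -1;
}

static int handle_trace(void *ptr, int avail)
{
	struct fw_rsc_trace *rsc = ptr;	/* recover the real type locally */

	assert(rsc != NULL);
	return avail >= (int)sizeof(*rsc) ? (int)rsc->da : -1;
}

/* Handlers are stored directly; the indirect call matches their type. */
static const handle_resource_t handlers[] = { handle_vdev, handle_trace };

static int dispatch(unsigned int type, void *rsc, int avail)
{
	return handlers[type](rsc, avail);
}
```

Because the function pointer type and the function definitions agree, an
indirect-call type check has nothing to reject, and the conversion from
void * happens inside each handler instead of at the call site.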
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20210224055825.7417-1-jindong.yue@nxp.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
(cherry picked from commit 2bf2346159)
Bug: 187234877
Reported-by: Joy-mi Huang <joy-mi.huang@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Change-Id: Icc7adeaf98f779fc46a979ff7e9f2ab5f963f636
It's needed by MTK board for reasons below:
After working with the built-in MTK timer (timer-mediatek) as the
tick-broadcast device for a few weeks, we found some issues caused
by the Linux timer framework:
1. A tick-broadcast device installed by insmod cannot switch to oneshot
mode correctly (BUG: 161822795)
2. An RCU warning is shown if we force tick-broadcast to be re-enabled
for each CPU when a new tick-broadcast device is added by insmod
(timer-mediatek.c in our case)
Bug: 161675989
Change-Id: If3056df11d8256f2af4a6928820faed9f690b3db
Signed-off-by: Freddy Hsin <freddy.hsin@mediatek.com>
Signed-off-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
(cherry picked from commit b995c14df40ff1c8d66525eb1133e3f28a759f54)
Add a vendor hook for modules to know when the topology
code has determined the max capacity of cpus.
Bug: 187234873
Change-Id: Ia3e22479059d2e57500cbdd46504aa4773af6e4a
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that CPU shares of root cgroup
can't be changed, leaving the tasks inside root cgroup will give
them higher share compared to the other tasks inside important
cgroups. This is mitigated by moving all tasks inside root cgroup to
a different cgroup after Android is booted. However, there are many
kernel tasks stuck in the root cgroup after the boot.
It is possible to relax kernel thread and kworker migrations under
certain scenarios. However, the patch [2] posted upstream was not
accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.
[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org
Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.
For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
correct indexes are in range [0,19], hence leading to an out of bound
memory access.
Clamp the bucket id to fix the issue.
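The arithmetic above can be checked with a small user-space sketch. The
macro and function names here are illustrative assumptions, not the
scheduler's own identifiers:

```c
#include <assert.h>

#define CAPACITY_SCALE	1024u
#define BUCKETS		20u
/* Rounding division: (1024 + 10) / 20 = 51 */
#define BUCKET_DELTA	((CAPACITY_SCALE + BUCKETS / 2) / BUCKETS)

/* Unclamped mapping: a clamp value of 1024 lands on index 20. */
static unsigned int bucket_id_unclamped(unsigned int clamp_value)
{
	return clamp_value / BUCKET_DELTA;
}

/* Clamped mapping: the result never exceeds the last valid index. */
static unsigned int bucket_id(unsigned int clamp_value)
{
	unsigned int id = clamp_value / BUCKET_DELTA;

	assert(clamp_value <= CAPACITY_SCALE);
	return id < BUCKETS - 1 ? id : BUCKETS - 1;
}
```

With 20 buckets, bucket_id_unclamped(1024) yields 20, one past the last
valid index, while bucket_id(1024) yields 19.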
Bug: 186415778
Fixes: 69842cba9a ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20210430151412.160913-1-qperret@google.com
Change-Id: Ibc28662de5554f80f97533b60e747f8a6e871c56
Changes in 5.10.34
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
mei: me: add Alder Lake P device id.
Linux 5.10.34
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05dd0ce2038c3fc8c928e7be9e804584cc4d1e75
This reverts commit ef4ff626b3.
Picking "kfence: await for allocation using wait_event" resulted in
KFENCE spinning no more in toggle_allocation_gate(), which reportedly
fixed the power regression.
Bug: 185280916
Test: power team confirmed there's no regression
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I65d4206ff9199b06c879e506fd38129df50b393e
For data read commands, SDHC may initiate data transfers even before it
completely processes the command response. In case the command itself
fails, the driver un-maps the memory associated with the data transfer,
but this memory can still be accessed by SDHC for the already initiated
data transfer. This scenario can lead to an un-mapped memory access error.
To avoid this scenario, reset SDHC (when command fails) prior to
un-mapping memory. Resetting SDHC ensures that all in-flight data
transfers are either aborted or completed. So we don't run into this
scenario.
Swap the reset, un-map steps sequence in sdhci_request_done().
Bug: 186781699
(cherry picked from commit 21e35e898a
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git remotes/origin/next)
Link: https://lore.kernel.org/r/1614760331-43499-1-git-send-email-pragalla@qti.qualcomm.com
Suggested-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Pradeep P V K <pragalla@codeaurora.org>
Signed-off-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Change-Id: I547ea7c1b3e2b9026313d6e17b09851eef520c22
KSM uses follow_page with FOLL_GET at several places so they should
use put_user_page to avoid page_pinner false positive.
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I0234eeb5db7d801e70c4884146c3029582b715c1
munlock_vma_pages_range uses follow_page(FOLL_GET), so we need to use
put_user_page to avoid false positives. However, the munlock path is
too complicated to attribute the pages properly. For now, do not make
a mess of it; instead just unattribute them to avoid false positives.
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I4776bd1a94247e226b29fceb3879c338e8c7323a
add_page_for_migration uses follow_page with FOLL_GET. Thus,
close the page_pinner false positive by using put_user_page.
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I655b5610bafad86172dcb291573c33176989d94b
dump_user_range uses __get_user_pages_locked with FOLL_GET. Thus,
close the page-pinner false positive using put_user_page.
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Change-Id: Ib343a9f61303655b108d60575edae1249ef687df
fuse can call get_user_pages_fast via iov_iter_get_pages in
fuse_copy_fill, so close the false positive by using put_user_page
to attribute the release.
Page pinned via pid 670, ts 4554195916 ns
PFN 83125 Block 162 type Movable Flags 0xfffffc008001e(referenced|uptodate|dirty|lru|swapbacked)
try_grab_compound_head+0x1e8/0x240
internal_get_user_pages_fast+0x66d/0xca0
iov_iter_get_pages+0xd4/0x3a0
fuse_copy_fill+0x197/0x200
fuse_copy_one+0x6e/0xf0
fuse_dev_do_read.constprop.0+0x435/0x7e0
fuse_dev_read+0x5d/0x90
new_sync_read+0x115/0x1a0
vfs_read+0xf4/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Idc80d4a34b546f25e8f6dbc68313d39586e914d9
CMA allocation can fail due to a temporary page refcount increase
by the get_page API as well as by get_user_pages and friends.
However, since get_page is one of the hottest functions, it is
hard to hook get_page to capture a callstack every time due to
performance concerns. Furthermore, get_page can be nested
multiple times, so we couldn't track all of the pin sites in the
limited space of page_pinner.
Thus, the approach here is to track the put_page callsite rather
than get_page once the VM finds that page migration failed.
It's based on these assumptions:
1. Since it's a temporary page refcount, it should be released soon,
before overflowing the dmesg log buffer
2. A developer can find the matching get_page by reviewing put_page.
By default, it's enabled. If you want to disable it:
echo 0 > $debugfs/page_pinner/failure_tracking
You can capture the tracking using:
cat $debugfs/page_pinner/alloc_contig_failed
note: the example below is artificial:
Page pinned ts 386067292 us count 0
PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
__page_pinner_migration_failed+0x30/0x104
putback_lru_page+0x90/0xac
putback_movable_pages+0xc4/0x204
__alloc_contig_migrate_range+0x290/0x31c
alloc_contig_range+0x114/0x2bc
cma_alloc+0x2d8/0x698
cma_alloc_write+0x58/0xb8
simple_attr_write+0xd4/0x124
debugfs_attr_write+0x50/0xd8
full_proxy_write+0x70/0xf8
vfs_write+0x168/0x3a8
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
Page pinned ts 385867394 us count 0
PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
__page_pinner_migration_failed+0x30/0x104
__alloc_contig_migrate_range+0x200/0x31c
alloc_contig_range+0x114/0x2bc
cma_alloc+0x2d8/0x698
cma_alloc_write+0x58/0xb8
simple_attr_write+0xd4/0x124
debugfs_attr_write+0x50/0xd8
full_proxy_write+0x70/0xf8
vfs_write+0x168/0x3a8
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x198/0x1c0
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ie79902c18390eb9f320d823839bb9d9a7fdcdb31
For CMA allocation, it's really critical to migrate a page, but
sometimes it fails. One of the reasons is that some driver holds a
page refcount for a long time, so the VM couldn't migrate the page
at that time.
The concern here is that there is no way to effectively find who holds
the refcount of the page. This patch introduces a feature
to track a page's pinner. All get_page sites are liable
to pin a page for a long time, but the cost of tracking them would
be significant since get_page is the most frequent kernel operation.
Furthermore, the page could be not a user page but a kernel page, which
is not related to the page migration failure. So, this patch
tracks only get_user_pages/follow_page with FOLL_GET|PIN and friends,
because they are very common APIs for pinning user pages, which could
cause migration failure, and they are less frequent than get_page, so
the runtime cost wouldn't be that big but could cover many cases
effectively.
This patch also introduces the put_user_page API. It aims to record
"the pinner releases the page from now on" while it releases the
page refcount. Thus, any user of get_user_pages/follow_page(FOLL_GET)
must use put_user_page as the pair of those functions. Otherwise,
page_pinner will treat them as long-term pinners, a false positive, but
nothing should affect stability.
* $debugfs/page_pinner/threshold
It indicates the threshold (in microseconds) to flag long-term pinning.
It's configurable (default is 300000us). Once you write a new value
to the threshold, old data will be cleared.
* $debugfs/page_pinner/longterm_pinner
It shows call sites where the duration of pinning was greater than
the threshold. Internally, it uses a static array to keep 4096
elements and overwrites old ones once overflow happens. Therefore,
you could lose some information.
example)
Page pinned ts 76953865787 us count 1
PFN 9856945 Block 9625 type Movable Flags 0x8000000000080014(uptodate|lru|swapbacked)
__set_page_pinner+0x34/0xcc
try_grab_page+0x19c/0x1a0
follow_page_pte+0x1c0/0x33c
follow_page_mask+0xc0/0xc8
__get_user_pages+0x178/0x414
__gup_longterm_locked+0x80/0x148
internal_get_user_pages_fast+0x140/0x174
pin_user_pages_fast+0x24/0x40
CCC
BBB
AAA
__arm64_sys_ioctl+0x94/0xd0
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
note: page_pinner doesn't guarantee that attributing/unattributing is
atomic if they happen at the same time. It's just best effort, so
false positives could happen.
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ife37ec360eef993d390b9c131732218a4dfd2f04
Fix the following build warning:
include/trace/hooks/psi.h:17:18: warning: declaration of
'struct psi_trigger' will not be visible outside of
this function [-Wvisibility]
Bug: 178721511
Fixes: commit b79d1815c4 ("ANDROID: psi: Add vendor hooks for PSI tracing")
Change-Id: I4c8c9730f5a0c94fa8d93c6995ce25f385b52043
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
When the servicemanager process adds a service proxy for a service
registered by another process, we want to know the mapping between the
handle in the process and the service name. When a binder transaction
happens, we want to know which process calls which method on which service.
Bug: 186604985
Signed-off-by: zhengding chen <chenzhengding@oppo.com>
Change-Id: I813d1cde10294d8665f899f7fef0d444ec1f1f5e
DMA-BUF per-buffer attachment stats are present at
/sys/kernel/dma-buf/buffers/<ino>/attachments. Since a kset of name
'attachment' is created every time a buffer is allocated, disable
uevents from the kset to avoid their userspace overhead.
The ksets 'dma-buf' and 'buffers' do not need uevents disabled since
they are emitted just once but the patch does it for uniformity.
Bug: 186155231
Change-Id: I29009dd2aa0bc018c18bca7a1e0a068f4480cf13
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Changes in 5.10.33
vhost-vdpa: protect concurrent access to vhost device iotlb
gpio: omap: Save and restore sysconfig
KEYS: trusted: Fix TPM reservation for seal/unseal
vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
pinctrl: lewisburg: Update number of pins in community
block: return -EBUSY when there are open partitions in blkdev_reread_part
pinctrl: core: Show pin numbers for the controllers with base = 0
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
bpf: Permits pointers on stack for helper calls
bpf: Allow variable-offset stack access
bpf: Refactor and streamline bounds check into helper
bpf: Tighten speculative pointer arithmetic mask
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf auxtrace: Fix potential NULL pointer dereference
perf map: Fix error return code in maps__clone()
HID: google: add don USB id
HID: alps: fix error return code in alps_input_configured()
HID cp2112: fix support for multiple gpiochips
HID: wacom: Assign boolean values to a bool variable
soc: qcom: geni: shield geni_icc_get() for ACPI boot
dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
dmaengine: xilinx: dpdma: Fix race condition in done IRQ
ARM: dts: Fix swapped mmc order for omap3
net: geneve: check skb is large enough for IPv4/IPv6 header
dmaengine: tegra20: Fix runtime PM imbalance on error
s390/entry: save the caller of psw_idle
arm64: kprobes: Restore local irqflag if kprobes is cancelled
xen-netback: Check for hotplug-status existence before watching
cavium/liquidio: Fix duplicate argument
kasan: fix hwasan build for gcc
csky: change a Kconfig symbol name to fix e1000 build error
ia64: fix discontig.c section mismatches
ia64: tools: remove duplicate definition of ia64_mf() on ia64
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
net: hso: fix NULL-deref on disconnect regression
USB: CDC-ACM: fix poison/unpoison imbalance
Linux 5.10.33
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I638db3c919ad938eaaaac3d687175252edcd7990
If the timestamp of the .config file is updated, config_data.gz is
regenerated, then vmlinux is re-linked. This occurs even if the content
of the .config has not changed at all.
This issue was mitigated by commit 67424f61f8 ("kconfig: do not write
.config if the content is the same"); Kconfig does not update the
.config when it ends up with the identical configuration.
The issue remains when the .config is created by *_defconfig with
some config fragment(s) applied on top.
This is typical for powerpc and mips, where several *_defconfig targets
are constructed by using merge_config.sh.
One workaround is to have a copy of the .config. The filechk rule
updates the copy, kernel/config_data, by checking the content instead
of the timestamp.
With this commit, the second run with the same configuration avoids
the needless rebuilds.
$ make ARCH=mips defconfig all
[ snip ]
$ make ARCH=mips defconfig all
*** Default configuration is based on target '32r2el_defconfig'
Using ./arch/mips/configs/generic_defconfig as base
Merging arch/mips/configs/generic/32r2.config
Merging arch/mips/configs/generic/el.config
Merging ./arch/mips/configs/generic/board-boston.config
Merging ./arch/mips/configs/generic/board-ni169445.config
Merging ./arch/mips/configs/generic/board-ocelot.config
Merging ./arch/mips/configs/generic/board-ranchu.config
Merging ./arch/mips/configs/generic/board-sead-3.config
Merging ./arch/mips/configs/generic/board-xilfpga.config
#
# configuration written to .config
#
SYNC include/config/auto.conf
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
CHK include/generated/autoksyms.h
Reported-by: Elliot Berman <eberman@codeaurora.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Bug: 179648610
(cherry picked from commit b33976d90d1ea7652fff662dcc2234f352346a33
https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
kbuild)
[eberman: Fixed minor conflicts in kernel/.gitignore]
Change-Id: I8c93147c8d5a48d0f5e9abf855870b10c1a24efc
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
When CPUs with mismatched support for 32-bit EL0 are detected early,
before the system capabilities are finalised, then initialisation of
the compat hwcaps is left to the boot CPU to configure as it would do
for a system without a mismatch.
Unfortunately, this initialisation is only carried out if
system_supports_32bit_el0(), which isn't initialised until later in the
boot process via a CPU hotplug notifier. Consequently, the compat hwcaps
reported do not necessarily indicate all of the CPU features available.
Move the initialisation of compat hwcaps to the CPU hotplug notifier
itself so that we can ensure that they are configured as soon as we have
detected a mismatch.
Bug: 186482502
Cc: Quentin Perret <qperret@google.com>
Cc: Huang Yiwei <hyiwei@codeaurora.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I54edcc34f96be9dec15aa44407441c0227c17753
Take the 4 instruction byte swapping sequence from the decompressor's
head.S, and turn it into a rev_l GAS macro for general use. While
at it, make it use the 'rev' instruction when compiling for v6 or
later.
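For illustration only, here is a C rendering of the classic pre-ARMv6
four-instruction byte swap. The commit message does not spell out the
instructions, so the mapping to ARM mnemonics in the comments and the
function names are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; n must be in 1..31. */
static uint32_t ror32(uint32_t v, unsigned int n)
{
	assert(n > 0 && n < 32);
	return (v >> n) | (v << (32 - n));
}

/* Byte-swap a 32-bit word without a dedicated 'rev' instruction. */
static uint32_t rev_l_generic(uint32_t v)
{
	uint32_t t = v ^ ror32(v, 16);	/* eor t, v, v, ror #16   */

	t &= ~UINT32_C(0x00ff0000);	/* bic t, t, #0x00ff0000  */
	v = ror32(v, 8);		/* mov v, v, ror #8       */
	return v ^ (t >> 8);		/* eor v, v, t, lsr #8    */
}
```

For example, rev_l_generic(0x12345678) returns 0x78563412. On ARMv6 and
later a single 'rev' instruction produces the same result.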
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 6468e898c6)
Bug: 178411248
Change-Id: I8433e97d2880f75cace215f1a8daadec7f29929c
Signed-off-by: Eric Biggers <ebiggers@google.com>
DTB stores all values as 32-bit big-endian integers.
Add a macro to convert such values to native CPU endianness, to reduce
duplication.
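The conversion itself can be sketched in C. The actual change adds an
assembler macro; the helper name below is hypothetical and only shows
what "big-endian to native" means for one 32-bit cell:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value; works on any host endianness. */
static uint32_t be32_load(const uint8_t *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Applied to the first four bytes of a DTB (d0 0d fe ed), this returns the
magic number 0xd00dfeed regardless of the CPU's byte order.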
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 0557ac83fd)
Bug: 178411248
Change-Id: I0807f36352dbfd5f5808959e358a7469dc9753bb
Signed-off-by: Eric Biggers <ebiggers@google.com>
Patch series "kfence: optimize timer scheduling", v2.
We have observed that mostly-idle systems with KFENCE enabled wake up
otherwise idle CPUs, preventing them from entering a lower power state.
Debugging revealed that KFENCE spends too much active time in
toggle_allocation_gate().
While the first version of KFENCE was using all the right bits to be
scheduling optimal, and thus power efficient, by simply using wait_event()
+ wake_up(), that code was unfortunately removed.
As KFENCE was exposed to various different configs and tests, the
scheduling optimal code slowly disappeared. First because of hung task
warnings, and finally because of deadlocks when an allocation is made by
timer code with debug objects enabled. Clearly, the "fixes" were not too
friendly for devices that want to be power efficient.
Therefore, let's try a little harder to fix the hung task and deadlock
problems that we have with wait_event() + wake_up(), while remaining as
scheduling friendly and power efficient as possible.
Crucially, we need to defer the wake_up() to an irq_work, avoiding any
potential for deadlock.
The result with this series is that on the devices where we observed a
power regression, power usage returns back to baseline levels.
This patch (of 3):
On mostly-idle systems, we have observed that toggle_allocation_gate() is
a cause of frequent wake-ups, preventing an otherwise idle CPU from going
into a lower power state.
A late change in KFENCE's development, due to a potential deadlock [1],
required changing the scheduling-friendly wait_event_timeout() and
wake_up() to an open-coded wait-loop using schedule_timeout(). [1]
https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
To avoid unnecessary wake-ups, switch to using wait_event_timeout().
Unfortunately, we still cannot use a version with direct wake_up() in
__kfence_alloc() due to the same potential for deadlock as in [1].
Instead, add a level of indirection via an irq_work that is scheduled if
we determine that the kfence_timer requires a wake_up().
Link: https://lkml.kernel.org/r/20210421105132.3965998-1-elver@google.com
Link: https://lkml.kernel.org/r/20210421105132.3965998-2-elver@google.com
Fixes: 0ce20dd840 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Bug: 172317151
Bug: 185280916
Test: power team confirmed there's no regression
(cherry picked from commit 0ac66fbbadde5547db190e9d87ff2e77d245f9a2
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Idba39683f0b76eb3f70df1113e43d94845fab5bf
Because memblock allocations are registered with kmemleak, the KFENCE
pool was seen by kmemleak as one large object. Later allocations
through kfence_alloc() that were registered with kmemleak via
slab_post_alloc_hook() would then overlap and trigger a warning.
Therefore, once the pool is initialized, we can remove (free) it from
kmemleak again, since it should be treated as allocator-internal and be
seen as "free memory".
The second problem is that kmemleak is passed the rounded size, and not
the originally requested size, which is also the size of KFENCE objects.
To avoid kmemleak scanning past the end of an object and triggering a
KFENCE out-of-bounds error, fix the size if it is a KFENCE object.
For simplicity, to avoid a call to kfence_ksize() in
slab_post_alloc_hook() (and avoid new IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
guard), just call kfence_ksize() in mm/kmemleak.c:create_object().
Link: https://lkml.kernel.org/r/20210317084740.3099921-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Luis Henriques <lhenriques@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172317151
Test: build and run on an ARM64 device
(cherry picked from commit 9551158069)
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ida4d747d3b81b5e8a6f1047b864da89e8e110e61
Allow disabling symbol trimming on the command line when running
build.sh. This allows us to make GKI builds without trimming and without
modifying the build config. The main use case is when we want to update
the symbol list in a mixed build system.
Bug: 186549137
Signed-off-by: Will McVicker <willmcvicker@google.com>
Change-Id: I16d1c348270b4dbb378f009857286acd7b6d8aa3
Prior change
ANDROID: Incremental fs: stat should return actual used blocks
adds blocks to getattr. Unfortunately the code always looks for the
backing file, and pseudo files don't have backing files, so getattr
fails for pseudo files.
Bug: 186567511
Test: incfs_test passes, can do incremental installs on test device
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: Ia3df87f3683e095d05c822b69747515963c95f1c
(cherry picked from commit 9d00e67d8b)
Currently, the sched code checks if the rq clock has been
updated after its lock has been held when CONFIG_SCHED_DEBUG
is enabled. It tracks this by clearing the RQCF_UPDATED bit
when a lock is acquired and setting it upon a subsequent
update_rq_clock() call. It warns if rq clock is read without
RQCF_UPDATED flag indicating the code path missed updating
the clock.
When migrate_tasks() is called during a pause_cpus() event,
the local variable orf is updated with the contents of *rf,
prior to the call to update_rq_clock(). As a result, when
migrate_tasks() restores *rf from the local variable the
RQCF_UPDATED flag is lost. This clearing out of the
RQCF_UPDATED flag leads to a warning when the next task
is being pushed out.
For example in migrate_tasks()
    orf = *rf; // save flags, RQCF_UPDATED cleared
    update_rq_clock() // set RQCF_UPDATED
    for()
        ...
        __migrate_task(dead_rq, new_cpu)
        ...
        --> if migration, restore dead_rq's flags with orf.
        --> We lose RQCF_UPDATED
        rq_relock(dead_rq, orf)
This leaves the current cpu's rq clock_update_flags with the
RQCF_UPDATED flag cleared, an error condition with
CONFIG_SCHED_DEBUG enabled.
Fix the issue by ensuring that the local variable orf
has the RQCF_UPDATED flag set, allowing the current
CPU's rq to have the flag set and leaving it in a good state
for future usage.
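A toy user-space model of the sequence, assuming simplified lock helpers
that only track the flag. The flag value, struct layouts and helper
bodies are illustrative and do not reproduce the scheduler's real
locking:

```c
#include <assert.h>
#include <stdbool.h>

#define RQCF_UPDATED 0x04u	/* illustrative value */

struct rq { unsigned int clock_update_flags; };
struct rq_flags { unsigned int clock_update_flags; };

/* Taking the lock clears RQCF_UPDATED on both the rq and the cookie. */
static void rq_lock(struct rq *rq, struct rq_flags *rf)
{
	rq->clock_update_flags = 0;
	rf->clock_update_flags = 0;
}

static void update_rq_clock(struct rq *rq)
{
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Re-taking the lock restores whatever the saved cookie carries. */
static void rq_relock(struct rq *rq, const struct rq_flags *rf)
{
	rq->clock_update_flags = rf->clock_update_flags;
}

/* Returns true if RQCF_UPDATED is still set after the relock. */
static bool flag_survives_relock(bool apply_fix)
{
	struct rq rq;
	struct rq_flags rf, orf;

	rq_lock(&rq, &rf);
	assert(!(rq.clock_update_flags & RQCF_UPDATED));
	orf = rf;			/* saved before the clock update */
	update_rq_clock(&rq);
	if (apply_fix)
		orf.clock_update_flags |= RQCF_UPDATED;
	rq_relock(&rq, &orf);
	return (rq.clock_update_flags & RQCF_UPDATED) != 0;
}
```

In this model flag_survives_relock(false) reports the lost flag, and
flag_survives_relock(true) shows that marking the saved copy keeps the
rq in the updated state.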
pause_cpus() is currently Android specific. As cpu_pause does
not rely on stop_machine_cpuslocked() like the regular
hotunplug path does, there's a risk for another CPU to
read the rq_clock, after we cleared RQCF_UPDATED, when using
pause_cpus(). This change will have little or no impact outside
of Android currently. If pause_cpus() or drain_rq_cpu_stop()
are merged upstream this change should be merged as well.
Bug: 186222712
Change-Id: Id241122e1449cdd4dcd15f94eb68735b40e3d6f5
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>