Add hooks to capture various per-zone memory stats when
a trigger threshold is hit.
Bug: 178721511
Change-Id: Ibe39263ddb05ffc3fa63b5225497a90c6480c8d7
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Temporarily disable KFENCE to investigate a power regression we've
encountered on idle systems.
There is a potential fix already, but we don't want to further block the
performance team.
Bug: 185280916
Test: build
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ic5e4f48b48d9c059e3087f85e9edcb6921bee012
Reverse migration is used to do the balancing the occupancy of memory
zones in a node in the system whose imabalance may be caused by
migration of pages to other zones by an operation, eg: hotremove and
then hotadding the same memory. In this case there is a lot of free
memory in newly hotadd memory which can be filled up by the previous
migrated pages(as part of offline/hotremove) thus may free up some
pressure in other zones of the node.
Upstream discussion: https://lore.kernel.org/patchwork/cover/1382106/
Change-Id: Ib3137dab0db66ecf6858c4077dcadb9dfd0c6b1c
Bug: 175403896
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
TEO governor was implemented in android12-5.10, however CONFIG_CPU_IDLE_GOV_TEO
was not enabled in gki_defconfig and vendor can not select TEO governor.
This commit enable TEO and MENU governors in gki_defconfig, and vendors can
select the governor they wanted, e.g. in rc file:
write /sys/devices/system/cpu/cpuidle/current_governor "teo"
Besides, MENU governor's rating is 20, and higher than TEO governor,
so MENU governor is still the default governor if vendors not select the governor manually.
Bug: 185762657
Change-Id: I87be7c4d119f17901b921f22dd7df8b61ac539af
Signed-off-by: rogercl.yang <rogercl.yang@mediatek.com>
android_rvh_replace_next_task_fair hook allows vendor modules to
override the next task selected by CFS. However the current code is
not calling set_next_entity() on the hierarchy of the replaced task
in the case where the previous task is CFS. Fix this issue.
Bug: 184720311
Change-Id: If6c35b1ddefd0829cd236dd821e5ac8aef7347c6
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
To provide more fleixbility in debugging GKI builds, create a mechanism
for adding an optional file which may override some of the build vars,
such as POST_DEFCONFIG_CMDS.
Bug: 171343315
Signed-off-by: J. Avila <elavila@google.com>
Change-Id: I9f4a7bad2c19bcdb3b494c97d7c1eea5c7f311ba
Changes in 5.10.31
interconnect: core: fix error return code of icc_link_destroy()
gfs2: Flag a withdraw if init_threads() fails
KVM: arm64: Hide system instruction access to Trace registers
KVM: arm64: Disable guest access to trace filter controls
drm/imx: imx-ldb: fix out of bounds array access warning
gfs2: report "already frozen/thawed" errors
ftrace: Check if pages were allocated before calling free_pages()
tools/kvm_stat: Add restart delay
drm/tegra: dc: Don't set PLL clock to 0Hz
gpu: host1x: Use different lock classes for each client
XArray: Fix splitting to non-zero orders
block: only update parent bi_status when bio fail
radix tree test suite: Register the main thread with the RCU library
idr test suite: Take RCU read lock in idr_find_test_1
idr test suite: Create anchor before launching throbber
null_blk: fix command timeout completion handling
io_uring: don't mark S_ISBLK async work as unbounded
riscv,entry: fix misaligned base for excp_vect_table
block: don't ignore REQ_NOWAIT for direct IO
netfilter: x_tables: fix compat match/target pad out-of-bound write
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
net: sfp: relax bitrate-derived mode check
net: sfp: cope with SFPs that set both LOS normal and LOS inverted
xen/events: fix setting irq affinity
Linux 5.10.31
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I19a7cfbdaab23e578dd82c552aea86d367c2f40f
The backport of upstream patch 25da4618af ("xen/events: don't
unmask an event channel when an eoi is pending") introduced a
regression for stable kernels 5.10 and older: setting IRQ affinity for
IRQs related to interdomain events would no longer work, as moving the
IRQ to its new cpu was not included in the irq_ack callback for those
events.
Fix that by adding the needed call.
Note that kernels 5.11 and later don't need the explicit moving of the
IRQ to the target cpu in the irq_ack callback, due to a rework of the
affinity setting in kernel 5.11.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 624407d2cf upstream.
The SFP MSA defines two option bits in byte 65 to indicate how the
Rx_LOS signal on SFP pin 8 behaves:
bit 2 - Loss of Signal implemented, signal inverted from standard
definition in SFP MSA (often called "Signal Detect").
bit 1 - Loss of Signal implemented, signal as defined in SFP MSA
(often called "Rx_LOS").
Clearly, setting both bits results in a meaningless situation: it would
mean that LOS is implemented in both the normal sense (1 = signal loss)
and inverted sense (0 = signal loss).
Unfortunately, there are modules out there which set both bits, which
will be initially interpret as "inverted" sense, and then, if the LOS
signal changes state, we will toggle between LINK_UP and WAIT_LOS
states.
Change our LOS handling to give well defined behaviour: only interpret
these bits as meaningful if exactly one is set, otherwise treat it as
if LOS is not implemented.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1kyYQa-0004iR-CU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cc: Pali Rohár <pali@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a77233ec6 upstream.
Do not check the encoding when deriving 1000BASE-X from the bitrate
when no other modes are discovered. Some GPON modules (VSOL V2801F
and CarlitoxxPro CPGOS03-0490 v2.0) indicate NRZ encoding with a
1200Mbaud bitrate, but should be driven with 1000BASE-X on the host
side.
Tested-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77d02bd00c upstream.
Noticed on a debian:experimental mips and mipsel cross build build
environment:
perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1
mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
perfbuilder@ec265a086e9b:~$
CC /tmp/build/perf/util/map.o
util/map.c: In function 'map__new':
util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
109 | "%s/platforms/%s/arch-%s/usr/lib/%s",
| ^~
In file included from /usr/mips-linux-gnu/include/stdio.h:867,
from util/symbol.h:11,
from util/map.c:2:
/usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Since we have the lenghts for what lands in that place, use it to give
the compiler more info and make it happy.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f8b78caf21 ]
If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac8d0b901f ]
In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.
Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b982bd0f3 ]
S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de3510e52b ]
Memory backed or zoned null block devices may generate actual request
timeout errors due to the submission path being blocked on memory
allocation or zone locking. Unlike fake timeouts or injected timeouts,
the request submission path will call blk_mq_complete_request() or
blk_mq_end_request() for these real timeout errors, causing a double
completion and use after free situation as the block layer timeout
handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
blk_mq_check_expired(). This problem often triggers a NULL pointer
dereference such as:
BUG: kernel NULL pointer dereference, address: 0000000000000050
RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
...
Call Trace:
dd_finish_request+0x56/0x80
blk_mq_free_request+0x37/0x130
null_handle_cmd+0xbf/0x250 [null_blk]
? null_queue_rq+0x67/0xd0 [null_blk]
blk_mq_dispatch_rq_list+0x122/0x850
__blk_mq_do_dispatch_sched+0xbb/0x2c0
__blk_mq_sched_dispatch_requests+0x13d/0x190
blk_mq_sched_dispatch_requests+0x30/0x60
__blk_mq_run_hw_queue+0x49/0x90
process_one_work+0x26c/0x580
worker_thread+0x55/0x3c0
? process_one_work+0x580/0x580
kthread+0x134/0x150
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x1f/0x30
This problem very often triggers when running the full btrfs xfstests
on a memory-backed zoned null block device in a VM with limited amount
of memory.
Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
only for commands that are marked for a fake timeout completion using
the fake_timeout boolean in struct null_cmd. For timeout errors injected
through debugfs, the timeout handler will execute
blk_mq_complete_request()i as before. This is safe as the submission
path does not execute complete requests in this case.
In null_timeout_rq(), also make sure to set the command error field to
BLK_STS_TIMEOUT and to propagate this error through to the request
completion.
Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 094ffbd1d8 ]
The throbber could race with creation of the anchor entry and cause the
IDR to have zero entries in it, which would cause the test to fail.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 703586410d ]
When run on a single CPU, this test would frequently access already-freed
memory. Due to timing, this bug never showed up on multi-CPU tests.
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1bb4bd266c ]
Several test runners register individual worker threads with the
RCU library, but neglect to register the main thread, which can lead
to objects being freed while the main thread is in what appears to be
an RCU critical section.
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3edf5346e4 ]
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:
split bio(READ) kworker
nvme_complete_rq
blk_update_request //split error=0
bio_endio
bio_integrity_endio
queue_work(kintegrityd_wq, &bip->bip_work);
bio_integrity_verify_fn
bio_endio //split bio
__bio_chain_endio
if (!parent->bi_status)
<interrupt entry>
nvme_irq
blk_update_request //parent error=7
req_bio_endio
bio->bi_status = 7 //parent bio
<interrupt exit>
parent->bi_status = 0
parent->bi_end_io() // return bi_status=0
The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.
In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3012110d71 ]
Splitting an order-4 entry into order-2 entries would leave the array
containing pointers to 000040008000c000 instead of 000044448888cccc.
This is a one-character fix, but enhance the test suite to check this
case.
Reported-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a24f98176d ]
To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8fb97c915 ]
RGB output doesn't allow to change parent clock rate of the display and
PCLK rate is set to 0Hz in this case. The tegra_dc_commit_state() shall
not set the display clock to 0Hz since this change propagates to the
parent clock. The DISP clock is defined as a NODIV clock by the tegra-clk
driver and all NODIV clocks use the CLK_SET_RATE_PARENT flag.
This bug stayed unnoticed because by default PLLP is used as the parent
clock for the display controller and PLLP silently skips the erroneous 0Hz
rate changes because it always has active child clocks that don't permit
rate changes. The PLLP isn't acceptable for some devices that we want to
upstream (like Samsung Galaxy Tab and ASUS TF700T) due to a display panel
clock rate requirements that can't be fulfilled by using PLLP and then the
bug pops up in this case since parent clock is set to 0Hz, killing the
display output.
Don't touch DC clock if pclk=0 in order to fix the problem.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75f94ecbd0 ]
If this service is enabled and the system rebooted, Systemd's initial
attempt to start this unit file may fail in case the kvm module is not
loaded. Since we did not specify a delay for the retries, Systemd
restarts with a minimum delay a number of times before giving up and
disabling the service. Which means a subsequent kvm module load will
have kvm running without monitoring.
Adding a delay to fix this.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20210325122949.1433271-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff132c5f93 ]
Before this patch, gfs2's freeze function failed to report an error
when the target file system was already frozen as it should (and as
generic vfs function freeze_super does. Similarly, gfs2's thaw function
failed to report an error when trying to thaw a file system that is not
frozen, as vfs function thaw_super does. The errors were checked, but
it always returned a 0 return code.
This patch adds the missing error return codes to gfs2 freeze and thaw.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 33ce7f2f95 ]
When CONFIG_OF is disabled, building with 'make W=1' produces warnings
about out of bounds array access:
drivers/gpu/drm/imx/imx-ldb.c: In function 'imx_ldb_set_clock.constprop':
drivers/gpu/drm/imx/imx-ldb.c:186:8: error: array subscript -22 is below array bounds of 'struct clk *[4]' [-Werror=array-bounds]
Add an error check before the index is used, which helps with the
warning, as well as any possible other error condition that may be
triggered at runtime.
The warning could be fixed by adding a Kconfig depedency on CONFIG_OF,
but Liu Ying points out that the driver may hit the out-of-bounds
problem at runtime anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d676673d6 ]
Currently we advertise the ID_AA6DFR0_EL1.TRACEVER for the guest,
when the trace register accesses are trapped (CPTR_EL2.TTA == 1).
So, the guest will get an undefined instruction, if trusts the
ID registers and access one of the trace registers.
Lets be nice to the guest and hide the feature to avoid
unexpected behavior.
Even though this can be done at KVM sysreg emulation layer,
we do this by removing the TRACEVER from the sanitised feature
register field. This is fine as long as the ETM drivers
can handle the individual trace units separately, even
when there are differences among the CPUs.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210323120647.454211-2-suzuki.poulose@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62dd0f98a0 ]
Interrupting mount with ^C quickly enough can cause the kthread_run()
calls in gfs2's init_threads() to fail and the error path leads to a
deadlock on the s_umount rwsem. The abridged chain of events is:
[mount path]
get_tree_bdev()
sget_fc()
alloc_super()
down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); [acquired]
gfs2_fill_super()
gfs2_make_fs_rw()
init_threads()
kthread_run()
( Interrupted )
[Error path]
gfs2_gl_hash_clear()
flush_workqueue(glock_workqueue)
wait_for_completion()
[workqueue context]
glock_work_func()
run_queue()
do_xmote()
freeze_go_sync()
freeze_super()
down_write(&sb->s_umount) [deadlock]
In freeze_go_sync() there is a gfs2_withdrawn() check that we can use to
make sure freeze_super() is not called in the error path, so add a
gfs2_withdraw_delayed() call when init_threads() fails.
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=212231
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Several symbols in DesignWare PCI Core driver are used
to support PCIe controller work at Endpoint Mode.
Export them to support such driver built as module.
Bug: 159736148
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
Change-Id: I2c77bfbfbec750ecdf21f0cd1bbb20fe339a2f0c