[ Upstream commit 6028da3f12 ]
When copying the DSCP bits for decap-dscp into IPv6 don't assume the
outer encap is always IPv6. Instead, as with the inner IPv4 case, copy
the DSCP bits from the correctly saved "tos" value in the control block.
Fixes: 227620e295 ("[IPSEC]: Separate inner/outer mode processing on input")
Signed-off-by: Christian Hopps <chopps@chopps.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6ee896385 ]
int type = nla_type(nla);
if (type > XFRMA_MAX) {
return -EOPNOTSUPP;
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
if (nla_len(nla) < compat_policy[type].len) {
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
Fixes: 5106f4a8ac ("xfrm/compat: Add 32=>64-bit messages translator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4ae5e1e97c upstream.
The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
d) No CF shall begin, or resume, transmission on the network until 250
ms after it has successfully claimed an address except when
responding to a request for address-claimed.
But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
prioritization" show that the CF begins the transmission after 250 ms
from the first AC (address-claimed) message even if it sends another AC
message during that time window to resolve the address contention with
another CF.
As stated in "4.4.2.3 - Address-claimed message":
In order to successfully claim an address, the CF sending an address
claimed message shall not receive a contending claim from another CF
for at least 250 ms.
As stated in "4.4.3.2 - NAME management (NM) message":
1) A commanding CF can
d) request that a CF with a specified NAME transmit the address-
claimed message with its current NAME.
2) A target CF shall
d) send an address-claimed message in response to a request for a
matching NAME
Taking the above arguments into account, the 250 ms wait is requested
only during network initialization.
Do not restart the timer on AC message if both the NAME and the address
match and so if the address has already been claimed (timer has expired)
or the AC message has been sent to resolve the contention with another
CF (timer is still running).
Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6933c01e4 upstream.
Commit 7a8b64d17e ("of/address: use range parser for of_dma_get_range")
converted the parsing of dma-range properties to use code shared with the
PCI range parser. The intent was to introduce no functional changes however
in the case where we fail to translate the first resource instead of
returning -EINVAL the new code we return 0. Restore the previous behaviour
by returning an error if we find no valid ranges, the original code only
handled the first range but subsequently support for parsing all supplied
ranges was added.
This avoids confusing code using the parsed ranges which doesn't expect to
successfully parse ranges but have only a list terminator returned, this
fixes breakage with so far as I can tell all DMA for on SoC devices on the
Socionext Synquacer platform which has a firmware supplied DT. A bisect
identified the original conversion as triggering the issues there.
Fixes: 7a8b64d17e ("of/address: use range parser for of_dma_get_range")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Luca Di Stefano <luca.distefano@linaro.org>
Cc: 993612@bugs.debian.org
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230126-synquacer-boot-v2-1-cb80fd23c4e2@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e46d910d8 upstream.
poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work
since kernel 6.1-rc6. This issue is seen after the commit
42fb0a1e84 ("tracing/ring-buffer: Have
polling block on watermark").
This issue is firstly detected and reported, when testing the CXL error
events in the rasdaemon and also erified using the test application for poll()
and select().
This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(),
in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the
percentage of pages are available. The default value set for the buffer_percent is 50
in the kernel/trace/trace.c.
As a fix, allow userspace application could set buffer_percent as 0 through
the buffer_percent_fops, so that the task will wake up as soon as data is added
to any of the specific cpu buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20230202182309.742-2-shiju.jose@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 42fb0a1e84 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff209ecc37 upstream.
This reverts commit 5e85eba6f5.
Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.
The main symptom is:
iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
and the machine is only partially usable after resume. It can't run dmesg
and can't do a clean reboot. This happens on every suspend/resume cycle.
Revert 5e85eba6f5 until we can figure out the root cause.
Fixes: 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates Control Register programming")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7152be79b upstream.
This reverts commit 4ff116d0d5.
Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1. Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.
Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:
pci_restore_state
pci_restore_aspm_l1ss_state
aspm_program_l1ss
pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore
pci_restore_pcie_state
pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore
which is a problem because PCIe r6.0, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM
Substates, both ports must be configured as described in this
section while ASPM L1 is disabled.
Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.
Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.
Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume. As far as we know the system will use more power but will
still *work* correctly.
Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c538de0f2 upstream.
There was a recent regression in btrfs/177 that started happening with
the size class patches ("btrfs: introduce size class to block group
allocator"). This however isn't a regression introduced by those
patches, but rather the bug was uncovered by a change in behavior in
these patches. The patches triggered more chunk allocations in the
^free-space-tree case, which uncovered a race with device shrink.
The problem is we will set the device total size to the new size, and
use this to find a hole for a device extent. However during shrink we
may have device extents allocated past this range, so we could
potentially find a hole in a range past our new shrink size. We don't
actually limit our found extent to the device size anywhere, we assume
that we will not find a hole past our device size. This isn't true with
shrink as we're relocating block groups and thus creating holes past the
device size.
Fix this by making sure we do not search past the new device size, and
if we wander into any device extents that start after our device size
simply break from the loop and use whatever hole we've already found.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3178e8a43 upstream.
The verifier skips invalid kfunc call in check_kfunc_call(), which
would be captured in fixup_kfunc_call() if such insn is not eliminated
by dead code elimination. However, this can lead to the following
warning in backtrack_insn(), also see [1]:
------------[ cut here ]------------
verifier backtracking bug
WARNING: CPU: 6 PID: 8646 at kernel/bpf/verifier.c:2756 backtrack_insn
kernel/bpf/verifier.c:2756
__mark_chain_precision kernel/bpf/verifier.c:3065
mark_chain_precision kernel/bpf/verifier.c:3165
adjust_reg_min_max_vals kernel/bpf/verifier.c:10715
check_alu_op kernel/bpf/verifier.c:10928
do_check kernel/bpf/verifier.c:13821 [inline]
do_check_common kernel/bpf/verifier.c:16289
[...]
So make backtracking conservative with this by returning ENOTSUPP.
[1] https://lore.kernel.org/bpf/CACkBjsaXNceR8ZjkLG=dT3P=4A8SBsg0Z5h5PWLryF5=ghKq=g@mail.gmail.com/
Reported-by: syzbot+4da3ff23081bafe74fc2@syzkaller.appspotmail.com
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70376c7ff3 upstream.
Check if the inode size of stuffed (inline) inodes is within the allowed
range when reading inodes from disk (gfs2_dinode_in()). This prevents
us from on-disk corruption.
The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just
truncate inline data to the maximum allowed size don't actually make
sense, and they can be removed now as well.
Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7db354444a upstream.
In each of the two functions, add an inode variable that points to
&ip->i_inode and use that throughout the rest of the function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4920ab131b upstream.
This patch fixes slab-out-of-bounds reads in brcmfmac that occur in
brcmf_construct_chaninfo() and brcmf_enable_bw40_2g() when the count
value of channel specifications provided by the device is greater than
the length of 'list->element[]', decided by the size of the 'list'
allocated with kzalloc(). The patch adds checks that make the functions
free the buffer and return -EINVAL if that is the case. Note that the
negative return is handled by the caller, brcmf_setup_wiphybands() or
brcmf_cfg80211_attach().
Found by a modified version of syzkaller.
Crash Report from brcmf_construct_chaninfo():
==================================================================
BUG: KASAN: slab-out-of-bounds in brcmf_setup_wiphybands+0x1238/0x1430
Read of size 4 at addr ffff888115f24600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: usb_hub_wq hub_event
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0.cold+0x93/0x334
kasan_report.cold+0x83/0xdf
brcmf_setup_wiphybands+0x1238/0x1430
brcmf_cfg80211_attach+0x2118/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
Allocated by task 1896:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
kmem_cache_alloc_trace+0x19e/0x330
brcmf_setup_wiphybands+0x290/0x1430
brcmf_cfg80211_attach+0x2118/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888115f24000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1536 bytes inside of
2048-byte region [ffff888115f24000, ffff888115f24800)
Memory state around the buggy address:
ffff888115f24500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888115f24580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888115f24600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888115f24680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888115f24700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Crash Report from brcmf_enable_bw40_2g():
==================================================================
BUG: KASAN: slab-out-of-bounds in brcmf_cfg80211_attach+0x3d11/0x3fd0
Read of size 4 at addr ffff888103787600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: usb_hub_wq hub_event
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0.cold+0x93/0x334
kasan_report.cold+0x83/0xdf
brcmf_cfg80211_attach+0x3d11/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
Allocated by task 1896:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
kmem_cache_alloc_trace+0x19e/0x330
brcmf_cfg80211_attach+0x3302/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888103787000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1536 bytes inside of
2048-byte region [ffff888103787000, ffff888103787800)
Memory state around the buggy address:
ffff888103787500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888103787580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888103787600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888103787680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888103787700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Reported-by: Dokyung Song <dokyungs@yonsei.ac.kr>
Reported-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Reported-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221116142952.518241-1-linuxlovemin@yonsei.ac.kr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad53db4acb upstream.
The recent commit 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in
IRQs disabled section") fixed warnings (and possible deadlocks) in the
IMC PMU driver by converting the locking to use spinlocks.
It also converted the init-time nest_init_lock to a spinlock, even
though it's not used at runtime in IRQ disabled sections or while
holding other spinlocks.
This leads to warnings such as:
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1
Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV
Call Trace:
dump_stack_lvl+0x74/0xa8 (unreliable)
__might_resched+0x178/0x1a0
__cpuhp_setup_state+0x64/0x1e0
init_imc_pmu+0xe48/0x1250
opal_imc_counters_probe+0x30c/0x6a0
platform_probe+0x78/0x110
really_probe+0x104/0x420
__driver_probe_device+0xb0/0x170
driver_probe_device+0x58/0x180
__driver_attach+0xd8/0x250
bus_for_each_dev+0xb4/0x140
driver_attach+0x34/0x50
bus_add_driver+0x1e8/0x2d0
driver_register+0xb4/0x1c0
__platform_driver_register+0x38/0x50
opal_imc_driver_init+0x2c/0x40
do_one_initcall+0x80/0x360
kernel_init_freeable+0x310/0x3b8
kernel_init+0x30/0x1a0
ret_from_kernel_thread+0x5c/0x64
Fix it by converting nest_init_lock back to a mutex, so that we can call
sleeping functions while holding it. There is no interaction between
nest_init_lock and the runtime spinlocks used by the actual PMU routines.
Fixes: 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Tested-by: Kajol Jain<kjain@linux.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc88ef6632 upstream.
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5ea ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().
Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into
the mask.
This was found with the stress_hpt option with a kbuild workload running
together with `perf record -g`.
Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5ea ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Just take the fix without the new warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095352.2823517-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98d0219e04 upstream.
If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.
Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.
Example crash with the kernel loaded at 5MB:
Run /sbin/init as init process
BUG: Unable to handle kernel data access on write at 0xc000000000452000
Faulting instruction address: 0xc0000000005b6730
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000
CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
...
NIP memset+0x68/0x104
LR zero_user_segments.constprop.0+0xa8/0xf0
Call Trace:
ext4_mpage_readpages+0x7f8/0x830
ext4_readahead+0x48/0x60
read_pages+0xb8/0x380
page_cache_ra_unbounded+0x19c/0x250
filemap_fault+0x58c/0xae0
__do_fault+0x60/0x100
__handle_mm_fault+0x1230/0x1a40
handle_mm_fault+0x120/0x300
___do_page_fault+0x20c/0xa80
do_page_fault+0x30/0xc0
data_access_common_virt+0x210/0x220
This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)
The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.
To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.
That leads to the mapping being split correctly, allowing the rodata
permission change to take happen correctly, with no spillover:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)
If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages
Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abce209d18 upstream.
Using the serio subsystem now requires the code to be reachable:
x86_64-linux-ld: drivers/platform/x86/amd/pmc.o: in function `amd_pmc_suspend_handler':
pmc.c:(.text+0x86c): undefined reference to `serio_bus'
Add the usual dependency: as other users of serio use 'select'
rather than 'depends on', use the same here.
Fixes: 8e60615e89 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230127093950.2368575-1-arnd@kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57e9af7831 upstream.
As DMA Rx can be completed from two places, it is possible that DMA Rx
completes before DMA completion callback had a chance to complete it.
Once the previous DMA Rx has been completed, a new one can be started
on the next UART interrupt. The following race is possible
(uart_unlock_and_check_sysrq_irqrestore() replaced with
spin_unlock_irqrestore() for simplicity/clarity):
CPU0 CPU1
dma_rx_complete()
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma_flush()
__dma_rx_complete()
dma->rx_running = 0
// Complete DMA Rx
spin_unlock_irqrestore(&port->lock)
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma()
dma->rx_running = 1
// Setup a new DMA Rx
spin_unlock_irqrestore(&port->lock)
spin_lock_irqsave(&port->lock)
// sees dma->rx_running = 1
__dma_rx_complete()
dma->rx_running = 0
// Incorrectly complete
// running DMA Rx
This race seems somewhat theoretical to occur for real but handle it
correctly regardless. Check what is the DMA status before complething
anything in __dma_rx_complete().
Reported-by: Gilles BULOZ <gilles.buloz@kontron.com>
Tested-by: Gilles BULOZ <gilles.buloz@kontron.com>
Fixes: 9ee4b83e51 ("serial: 8250: Add support for dmaengine")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230130114841.25749-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab3428cfd9 upstream.
The i.MX6 CPU frequency driver sometimes fails to register at boot time
due to nvmem_cell_read_u32() sporadically returning -ENOENT.
This happens because there is a window where __nvmem_device_get() in
of_nvmem_cell_get() is able to return the nvmem device, but as cells
have been setup, nvmem_find_cell_entry_by_node() returns NULL.
The occurs because the nvmem core registration code violates one of the
fundamental principles of kernel programming: do not publish data
structures before their setup is complete.
Fix this by making nvmem core code conform with this principle.
Fixes: eace75cfdc ("nvmem: Add a simple NVMEM framework for nvmem providers")
Cc: stable@vger.kernel.org
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 560181d3ac upstream.
If dev_set_name() fails, we leak nvmem->wp_gpio as the cleanup does not
put this. While a minimal fix for this would be to add the gpiod_put()
call, we can do better if we split device_register(), and use the
tested nvmem_release() cleanup code by initialising the device early,
and putting the device.
This results in a slightly larger fix, but results in clear code.
Note: this patch depends on "nvmem: core: initialise nvmem->id early"
and "nvmem: core: remove nvmem_config wp_gpio".
Fixes: 5544e90c81 ("nvmem: core: add error handling for dev_set_name")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[Srini: Fixed subject line and error code handing with wp_gpio while applying.]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>