Fallback to the default module path (/lib/modules/<uname>) if module
loading failed for the selected path in CONFIG_PKVM_MODULE_PATH. This
intends to follow the same mechanism as Android init.
Bug: 254835242
Change-Id: Ia7764d57fe71521e4a1fe6d2c85ba057790069a8
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Currently, no module path will be given to modprobe when loading a pKVM
module, the module must then be found in /lib/modules/<uname>. Add
CONFIG_PKVM_MODULE_PATH to allow setting a different path from the
kernel config.
Bug: 254835242
Change-Id: I4f355518628b44ac03de2cee3d7a90e1ad5bf1e2
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
This patch adds POSIX_FADV_NOREUSE to vma_has_recency() so that the LRU
algorithm can ignore access to mapped files marked by this flag.
The advantages of POSIX_FADV_NOREUSE are:
1. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not alter the
default readahead behavior.
2. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not split VMAs and
therefore does not take mmap_lock.
3. Unlike MADV_COLD, setting it has a negligible cost, regardless of
how many pages it affects.
Its limitations are:
1. Like POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL, it currently does
not support range. IOW, its scope is the entire file.
2. It currently does not ignore access through file descriptors.
Specifically, for the active/inactive LRU, given a file page shared
by two users and one of them having set POSIX_FADV_NOREUSE on the
file, this page will be activated upon the second user accessing
it. This corner case can be covered by checking POSIX_FADV_NOREUSE
before calling folio_mark_accessed() on the read path. But it is
considered not worth the effort.
There have been a few attempts to support POSIX_FADV_NOREUSE, e.g., [1].
This time the goal is to fill a niche: a few desktop applications, e.g.,
large file transferring and video encoding/decoding, want fast file
streaming with mmap() rather than direct IO. Among those applications, an
SVT-AV1 regression was reported when running with MGLRU [2]. The
following test can reproduce that regression.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fallocate -l 8G /mnt/swapfile
mkswap /mnt/swapfile
swapon /mnt/swapfile
wget http://ultravideo.cs.tut.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z e -o/mnt/ Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp --preset 12 -w 3840 -h 2160 \
-i /mnt/Bosphorus_3840x2160.y4m
For MGLRU, the following change showed a [9-11]% increase in FPS,
which makes it on par with the active/inactive LRU.
patch Source/App/EncApp/EbAppMain.c <<EOF
31a32
> #include <fcntl.h>
35d35
< #include <fcntl.h> /* _O_BINARY */
117a118
> posix_fadvise(config->mmap.fd, 0, 0, POSIX_FADV_NOREUSE);
EOF
[1] https://lore.kernel.org/r/1308923350-7932-1-git-send-email-andrea@betterlinux.com/
[2] https://openbenchmarking.org/result/2209259-PTS-MGLRU8GB57
Link: https://lkml.kernel.org/r/20221230215252.2628425-2-yuzhao@google.com
Change-Id: Iee2a7df5ccd86162089e007e32f9fa9b2b9f198b
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17e810229c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Add vma_has_recency() to indicate whether a VMA may exhibit temporal
locality that the LRU algorithm relies on.
This function returns false for VMAs marked by VM_SEQ_READ or
VM_RAND_READ. While the former flag indicates linear access, i.e., a
special case of spatial locality, both flags indicate a lack of temporal
locality, i.e., the reuse of an area within a relatively small duration.
"Recency" is chosen over "locality" to avoid confusion between temporal
and spatial localities.
Before this patch, the active/inactive LRU only ignored the accessed bit
from VMAs marked by VM_SEQ_READ. After this patch, the active/inactive
LRU and MGLRU share the same logic: they both ignore the accessed bit if
vma_has_recency() returns false.
For the active/inactive LRU, the following fio test showed a [6, 8]%
increase in IOPS when randomly accessing mapped files under memory
pressure.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fio --name=test --directory=/mnt/ --ioengine=mmap --numjobs=8 \
--size=8G --rw=randrw --time_based --runtime=10m \
--group_reporting
The discussion that led to this patch is here [1]. Additional test
results are available in that thread.
[1] https://lore.kernel.org/r/Y31s%2FK8T85jh05wH@google.com/
Link: https://lkml.kernel.org/r/20221230215252.2628425-1-yuzhao@google.com
Change-Id: I3e0dfa1ca2a3c14fb0239b8f612c697e1d8d6a64
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8788f67814)
[TJ: #include <linux/mm_inline.h>, folio -> page renames]
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
When running as a Xen PV guests commit eed9a328aa ("mm: x86: add
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
pmdp_test_and_clear_young():
BUG: unable to handle page fault for address: ffff8880083374d0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
RIP: e030:pmdp_test_and_clear_young+0x25/0x40
This happens because the Xen hypervisor can't emulate direct writes to
page table entries other than PTEs.
This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
similar to arch_has_hw_pte_young() and test that instead of
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.
Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
Fixes: eed9a328aa ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
Change-Id: Ib88d54891c619feccc7eb40a14d5ba6e349912d6
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: David Hildenbrand <david@redhat.com> [core changes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4aaf269c76)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Set KMI_GENERATION=4 for 4/12 KMI update
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0x6d0d2bcc to 0xf2df037e
function symbol 'void* PDE_DATA(const struct inode*)' changed
CRC changed from 0xbb8786db to 0x3c36f860
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0x6cd6821a to 0xafefd4e
... 2828 omitted; 2831 symbols have only CRC changes
type 'struct dentry_operations' changed
member changed from 'void(* d_canonical_path)(const struct path*, struct path*)' to 'int(* d_canonical_path)(const struct path*, struct path*)'
type changed from 'void(*)(const struct path*, struct path*)' to 'int(*)(const struct path*, struct path*)'
pointed-to type changed from 'void(const struct path*, struct path*)' to 'int(const struct path*, struct path*)'
return type changed from 'void' to 'int'
Bug: 277759776
Change-Id: I5f3ed46e6804dcf0db745d4e6dc7c3a317f64648
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Previously errors from the daemon in FUSE_CANONICAL_PATH were simply
ignored. In order to block inotifys, it is useful to be able to return
errors from this opcode.
Bug: 238619640
Test: inotify no longer works on /storage/emulated/0/Android/media but
does on child folders
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: Icb15c090c6286c174338471a787712f8388de316
SoCs featuring peripherals that can issue non-coherent DMA traffic
beyond the point of coherency (PoC) present multiple challenges for the
DMA-API implementation in Linux. Many of these challenges can be
overcome by suitable configuration of the interconnect, however the
presence of a cacheable alias for non-cacheable buffers can still lead
to coherence issues arising when stale clean lines are back-snooped from
the cache hierarchy to satisfy a non-cacheable transaction at the PoC.
Removing all cacheable aliases on a case-by-cases basis is both
error-prone and expensive. Instead, leverage the stage-2 identity
mapping installed by pKVM to enforce consistent cacheability for all
stage-1 aliases.
Bug: 240786634
Change-Id: I78b0aa51fe3e23811bbd25481173086aa957c4bf
Signed-off-by: Will Deacon <willdeacon@google.com>
Enable UAC2 function driver in x86 gki_defconfig for feature parity with
arm64 gki_defconfig.
Bug: 277271545
Change-Id: I4c602a2e791ecc03dc7d63c131dbe0982f1998c8
Signed-off-by: Neill Kapron <nkapron@google.com>
(cherry picked from commit a6ef8539c18177fbd541cc577ddd6562c9c343a1)
This reverts commit 487a32ec24.
should_skip_kasan_poison() reads the PG_skip_kasan_poison flag from
page->flags. However, this line of code in free_pages_prepare():
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
clears most of page->flags, including PG_skip_kasan_poison, before calling
should_skip_kasan_poison(), which meant that it would never return true as
a result of the page flag being set. Therefore, fix the code to call
should_skip_kasan_poison() before clearing the flags, as we were doing
before the reverted patch.
This fixes a measurable performance regression introduced in the reverted
commit, where munmap() takes longer than intended if HW tags KASAN is
supported and enabled at runtime. Without this patch, we see a
single-digit percentage performance regression in a particular
mmap()-heavy benchmark when enabling HW tags KASAN, and with the patch,
there is no statistically significant performance impact when enabling HW
tags KASAN.
Link: https://lkml.kernel.org/r/20230310042914.3805818-2-pcc@google.com
Fixes: 487a32ec24 ("kasan: drop skip_kasan_poison variable in free_pages_prepare")
Link: https://linux-review.googlesource.com/id/Ic4f13affeebd20548758438bb9ed9ca40e312b79
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f446883d12)
Change-Id: Ic4f13affeebd20548758438bb9ed9ca40e312b79
Bug: 268383694
The greybus loopback test tool does not belong burried down in a
driver-specific directory. If it is needed, it should be somewhere
else, like in the testing directory. But as the loopback driver is
probably never going to be merged out of the staging directory, let's
just delete the test alltogether for now. If it's needed in the future,
it can be brought back with a revert.
Also, having an Android.mk file in the kernel source tree breaks some
Android build systems when trying to build from a read-only source tree,
the report of which prompted this being a good reason to remove it as
well.
Cc: Johan Hovold <johan@kernel.org>
Cc: Alex Elder <elder@kernel.org>
Cc: Jack Schofield <schofija@oregonstate.edu>
Cc: Vaibhav Nagare <vnagare@redhat.com>
Cc: greybus-dev@lists.linaro.org
Change-Id: Ie19bcd87d0062d8453ff0d3d998224a7e478f3d0
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 275824813
Link: https://lore.kernel.org/r/2023040613-huntsman-goldsmith-4a41@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
The HVC number limit must be the id inside the enum, not the fully
encoded version. Without this fix, all HVC would be rejected after
closing the pKVM modules door.
Bug: 254835242
Change-Id: Ia338859e07412ea1c2377b90ddee2c29c6fb0755
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
As a fail-safe mechanism, disable pKVM module loading after the IOMMU
init is complete. This intends to make sure nothing can be loaded after
the IOMMU driver is ready, in case a vendor would forget to call
__pkvm_close_module_registration
Bug: 254835242
Change-Id: I32a9752e145a8ef9787dae6319e37ba38544739b
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
pKVM module loading is disabled by rejecting the HVCs. This is a problem
for kvm_call_hyp_nvhe(__pkvm_alloc_module_va). First a WARN will be
trigger, then the res.a1 being very much likely non 0 the error will
not be caught. The loading would then proceed (even though no EL2 code
will be actually loaded).
Fix this behaviour by catching when the HVC is blocked, so the user gets
a meaningful error returned and the driver is not half-loaded.
Bug: 254835242
Change-Id: Ieeca6eebb083d99f8d6b79ebbc486a7c9f7d334e
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Factor out the logic for setting the SVE vector length at the
hypervisor in pKVM to the maximum hardware supported value when
saving and restoring host SVE state.
This shares common code, and elides an unused variable warning
when SVE is not configured.
Change-Id: Ibaa58e5bb44e4f85b9548a6abb498f0a769f6a4b
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230329215159.2046932-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 271912353
(cherry picked from commit eddebe3960https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus)
Change-Id: If1c2515e35a538e32d3b51f0c304b6de507df407
Signed-off-by: RD Babiera <rdbabiera@google.com>
Changes in 5.15.106
fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
usb: dwc3: gadget: move cmd_endtransfer to extra function
usb: dwc3: gadget: Add 1ms delay after end transfer command without IOC
kernel: kcsan: kcsan_test: build without structleak plugin
kcsan: avoid passing -g for test
ksmbd: don't terminate inactive sessions after a few seconds
bus: imx-weim: fix branch condition evaluates to a garbage value
xfrm: Zero padding when dumping algos and encap
ASoC: codecs: tx-macro: Fix for KASAN: slab-out-of-bounds
md: avoid signed overflow in slot_store()
x86/PVH: obtain VGA console info in Dom0
net: hsr: Don't log netdev_err message on unknown prp dst node
ALSA: asihpi: check pao in control_message()
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
fbdev: tgafb: Fix potential divide by zero
sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
fbdev: nvidia: Fix potential divide by zero
fbdev: intelfb: Fix potential divide by zero
fbdev: lxfb: Fix potential divide by zero
fbdev: au1200fb: Fix potential divide by zero
tools/power turbostat: Fix /dev/cpu_dma_latency warnings
tools/power turbostat: fix decoding of HWP_STATUS
tracing: Fix wrong return in kprobe_event_gen_test.c
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
mips: bmips: BCM6358: disable RAC flush for TP1
ALSA: usb-audio: Fix recursive locking at XRUN during syncing
platform/x86: think-lmi: add missing type attribute
platform/x86: think-lmi: use correct possible_values delimiters
platform/x86: think-lmi: only display possible_values if available
platform/x86: think-lmi: Add possible_values for ThinkStation
mtd: rawnand: meson: invalidate cache on polling ECC bit
SUNRPC: fix shutdown of NFS TCP client socket
sfc: ef10: don't overwrite offload features at NIC reset
scsi: megaraid_sas: Fix crash after a double completion
scsi: mpt3sas: Don't print sense pool info twice
ptp_qoriq: fix memory leak in probe()
net: dsa: microchip: ksz8863_smi: fix bulk access
r8169: fix RTL8168H and RTL8107E rx crc error
regulator: Handle deferred clk
net/net_failover: fix txq exceeding warning
net: stmmac: don't reject VLANs when IFF_PROMISC is set
drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix
can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write
s390/vfio-ap: fix memory leak in vfio_ap device driver
loop: suppress uevents while reconfiguring the device
loop: LOOP_CONFIGURE: send uevents for partitions
net: mvpp2: classifier flow fix fragmentation flags
net: mvpp2: parser fix QinQ
net: mvpp2: parser fix PPPoE
smsc911x: avoid PHY being resumed when interface is not up
ice: add profile conflict check for AVF FDIR
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
ALSA: ymfpci: Fix BUG_ON in probe function
net: ipa: compute DMA pool size properly
i40e: fix registers dump after run ethtool adapter self test
bnxt_en: Fix reporting of test result in ethtool selftest
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Add missing 200G link speed reporting
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
pinctrl: ocelot: Fix alt mode for ocelot
iommu/vt-d: Allow zero SAGAW if second-stage not supported
Input: alps - fix compatibility with -funsigned-char
Input: focaltech - use explicitly signed char type
cifs: prevent infinite recursion in CIFSGetDFSRefer()
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
btrfs: fix race between quota disable and quota assign ioctls
btrfs: scan device in non-exclusive mode
zonefs: Always invalidate last cached page on append write
can: j1939: prevent deadlock by moving j1939_sk_errqueue()
xen/netback: don't do grant copy across page boundary
net: phy: dp83869: fix default value for tx-/rx-internal-delay
pinctrl: amd: Disable and mask interrupts on resume
pinctrl: at91-pio4: fix domain name assignment
powerpc: Don't try to copy PPR for task with NULL pt_regs
NFSv4: Fix hangs when recovering open state after a server reboot
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
ALSA: usb-audio: Fix regression on detection of Roland VS-100
ALSA: hda/realtek: Add quirks for some Clevo laptops
ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
xtensa: fix KASAN report for show_stack
rcu: Fix rcu_torture_read ftrace event
drm/etnaviv: fix reference leak when mmaping imported buffer
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
KVM: arm64: Disable interrupts while walking userspace PTs
s390/uaccess: add missing earlyclobber annotations to __clear_user()
KVM: VMX: Move preemption timer <=> hrtimer dance to common x86
KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
KVM: x86: Purge "highest ISR" cache when updating APICv state
zonefs: Fix error message in zonefs_file_dio_append()
selftests/bpf: Test btf dump for struct with padding only fields
libbpf: Fix BTF-to-C converter's padding logic
selftests/bpf: Add few corner cases to test padding handling of btf_dump
libbpf: Fix btf_dump's packed struct determination
hsr: ratelimit only when errors are printed
x86/PVH: avoid 32-bit build warning when obtaining VGA console info
Linux 5.15.106
Change-Id: I3197b16c9f82b9bd6a17d4637a00b15e9bd5b873
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4fb877aaa1 ]
Fix bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine overall alignment
of a struct type (because it could be 1 for both packed and non-packed
struct, depending on specifci field definitions), and just use field's
actual alignment to calculate whether any field is requiring packing or
struct's size overall necessitates packing.
Add two simple test cases that demonstrate the difference this change
would make.
Fixes: ea2ce1ba99 ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ea2ce1ba99 ]
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).
This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding, where
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle, there are lots of subtle corner cases. The comments
in the code tries to give some clues, but comments themselves are
supposed to be paired with good understanding of C alignment and padding
rules. Plus some experimentation to figure out subtle things like
whether `long :0;` means that struct is now forced to be long-aligned
(no, it's not, turns out).
Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.
Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.
Note also that for when we are in between bitfields, we emit explicit
bit size, while otherwise we use `: 0`, this feels much more natural in
practice.
Next patch will add few more test cases, found through randomized Per's
script.
[0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/
Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 88b170088a upstream.
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd7 ("zonefs: Detect append writes at invalid locations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97a71c444a upstream.
Purge the "highest ISR" cache when updating APICv state on a vCPU. The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.
This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
a chance to update its cache.
Fixes: b26a695a1d ("kvm: lapic: Introduce APICv update helper function")
Cc: stable@vger.kernel.org
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98c25ead5e upstream.
Handle the switch to/from the hypervisor/software timer when a vCPU is
blocking in common x86 instead of in VMX. Even though VMX is the only
user of a hypervisor timer, the logic and all functions involved are
generic x86 (unless future CPUs do something completely different and
implement a hypervisor timer that runs regardless of mode).
Handling the switch in common x86 will allow for the elimination of the
pre/post_blocks hooks, and also lets KVM switch back to the hypervisor
timer if and only if it was in use (without additional params). Add a
comment explaining why the switch cannot be deferred to kvm_sched_out()
or kvm_vcpu_block().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211208015236.1616697-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ta: Fix conflicts in vmx_pre_block and vmx_post_block as per Paolo's
suggestion. Add Reported-by and Link tags.]
Reported-by: syzbot+b6a74be92b5063a0f1ff@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=489beb3d76ef14cc6cd18125782dc6f86051a605
Tested-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e86fc1a3a3 upstream.
We walk the userspace PTs to discover what mapping size was
used there. However, this can race against the userspace tables
being freed, and we end-up in the weeds.
Thankfully, the mm code is being generous and will IPI us when
doing so. So let's implement our part of the bargain and disable
interrupts around the walk. This ensures that nothing terrible
happens during that time.
We still need to handle the removal of the page tables before
the walk. For that, allow get_user_mapping_size() to return an
error, and make sure this error can be propagated all the way
to the the exit handler.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230316174546.3777507-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 963b2e8c42 upstream.
drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.
Cc: stable@vger.kernel.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d18a04157f upstream.
Fix the rcutorturename field so that its size is correctly reported in
the text format embedded in trace.dat files. As it stands, it is
reported as being of size 1:
field:char rcutorturename[8]; offset:8; size:1; signed:0;
Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org
Fixes: 04ae87a520 ("ftrace: Rework event_create_dir()")
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
[ boqun: Add "Cc" and "Fixes" tags per Steven ]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d3b7a788c upstream.
show_stack dumps raw stack contents which may trigger an unnecessary
KASAN report. Fix it by copying stack contents to a temporary buffer
with __memcpy and then printing that buffer instead of passing stack
pointer directly to the print_hex_dump.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa4e7a6fa1 upstream.
It's been reported that the recent kernel can't probe the PCM devices
on Roland VS-100 properly, and it turned out to be a regression by the
recent addition of the bit shift range check for the format bits.
In the old code, we just did bit-shift and it resulted in zero, which
is then corrected to the standard PCM format, while the new code
explicitly returns an error in such a case.
For addressing the regression, relax the check and fallback to the
standard PCM type (with the info output).
Fixes: 43d5ca88df ("ALSA: usb-audio: Fix potential out-of-bounds shift")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217084
Link: https://lore.kernel.org/r/20230324075005.19403-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b871cb971c upstream.
The recent commit f83bb25924 ("ALSA: hda/conexant: Add quirk for
LENOVO 20149 Notebook model") introduced a quirk for the device with
17aa:3977, but this caused a regression on another model (Lenovo
Ideadpad U31) with the very same PCI SSID. And, through skimming over
the net, it seems that this PCI SSID is used for multiple different
models, so it's no good idea to apply the quirk with the SSID.
Although we may take a different ID check (e.g. the codec SSID instead
of the PCI SSID), unfortunately, the original patch author couldn't
identify the hardware details any longer as the machine was returned,
and we can't develop the further proper fix.
In this patch, instead, we partially revert the change so that the
quirk won't be applied as default for addressing the regression.
Meanwhile, the quirk function itself is kept, and it's now made to be
applicable via the explicit model=lenovo-20149 option.
Fixes: f83bb25924 ("ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model")
Reported-by: Jetro Jormalainen <jje-lxkl@jetro.fi>
Link: https://lore.kernel.org/r/20230308215009.4d3e58a6@mopti
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230320140954.31154-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6165a16a5a upstream.
When we're using a cached open stateid or a delegation in order to avoid
sending a CLAIM_PREVIOUS open RPC call to the server, we don't have a
new open stateid to present to update_open_stateid().
Instead rely on nfs4_try_open_cached(), just as if we were doing a
normal open.
Fixes: d2bfda2e7a ("NFSv4: don't reprocess cached open CLAIM_PREVIOUS")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd72761894 upstream.
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:
Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
BUG: Kernel NULL pointer dereference on read at 0x00000160
Faulting instruction address: 0xc0000000000c3a60
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8
...
NIP memcpy_power7+0x200/0x7d0
LR ppr_get+0x64/0xb0
Call Trace:
ppr_get+0x40/0xb0 (unreliable)
__regset_get+0x180/0x1f0
regset_get_alloc+0x64/0x90
elf_core_dump+0xb98/0x1b60
do_coredump+0x1c34/0x24a0
get_signal+0x71c/0x1410
do_notify_resume+0x140/0x6f0
interrupt_exit_user_prepare_main+0x29c/0x320
interrupt_exit_user_prepare+0x6c/0xa0
interrupt_return_srr_user+0x8/0x138
Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.
Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.
Fixes: fa439810cc ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b26cd9325b upstream.
This fixes a similar problem to the one observed in:
commit 4e5a04be88 ("pinctrl: amd: disable and mask interrupts on probe").
On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.
The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.
Cc: stable@vger.kernel.org
Fixes: dbad75dd1f ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>