This is updated by
tools/bazel run //common-modules/virtual-device:virtual_device_aarch64_abi_update_symbol_list
Test: TH
Bug: 267694690
Change-Id: Ibe7a8c14fce126701159c1f223a36155f0f6f20d
Signed-off-by: Yifan Hong <elsk@google.com>
This optimization allows us to re-create higher order block mappings in
the host stage2 pagetables after we teardown a guest VM. The coalescing
code is triggered on host_stage2_set_owner_locked path when we annotate
the entries in the host stage2 page-tables with an invalid entry that has
the owner set to PKVM_ID_HOST. This can also be triggered from
page_relinquish when we do page insertion in the ballooning code.
When the host reclaims ownership during guest teardown, the page table
walker drops the refcount of the counted entries and clears out
unreferenced entries (refcount == 1). Clearing out the entry installs a
zero PTE. When the host stage2 receives a data abort because there is no
mapping associated, it will try to create the largest possible block
mapping from the founded leaf entry.
With the current patch, we increase the chances of finding a leaf entry
that has level < 3 if the requested region comes from a reclaimed torned
down VM memory. This has the advantage of reducing the TLB pressure at
host stage2.
To be able to do coalescing, we modify the way we do refcounting by not
counting the following descriptor types at host stage 2:
- non-zero invalid PTEs
- any descriptor that has at least one of the reserved-high bits(58-55)
toogled
- non-default attribute mappings
- page table descriptors
The algorithm works as presented below:
Is refcount(child(pte_table)) == 1 ?
Yes -> (because we left only default mappings)
Zap the table by setting 0 in the pte_table
and put the page that holds the level 3 entries
back into the memcache
level 2
+---------+
| |
| ... |
| pte_table---+ level 3 -> we can now re-create a 2Mb mapping
| ... | +---> +---------+
| | | |
| | | |
| | |def entry|
+---------+ | |
|def entry|
| |
| ... |
+---------+
This (v3) is a re-work of the previous version which fixes some issues on
the stage2_unmap path:
When we register a pKVM IOMMU we unmap the MMIO region from the host
stage2. While we treat most of the MMIO regions as default mappings in
the coalescing change, we end up decrementing the page table page
refcount for a default mapping which breaks the refcounting. Fix this by
adding a check which verifies if we have a default mapping before
decrementing the reference.
Bug: 222044487
Test: dump the host stage2 pagetables and view the mapping
Change-Id: I518fcbd7f022e77965eef54dd59dac07425db3a5
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Add tracepoint/traceiter symbol to QCOM symbol list that are
required to attach to the vendor hook android_vh_do_wake_up_sync.
1 Added function:
[A] 'function __tracepoint_android_vh_do_wake_up_sync
[A] 'function __traceiter_android_vh_do_wake_up_sync
Bug: 268410186
Change-Id: I7ddda19db7e1b042aa0466893e09e3d3f8271674
Signed-off-by: Raul Martinez <quic_mraul@quicinc.com>
Changes in 6.1.11
firewire: fix memory leak for payload of request subaction to IEC 61883-1 FCP region
bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
ASoC: amd: acp-es8336: Drop reference count of ACPI device after use
ASoC: Intel: bytcht_es8316: Drop reference count of ACPI device after use
ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use
ASoC: Intel: bytcr_rt5640: Drop reference count of ACPI device after use
ASoC: Intel: bytcr_wm5102: Drop reference count of ACPI device after use
ASoC: Intel: sof_es8336: Drop reference count of ACPI device after use
ASoC: Intel: avs: Implement PCI shutdown
bpf: Fix off-by-one error in bpf_mem_cache_idx()
bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
bpf: Fix to preserve reg parent/live fields when copying range info
selftests/filesystems: grant executable permission to run_fat_tests.sh
ASoC: SOF: ipc4-mtrace: prevent underflow in sof_ipc4_priority_mask_dfs_write()
bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
arm64: dts: imx8mm-verdin: Do not power down eth-phy
drm/vc4: hdmi: make CEC adapter name unique
drm/ssd130x: Init display before the SSD130X_DISPLAY_ON command
scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
bpf: Fix the kernel crash caused by bpf_setsockopt().
ALSA: memalloc: Workaround for Xen PV
vhost/net: Clear the pending messages when the backend is removed
copy_oldmem_kernel() - WRITE is "data source", not destination
WRITE is "data source", not destination...
READ is "data destination", not source...
zcore: WRITE is "data source", not destination...
memcpy_real(): WRITE is "data source", not destination...
fix iov_iter_bvec() "direction" argument
fix 'direction' argument of iov_iter_{init,bvec}()
fix "direction" argument of iov_iter_kvec()
use less confusing names for iov_iter direction initializers
vhost-scsi: unbreak any layout for response
ice: Prevent set_channel from changing queues while RDMA active
qede: execute xdp_do_flush() before napi_complete_done()
virtio-net: execute xdp_do_flush() before napi_complete_done()
dpaa_eth: execute xdp_do_flush() before napi_complete_done()
dpaa2-eth: execute xdp_do_flush() before napi_complete_done()
skb: Do mix page pool and page referenced frags in GRO
sfc: correctly advertise tunneled IPv6 segmentation
net: phy: dp83822: Fix null pointer access on DP83825/DP83826 devices
net: wwan: t7xx: Fix Runtime PM initialization
block, bfq: replace 0/1 with false/true in bic apis
block, bfq: fix uaf for bfqq in bic_set_bfqq()
netrom: Fix use-after-free caused by accept on already connected socket
fscache: Use wait_on_bit() to wait for the freeing of relinquished volume
platform/x86/amd/pmf: update to auto-mode limits only after AMT event
platform/x86/amd/pmf: Add helper routine to update SPS thermals
platform/x86/amd/pmf: Fix to update SPS default pprof thermals
platform/x86/amd/pmf: Add helper routine to check pprof is balanced
platform/x86/amd/pmf: Fix to update SPS thermals when power supply change
platform/x86/amd/pmf: Ensure mutexes are initialized before use
platform/x86: thinkpad_acpi: Fix thinklight LED brightness returning 255
drm/i915/guc: Fix locking when searching for a hung request
drm/i915: Fix request ref counting during error capture & debugfs dump
drm/i915: Fix up locking around dumping requests lists
drm/i915/adlp: Fix typo for reference clock
net/tls: tls_is_tx_ready() checked list_entry
ALSA: firewire-motu: fix unreleased lock warning in hwdep device
netfilter: br_netfilter: disable sabotage_in hook after first suppression
block: ublk: extending queue_size to fix overflow
kunit: fix kunit_test_init_section_suites(...)
squashfs: harden sanity check in squashfs_read_xattr_id_table
maple_tree: should get pivots boundary by type
sctp: do not check hb_timer.expires when resetting hb_timer
net: phy: meson-gxl: Add generic dummy stubs for MMD register access
drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disable
ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local address
ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local address
riscv: kprobe: Fixup kernel panic when probing an illegal position
igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp()
octeontx2-af: Fix devlink unregister
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
can: raw: fix CAN FD frame transmissions over CAN XL devices
can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
ata: libata: Fix sata_down_spd_limit() when no link speed is reported
selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
virtio-net: Keep stop() to follow mirror sequence of open()
net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
efi: fix potential NULL deref in efi_mem_reserve_persistent
rtc: sunplus: fix format string for printing resource
certs: Fix build error when PKCS#11 URI contains semicolon
kbuild: modinst: Fix build error when CONFIG_MODULE_SIG_KEY is a PKCS#11 URI
i2c: designware-pci: Add new PCI IDs for AMD NAVI GPU
i2c: mxs: suppress probe-deferral error message
scsi: target: core: Fix warning on RT kernels
x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
perf/x86/intel: Add Emerald Rapids
perf/x86/intel/cstate: Add Emerald Rapids
scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress
scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
i2c: rk3x: fix a bunch of kernel-doc warnings
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
platform/x86: dell-wmi: Add a keymap for KEY_MUTE in type 0x0010 table
platform/x86: hp-wmi: Handle Omen Key event
platform/x86: gigabyte-wmi: add support for B450M DS3H WIFI-CF
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
net/x25: Fix to not accept on connected socket
drm/amd/display: Fix timing not changning when freesync video is enabled
bcache: Silence memcpy() run-time false positive warnings
iio: adc: stm32-dfsdm: fill module aliases
usb: dwc3: qcom: enable vbus override when in OTG dr-mode
usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
fbcon: Check font dimension limits
cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
net: qrtr: free memory on error path in radix_tree_insert()
can: isotp: split tx timer into transmission and timeout
can: isotp: handle wait_event_interruptible() return values
watchdog: diag288_wdt: do not use stack buffers for hardware data
watchdog: diag288_wdt: fix __diag288() inline assembly
ALSA: hda/realtek: Add Acer Predator PH315-54
ALSA: hda/realtek: fix mute/micmute LEDs, speaker don't work for a HP platform
ASoC: codecs: wsa883x: correct playback min/max rates
ASoC: SOF: sof-audio: unprepare when swidget->use_count > 0
ASoC: SOF: sof-audio: skip prepare/unprepare if swidget is NULL
ASoC: SOF: keep prepare/unprepare widgets in sink path
efi: Accept version 2 of memory attributes table
rtc: efi: Enable SET/GET WAKEUP services as optional
iio: hid: fix the retval in accel_3d_capture_sample
iio: hid: fix the retval in gyro_3d_capture_sample
iio: adc: xilinx-ams: fix devm_krealloc() return value check
iio: adc: berlin2-adc: Add missing of_node_put() in error path
iio: imx8qxp-adc: fix irq flood when call imx8qxp_adc_read_raw()
iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
iio: light: cm32181: Fix PM support on system with 2 I2C resources
iio: imu: fxos8700: fix ACCEL measurement range selection
iio: imu: fxos8700: fix incomplete ACCEL and MAGN channels readback
iio: imu: fxos8700: fix IMU data bits returned to user space
iio: imu: fxos8700: fix map label of channel type to MAGN sensor
iio: imu: fxos8700: fix swapped ACCEL and MAGN channels readback
iio: imu: fxos8700: fix incorrect ODR mode readback
iio: imu: fxos8700: fix failed initialization ODR mode assignment
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
iio: imu: fxos8700: fix MAGN sensor scale and unit
nvmem: brcm_nvram: Add check for kzalloc
nvmem: sunxi_sid: Always use 32-bit MMIO reads
nvmem: qcom-spmi-sdam: fix module autoloading
parisc: Fix return code of pdc_iodc_print()
parisc: Replace hardcoded value with PRIV_USER constant in ptrace.c
parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
riscv: disable generation of unwind tables
Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
mm: multi-gen LRU: fix crash during cgroup migration
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
usb: gadget: f_uac2: Fix incorrect increment of bNumEndpoints
usb: typec: ucsi: Don't attempt to resume the ports before they exist
usb: gadget: udc: do not clear gadget driver.bus
kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
HV: hv_balloon: fix memory leak with using debugfs_lookup()
x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
fpga: m10bmc-sec: Fix probe rollback
fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
mm/uffd: fix pte marker when fork() without fork event
mm/swapfile: add cond_resched() in get_swap_pages()
mm/khugepaged: fix ->anon_vma race
mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
highmem: round down the address passed to kunmap_flush_on_unmap()
ia64: fix build error due to switch case label appearing next to declaration
Squashfs: fix handling and sanity checking of xattr_ids count
maple_tree: fix mas_empty_area_rev() lower bound validation
migrate: hugetlb: check for hugetlb shared PMD in node migration
dma-buf: actually set signaling bit for private stub fences
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
drm/i915: Avoid potential vm use-after-free
drm/i915: Fix potential bit_17 double-free
drm/amd: Fix initialization for nbio 4.3.0
drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11
drm/amdgpu: update wave data type to 3 for gfx11
nvmem: core: initialise nvmem->id early
nvmem: core: remove nvmem_config wp_gpio
nvmem: core: fix cleanup after dev_set_name()
nvmem: core: fix registration vs use race
nvmem: core: fix device node refcounting
nvmem: core: fix cell removal on error
nvmem: core: fix return value
phy: qcom-qmp-combo: fix runtime suspend
serial: 8250_dma: Fix DMA Rx completion race
serial: 8250_dma: Fix DMA Rx rearm race
platform/x86/amd: pmc: add CONFIG_SERIO dependency
ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure
iio:adc:twl6030: Enable measurement of VAC
powerpc/64s/radix: Fix crash with unaligned relocated kernel
powerpc/64s: Fix local irq disable when PMIs are disabled
powerpc/imc-pmu: Revert nest_init_lock to being a mutex
fs/ntfs3: Validate attribute data and valid sizes
ovl: Use "buf" flexible array for memcpy() destination
f2fs: initialize locks earlier in f2fs_fill_super()
fbdev: smscufx: fix error handling code in ufx_usb_probe
f2fs: fix to do sanity check on i_extra_isize in is_alive()
wifi: brcmfmac: Check the count value of channel spec to prevent out-of-bounds reads
gfs2: Cosmetic gfs2_dinode_{in,out} cleanup
gfs2: Always check inode size of inline inodes
bpf: Skip invalid kfunc call in backtrack_insn
Linux 6.1.11
Change-Id: I69722bc9711b91f2fca18de59746ada373f64c5e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d3178e8a43 upstream.
The verifier skips invalid kfunc call in check_kfunc_call(), which
would be captured in fixup_kfunc_call() if such insn is not eliminated
by dead code elimination. However, this can lead to the following
warning in backtrack_insn(), also see [1]:
------------[ cut here ]------------
verifier backtracking bug
WARNING: CPU: 6 PID: 8646 at kernel/bpf/verifier.c:2756 backtrack_insn
kernel/bpf/verifier.c:2756
__mark_chain_precision kernel/bpf/verifier.c:3065
mark_chain_precision kernel/bpf/verifier.c:3165
adjust_reg_min_max_vals kernel/bpf/verifier.c:10715
check_alu_op kernel/bpf/verifier.c:10928
do_check kernel/bpf/verifier.c:13821 [inline]
do_check_common kernel/bpf/verifier.c:16289
[...]
So make backtracking conservative with this by returning ENOTSUPP.
[1] https://lore.kernel.org/bpf/CACkBjsaXNceR8ZjkLG=dT3P=4A8SBsg0Z5h5PWLryF5=ghKq=g@mail.gmail.com/
Reported-by: syzbot+4da3ff23081bafe74fc2@syzkaller.appspotmail.com
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70376c7ff3 upstream.
Check if the inode size of stuffed (inline) inodes is within the allowed
range when reading inodes from disk (gfs2_dinode_in()). This prevents
us from on-disk corruption.
The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just
truncate inline data to the maximum allowed size don't actually make
sense, and they can be removed now as well.
Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7db354444a upstream.
In each of the two functions, add an inode variable that points to
&ip->i_inode and use that throughout the rest of the function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4920ab131b upstream.
This patch fixes slab-out-of-bounds reads in brcmfmac that occur in
brcmf_construct_chaninfo() and brcmf_enable_bw40_2g() when the count
value of channel specifications provided by the device is greater than
the length of 'list->element[]', decided by the size of the 'list'
allocated with kzalloc(). The patch adds checks that make the functions
free the buffer and return -EINVAL if that is the case. Note that the
negative return is handled by the caller, brcmf_setup_wiphybands() or
brcmf_cfg80211_attach().
Found by a modified version of syzkaller.
Crash Report from brcmf_construct_chaninfo():
==================================================================
BUG: KASAN: slab-out-of-bounds in brcmf_setup_wiphybands+0x1238/0x1430
Read of size 4 at addr ffff888115f24600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: usb_hub_wq hub_event
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0.cold+0x93/0x334
kasan_report.cold+0x83/0xdf
brcmf_setup_wiphybands+0x1238/0x1430
brcmf_cfg80211_attach+0x2118/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
Allocated by task 1896:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
kmem_cache_alloc_trace+0x19e/0x330
brcmf_setup_wiphybands+0x290/0x1430
brcmf_cfg80211_attach+0x2118/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888115f24000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1536 bytes inside of
2048-byte region [ffff888115f24000, ffff888115f24800)
Memory state around the buggy address:
ffff888115f24500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888115f24580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888115f24600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888115f24680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888115f24700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Crash Report from brcmf_enable_bw40_2g():
==================================================================
BUG: KASAN: slab-out-of-bounds in brcmf_cfg80211_attach+0x3d11/0x3fd0
Read of size 4 at addr ffff888103787600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: usb_hub_wq hub_event
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0.cold+0x93/0x334
kasan_report.cold+0x83/0xdf
brcmf_cfg80211_attach+0x3d11/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
Allocated by task 1896:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
kmem_cache_alloc_trace+0x19e/0x330
brcmf_cfg80211_attach+0x3302/0x3fd0
brcmf_attach+0x389/0xd40
brcmf_usb_probe+0x12de/0x1690
usb_probe_interface+0x25f/0x710
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_set_configuration+0x984/0x1770
usb_generic_driver_probe+0x69/0x90
usb_probe_device+0x9c/0x220
really_probe+0x1be/0xa90
__driver_probe_device+0x2ab/0x460
driver_probe_device+0x49/0x120
__device_attach_driver+0x18a/0x250
bus_for_each_drv+0x123/0x1a0
__device_attach+0x207/0x330
bus_probe_device+0x1a2/0x260
device_add+0xa61/0x1ce0
usb_new_device.cold+0x463/0xf66
hub_event+0x10d5/0x3330
process_one_work+0x873/0x13e0
worker_thread+0x8b/0xd10
kthread+0x379/0x450
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888103787000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1536 bytes inside of
2048-byte region [ffff888103787000, ffff888103787800)
Memory state around the buggy address:
ffff888103787500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888103787580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888103787600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888103787680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888103787700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Reported-by: Dokyung Song <dokyungs@yonsei.ac.kr>
Reported-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr>
Reported-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20221116142952.518241-1-linuxlovemin@yonsei.ac.kr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad53db4acb upstream.
The recent commit 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in
IRQs disabled section") fixed warnings (and possible deadlocks) in the
IMC PMU driver by converting the locking to use spinlocks.
It also converted the init-time nest_init_lock to a spinlock, even
though it's not used at runtime in IRQ disabled sections or while
holding other spinlocks.
This leads to warnings such as:
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1
Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV
Call Trace:
dump_stack_lvl+0x74/0xa8 (unreliable)
__might_resched+0x178/0x1a0
__cpuhp_setup_state+0x64/0x1e0
init_imc_pmu+0xe48/0x1250
opal_imc_counters_probe+0x30c/0x6a0
platform_probe+0x78/0x110
really_probe+0x104/0x420
__driver_probe_device+0xb0/0x170
driver_probe_device+0x58/0x180
__driver_attach+0xd8/0x250
bus_for_each_dev+0xb4/0x140
driver_attach+0x34/0x50
bus_add_driver+0x1e8/0x2d0
driver_register+0xb4/0x1c0
__platform_driver_register+0x38/0x50
opal_imc_driver_init+0x2c/0x40
do_one_initcall+0x80/0x360
kernel_init_freeable+0x310/0x3b8
kernel_init+0x30/0x1a0
ret_from_kernel_thread+0x5c/0x64
Fix it by converting nest_init_lock back to a mutex, so that we can call
sleeping functions while holding it. There is no interaction between
nest_init_lock and the runtime spinlocks used by the actual PMU routines.
Fixes: 76d588dddc ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Tested-by: Kajol Jain<kjain@linux.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc88ef6632 upstream.
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5ea ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().
Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit into
the mask.
This was found with the stress_hpt option with a kbuild workload running
together with `perf record -g`.
Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5ea ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Just take the fix without the new warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095352.2823517-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98d0219e04 upstream.
If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.
Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.
Example crash with the kernel loaded at 5MB:
Run /sbin/init as init process
BUG: Unable to handle kernel data access on write at 0xc000000000452000
Faulting instruction address: 0xc0000000005b6730
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
NIP: c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
REGS: c000000004503250 TRAP: 0300 Not tainted (6.2.0-rc1-00011-g349188be4841)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44288480 XER: 00000000
CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
...
NIP memset+0x68/0x104
LR zero_user_segments.constprop.0+0xa8/0xf0
Call Trace:
ext4_mpage_readpages+0x7f8/0x830
ext4_readahead+0x48/0x60
read_pages+0xb8/0x380
page_cache_ra_unbounded+0x19c/0x250
filemap_fault+0x58c/0xae0
__do_fault+0x60/0x100
__handle_mm_fault+0x1230/0x1a40
handle_mm_fault+0x120/0x300
___do_page_fault+0x20c/0xa80
do_page_fault+0x30/0xc0
data_access_common_virt+0x210/0x220
This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)
The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.
To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.
That leads to the mapping being split correctly, allowing the rodata
permission change to take happen correctly, with no spillover:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)
If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:
radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages
Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abce209d18 upstream.
Using the serio subsystem now requires the code to be reachable:
x86_64-linux-ld: drivers/platform/x86/amd/pmc.o: in function `amd_pmc_suspend_handler':
pmc.c:(.text+0x86c): undefined reference to `serio_bus'
Add the usual dependency: as other users of serio use 'select'
rather than 'depends on', use the same here.
Fixes: 8e60615e89 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230127093950.2368575-1-arnd@kernel.org
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57e9af7831 upstream.
As DMA Rx can be completed from two places, it is possible that DMA Rx
completes before DMA completion callback had a chance to complete it.
Once the previous DMA Rx has been completed, a new one can be started
on the next UART interrupt. The following race is possible
(uart_unlock_and_check_sysrq_irqrestore() replaced with
spin_unlock_irqrestore() for simplicity/clarity):
CPU0 CPU1
dma_rx_complete()
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma_flush()
__dma_rx_complete()
dma->rx_running = 0
// Complete DMA Rx
spin_unlock_irqrestore(&port->lock)
serial8250_handle_irq()
spin_lock_irqsave(&port->lock)
handle_rx_dma()
serial8250_rx_dma()
dma->rx_running = 1
// Setup a new DMA Rx
spin_unlock_irqrestore(&port->lock)
spin_lock_irqsave(&port->lock)
// sees dma->rx_running = 1
__dma_rx_complete()
dma->rx_running = 0
// Incorrectly complete
// running DMA Rx
This race seems somewhat theoretical to occur for real but handle it
correctly regardless. Check what is the DMA status before complething
anything in __dma_rx_complete().
Reported-by: Gilles BULOZ <gilles.buloz@kontron.com>
Tested-by: Gilles BULOZ <gilles.buloz@kontron.com>
Fixes: 9ee4b83e51 ("serial: 8250: Add support for dmaengine")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230130114841.25749-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab3428cfd9 upstream.
The i.MX6 CPU frequency driver sometimes fails to register at boot time
due to nvmem_cell_read_u32() sporadically returning -ENOENT.
This happens because there is a window where __nvmem_device_get() in
of_nvmem_cell_get() is able to return the nvmem device, but as cells
have been setup, nvmem_find_cell_entry_by_node() returns NULL.
The occurs because the nvmem core registration code violates one of the
fundamental principles of kernel programming: do not publish data
structures before their setup is complete.
Fix this by making nvmem core code conform with this principle.
Fixes: eace75cfdc ("nvmem: Add a simple NVMEM framework for nvmem providers")
Cc: stable@vger.kernel.org
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 560181d3ac upstream.
If dev_set_name() fails, we leak nvmem->wp_gpio as the cleanup does not
put this. While a minimal fix for this would be to add the gpiod_put()
call, we can do better if we split device_register(), and use the
tested nvmem_release() cleanup code by initialising the device early,
and putting the device.
This results in a slightly larger fix, but results in clear code.
Note: this patch depends on "nvmem: core: initialise nvmem->id early"
and "nvmem: core: remove nvmem_config wp_gpio".
Fixes: 5544e90c81 ("nvmem: core: add error handling for dev_set_name")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[Srini: Fixed subject line and error code handing with wp_gpio while applying.]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230127104015.23839-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f6c02fa71 upstream.
Requesting an interrupt with IRQF_ONESHOT will run the primary handler
in the hard-IRQ context even in the force-threaded mode. The
force-threaded mode is used by PREEMPT_RT in order to avoid acquiring
sleeping locks (spinlock_t) in hard-IRQ context. This combination
makes it impossible and leads to "sleeping while atomic" warnings.
Use one interrupt handler for both handlers (primary and secondary)
and drop the IRQF_ONESHOT flag which is not needed.
Fixes: e359b4411c ("serial: stm32: fix threaded interrupt handling")
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Valentin Caron <valentin.caron@foss.st.com> # V3
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230120160332.57930-1-marex@denx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f65c4bbbd6 upstream.
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and
sanity checking of the xattr_ids count in the filesystem. Both of these
flaws cause computation overflow due to incorrect typing.
In the corrupted filesystem the xattr_ids value is 4294967071, which
stored in a signed variable becomes the negative number -225.
Flaw 1 (64-bit systems only):
The signed integer xattr_ids variable causes sign extension.
This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The
variable is first multiplied by sizeof(struct squashfs_xattr_id) where the
type of the sizeof operator is "unsigned long".
On a 64-bit system this is 64-bits in size, and causes the negative number
to be sign extended and widened to 64-bits and then become unsigned. This
produces the very large number 18446744073709548016 or 2^64 - 3600. This
number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and
divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0
(stored in len).
Flaw 2 (32-bit systems only):
On a 32-bit system the integer variable is not widened by the unsigned
long type of the sizeof operator (32-bits), and the signedness of the
variable has no effect due it always being treated as unsigned.
The above corrupted xattr_ids value of 4294967071, when multiplied
overflows and produces the number 4294963696 or 2^32 - 3400. This number
when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by
SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
The effect of the 0 length computation:
In conjunction with the corrupted xattr_ids field, the filesystem also has
a corrupted xattr_table_start value, where it matches the end of
filesystem value of 850.
This causes the following sanity check code to fail because the
incorrectly computed len of 0 matches the incorrect size of the table
reported by the superblock (0 bytes).
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
/*
* The computed size of the index table (len bytes) should exactly
* match the table start and end points
*/
start = table_start + sizeof(*id_table);
end = msblk->bytes_used;
if (len != (end - start))
return ERR_PTR(-EINVAL);
Changing the xattr_ids variable to be "usigned int" fixes the flaw on a
64-bit system. This relies on the fact the computation is widened by the
unsigned long type of the sizeof operator.
Casting the variable to u64 in the above macro fixes this flaw on a 32-bit
system.
It also means 64-bit systems do not implicitly rely on the type of the
sizeof operator to widen the computation.
[1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/
Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk
Fixes: 506220d2ba ("squashfs: add more sanity checks in xattr id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Fedor Pchelkin <pchelkin@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edb5d0cf55 upstream.
In commit 34488399fa ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():
- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;
This was for-use by MADV_COLLAPSE file/shmem codepaths, where
MADV_COLLAPSE might identify a pte-mapped hugepage, only to have
khugepaged race-in, free the pte table, and clear the pmd. Such codepaths
include:
A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
mm/address.
In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.
However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds. One such example that could leak through are swap
entries. The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.
We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.
The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present() and
pmd_trans_huge() still need to return true while the pmd is in this
transitory state. For example, x86's pmd_present() also checks the
_PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also
checks a PMD_PRESENT_INVALID bit.
Covering all 4 cases for x86 (all checks done on the same pmd value):
1) pmd_present() && pmd_trans_huge()
All we actually know here is that the PSE bit is set. Either:
a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
is set.
=> huge-pmd
b) We are currently racing with __split_huge_page(). The danger here
is that we proceed as-if we have a huge-pmd, but really we are
looking at a pte-mapping-pmd. So, what is the risk of this
danger?
The only relevant path is:
madvise_collapse() -> collapse_pte_mapped_thp()
Where we might just incorrectly report back "success", when really
the memory isn't pmd-backed. This is fine, since split could
happen immediately after (actually) successful madvise_collapse().
So, it should be safe to just assume huge-pmd here.
2) pmd_present() && !pmd_trans_huge()
Either:
a) PSE not set and either PRESENT or PROTNONE is.
=> pte-table-mapping pmd (or PROT_NONE)
b) devmap. This routine can be called immediately after
unlocking/locking mmap_lock -- or called with no locks held (see
khugepaged_scan_mm_slot()), so previous VMA checks have since been
invalidated.
3) !pmd_present() && pmd_trans_huge()
Not possible.
4) !pmd_present() && !pmd_trans_huge()
Neither PRESENT nor PROTNONE set
=> not present
I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, longarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since
!pmd_present() always takes failure path).
Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how mmap_lock
is held (if at all).
Link: https://lkml.kernel.org/r/20230125225358.2576151-1-zokeefe@google.com
Fixes: 34488399fa ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d014cd7c1c upstream.
Fabian has reported another regression in 6.1 due to ca3d76b0aa ("mm:
add merging after mremap resize"). The problem is that vma_merge() can
fail when vma has a vm_ops->close() method, causing is_mergeable_vma()
test to be negative. This was happening for vma mapping a file from
fuse-overlayfs, which does have the method. But when we are simply
expanding the vma, we never remove it due to the "merge" with the added
area, so the test should not prevent the expansion.
As a quick fix, check for such vmas and expand them using vma_adjust()
directly as was done before commit ca3d76b0aa. For a more robust long
term solution we should try to limit the check for vma_ops->close only to
cases that actually result in vma removal, so that no merge would be
prevented unnecessarily.
[akpm@linux-foundation.org: fix indenting whitespace, reflow comment]
Link: https://lkml.kernel.org/r/20230117101939.9753-1-vbabka@suse.cz
Fixes: ca3d76b0aa ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Fabian Vogt <fvogt@suse.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35
Tested-by: Fabian Vogt <fvogt@suse.com>
Cc: Jakub Matěna <matenajakub@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 023f47a825 upstream.
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).
If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.
Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.
Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.
Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>