Currently, the tick_entry vendor hook in the scheduler tick path is
called before the rq clock is updated.
Change the vendor hook placement to after the clock update so that an
updated clock value is used.
Fixes: ca60d78542 ("Revert "Revert "ANDROID: Sched: Add restricted
vendor hooks for scheduler""")
Signed-off-by: Sai Harshini Nimmala <quic_snimmala@quicinc.com>
Change-Id: Ieee88da1cf803c68212b96b79e45d216d0696d0d
When memory is tight, system may start to compact memory for large
continuous memory demands. If one process tries to lock a memory page
that is being locked and isolated for compaction, it may wait a long time
or even forever. This is because compaction will perform non-atomic
PG_Isolated clear while holding page lock, this may overwrite PG_waiters
set by the process that can't obtain the page lock and add itself to the
waiting queue to wait for the lock to be unlocked.
CPU1 CPU2
lock_page(page); (successful)
lock_page(); (failed)
__ClearPageIsolated(page); SetPageWaiters(page) (may be overwritten)
unlock_page(page);
The solution is to not perform non-atomic operation on page flags while
holding page lock.
Link: https://lkml.kernel.org/r/20220315030515.20263-1-andrew.yang@mediatek.com
Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Vlastimil Babka" <vbabka@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: "William Kucharski" <william.kucharski@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 48911e41ddee4fe113bf1e4303dda1a413b169c9
https://github.com/hnaz/linux-mm.git)
Bug: 225086204
Change-Id: I3dc59bf75f12d4ee93779bcbe9336c6376d2d6b6
Changes in 5.15.29
arm64: dts: qcom: sm8350: Describe GCC dependency clocks
arm64: dts: qcom: sm8350: Correct UFS symbol clocks
HID: elo: Revert USB reference counting
HID: hid-thrustmaster: fix OOB read in thrustmaster_interrupts
ARM: boot: dts: bcm2711: Fix HVS register range
clk: qcom: gdsc: Add support to update GDSC transition delay
clk: qcom: dispcc: Update the transition delay for MDSS GDSC
HID: vivaldi: fix sysfs attributes leak
arm64: dts: armada-3720-turris-mox: Add missing ethernet0 alias
tipc: fix kernel panic when enabling bearer
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
net: phy: meson-gxl: fix interrupt handling in forced mode
mISDN: Fix memory leak in dsp_pipeline_build()
vhost: fix hung thread due to erroneous iotlb entries
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
vdpa: fix use-after-free on vp_vdpa_remove
isdn: hfcpci: check the return value of dma_set_mask() in setup_hw()
net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare()
esp: Fix possible buffer overflow in ESP transformation
esp: Fix BEET mode inter address family tunneling on GSO
qed: return status of qed_iov_get_link
smsc95xx: Ignore -ENODEV errors when device is unplugged
gpiolib: acpi: Convert ACPI value of debounce to microseconds
drm/sun4i: mixer: Fix P010 and P210 format numbers
net: dsa: mt7530: fix incorrect test in mt753x_phylink_validate()
ARM: dts: aspeed: Fix AST2600 quad spi group
iavf: Fix handling of vlan strip virtual channel messages
i40e: stop disabling VFs due to PF error responses
ice: stop disabling VFs due to PF error responses
ice: Fix error with handling of bonding MTU
ice: Don't use GFP_KERNEL in atomic context
ice: Fix curr_link_speed advertised speed
ethernet: Fix error handling in xemaclite_of_probe
tipc: fix incorrect order of state message data sanity check
net: ethernet: ti: cpts: Handle error for clk_enable
net: ethernet: lpc_eth: Handle error for clk_enable
net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
ax25: Fix NULL pointer dereference in ax25_kill_by_device
net/mlx5: Fix size field in bufferx_reg struct
net/mlx5: Fix a race on command flush flow
net/mlx5e: Lag, Only handle events from highest priority multipath entry
NFC: port100: fix use-after-free in port100_send_complete
selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
selftests: pmtu.sh: Kill nettest processes launched in subshell.
gpio: ts4900: Do not set DAT and OE together
gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
net: phy: DP83822: clear MISR2 register to disable interrupts
sctp: fix kernel-infoleak for SCTP sockets
net: bcmgenet: Don't claim WOL when its not available
net: phy: meson-gxl: improve link-up behavior
selftests/bpf: Add test for bpf_timer overwriting crash
swiotlb: fix info leak with DMA_FROM_DEVICE
usb: dwc3: pci: add support for the Intel Raptor Lake-S
pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID"
KVM: Fix lockdep false negative during host resume
kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
spi: rockchip: Fix error in getting num-cs property
spi: rockchip: terminate dma transmission when slave abort
drm/vc4: hdmi: Unregister codec device on unbind
x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
net-sysfs: add check for netdevice being present to speed_show
hwmon: (pmbus) Clear pmbus fault/warning bits after read
PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
gpio: Return EPROBE_DEFER if gc->to_irq is NULL
drm/amdgpu: bypass tiling flag check in virtual display case (v2)
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
Revert "xen-netback: Check for hotplug-status existence before watching"
ipv6: prevent a possible race condition with lifetimes
tracing: Ensure trace buffer is at least 4096 bytes large
tracing/osnoise: Make osnoise_main to sleep for microseconds
selftest/vm: fix map_fixed_noreplace test failure
selftests/memfd: clean up mapping in mfd_fail_write
ARM: Spectre-BHB: provide empty stub for non-config
fuse: fix fileattr op failure
fuse: fix pipe buffer lifetime for direct_io
staging: rtl8723bs: Fix access-point mode deadlock
staging: gdm724x: fix use after free in gdm_lte_rx()
net: macb: Fix lost RX packet wakeup race in NAPI receive
riscv: alternative only works on !XIP_KERNEL
mmc: meson: Fix usage of meson_mmc_post_req()
riscv: Fix auipc+jalr relocation range checks
tracing/osnoise: Force quiescent states while tracing
arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0
arm64: Ensure execute-only permissions are not allowed without EPAN
arm64: kasan: fix include error in MTE functions
swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned
virtio: unexport virtio_finalize_features
virtio: acknowledge all features before access
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
ARM: fix Thumb2 regression with Spectre BHB
watch_queue: Fix filter limit check
watch_queue, pipe: Free watchqueue state after clearing pipe ring
watch_queue: Fix to release page in ->release()
watch_queue: Fix to always request a pow-of-2 pipe ring size
watch_queue: Fix the alloc bitmap size to reflect notes allocated
watch_queue: Free the alloc bitmap when the watch_queue is torn down
watch_queue: Fix lack of barrier/sync/lock between post and read
watch_queue: Make comment about setting ->defunct more accurate
x86/boot: Fix memremap of setup_indirect structures
x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
x86/sgx: Free backing memory after faulting the enclave page
x86/traps: Mark do_int3() NOKPROBE_SYMBOL
drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP
btrfs: make send work with concurrent block group relocation
drm/i915: Workaround broken BIOS DBUF configuration on TGL/RKL
riscv: dts: k210: fix broken IRQs on hart1
block: drop unused includes in <linux/genhd.h>
Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN"
vhost: allow batching hint without size
Linux 5.15.29
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5c9c6006b90a8283a81fd5f7c79776e1a0cfb6b1
Linux 5.15.28 extended the size of the cpu_hwcaps structure to handle
the spectre-bhb issue. Because of this, the ABI "changes" to extend the
structures to grow to handle this. No driver will be affected by this
at all as there is just a new field at the end of these arrays.
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 2 Changed, 0 Added variable
2 Changed variables:
[C] 'static_key_false cpu_hwcap_keys[65]' was changed to 'static_key_false cpu_hwcap_keys[66]' at cpufeature.c:158:1:
size of symbol changed from 1040 to 1056
CRC (modversions) changed from 0xc3fa40cc to 0xd14fef22
type of variable changed:
type name changed from 'static_key_false[65]' to 'static_key_false[66]'
array type size changed from 8320 to 8448
array type subrange 1 changed length from 65 to 66
[C] 'unsigned long int cpu_hwcaps[2]' was changed at cpufeature.c:106:1:
CRC (modversions) changed from 0x856092f6 to 0x77c1b75a
Fixes: aa79753319 ("Linux 5.15.28")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idbe93c443c7c4e72ef1ae6bba9e550355e9e32c5
Changes in 5.15.28
slip: fix macro redefine warning
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
x86/speculation: Add eIBRS + Retpoline options
Documentation/hw-vuln: Update spectre doc
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
ARM: report Spectre v2 status through sysfs
ARM: early traps initialisation
ARM: use LOADADDR() to get load address of sections
ARM: Spectre-BHB workaround
ARM: include unprivileged BPF status in Spectre V2 reporting
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add HWCAP for self-synchronising virtual counter
arm64: Add Cortex-X2 CPU part definition
arm64: add ID_AA64ISAR2_EL1 sys register
arm64: cpufeature: add HWCAP for FEAT_AFP
arm64: cpufeature: add HWCAP for FEAT_RPRES
arm64: entry.S: Add ventry overflow sanity checks
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: entry: Make the trampoline cleanup optional
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: Add percpu vectors for EL1
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Mitigate spectre style branch history side channels
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Use the clearbhb instruction in mitigations
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
ARM: fix build error when BPF_SYSCALL is disabled
ARM: fix co-processor register typo
ARM: Do not use NOCROSSREFS directive with ld.lld
arm64: Do not include __READ_ONCE() block in assembly files
ARM: fix build warning in proc-v7-bugs.c
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
xen/grant-table: add gnttab_try_end_foreign_access()
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
xen/gntalloc: don't use gnttab_query_foreign_access()
xen: remove gnttab_query_foreign_access()
xen/9p: use alloc/free_pages_exact()
xen/pvcalls: use alloc/free_pages_exact()
xen/gnttab: fix gnttab_end_foreign_access() without page specified
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Revert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"
Linux 5.15.28
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6d6fcf4f171c097168e17ecff30e1c510cf69fe8
[ Upstream commit 13765de814 ]
Syzbot found a GPF in reweight_entity. This has been bisected to
commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid
sched_task_group")
There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in
reweight_entity() in CFS. The scenario is that the main process spawns
number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
wait, and exit. For each of the new threads the copy_process() gets
invoked, which adds the new task_struct and calls sched_post_fork()
for it.
In the above scenario there is a possibility that
setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
in the group that is just being created by copy_process(), and for
which the sched_post_fork() has not been executed yet. This will
trigger a null pointer dereference in reweight_entity(), as it will
try to access the run queue pointer, which hasn't been set.
Before the mentioned change the cfs_rq pointer for the task has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group. Now it is done in
the sched_post_fork(), which is called after that. To fix the issue
the remove the update_load param from the update_load param() function
and call reweight_task() only if the task flag doesn't have the
TASK_NEW flag set.
Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220203161846.1160750-1-tadeusz.struk@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Change-Id: If81be87378ae7948e0278e30ac4a8618a434d070
Bug: 225145763
Reverse migration is used to do the balancing the occupancy of memory
zones in a node in the system whose imabalance may be caused by
migration of pages to other zones by an operation, eg: hotremove and
then hotadding the same memory. In this case there is a lot of free
memory in newly hotadd memory which can be filled up by the previous
migrated pages(as part of offline/hotremove) thus may free up some
pressure in other zones of the node.
Upstream discussion: https://lore.kernel.org/all/ee78c83d-da9b-f6d1-4f66-934b7782acfb@codeaurora.org/
Change-Id: Ib3137dab0db66ecf6858c4077dcadb9dfd0c6b1c
Bug: 201263307
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
This reverts commit 462c5e6cb2 as it
breaks the KABI. It will be reverted the next KABI gate in a week.
Fixes: efe3167e52 ("Linux 5.15.27")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I269cd4144a2da69aa43de9055f797d0c0ba19bee
This reverts commit aa5040691c as it
breaks the KABI. It will be reverted the next KABI gate in a week.
Fixes: efe3167e52 ("Linux 5.15.27")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ieb674a4e218740a596e6a53d63e7f83b6d3df53b
This reverts commit 07058fb18d as it
breaks the KABI. It will be reverted the next KABI gate in a week.
Fixes: efe3167e52 ("Linux 5.15.27")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9b8f2522971f5747918d9775b869ca6126d18075
Changes in 5.15.27
mac80211_hwsim: report NOACK frames in tx_status
mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work
i2c: bcm2835: Avoid clock stretching timeouts
ASoC: rt5668: do not block workqueue if card is unbound
ASoC: rt5682: do not block workqueue if card is unbound
regulator: core: fix false positive in regulator_late_cleanup()
Input: clear BTN_RIGHT/MIDDLE on buttonpads
btrfs: get rid of warning on transaction commit when using flushoncommit
KVM: arm64: vgic: Read HW interrupt pending state from the HW
block: loop:use kstatfs.f_bsize of backing file to set discard granularity
tipc: fix a bit overflow in tipc_crypto_key_rcv()
cifs: do not use uninitialized data in the owner/group sid
cifs: fix double free race when mount fails in cifs_get_root()
HID: amd_sfh: Handle amd_sfh work buffer in PM ops
HID: amd_sfh: Add functionality to clear interrupts
HID: amd_sfh: Add interrupt handler to process interrupts
cifs: modefromsids must add an ACE for authenticated users
selftests/seccomp: Fix seccomp failure by adding missing headers
drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
dmaengine: shdma: Fix runtime PM imbalance on error
i2c: cadence: allow COMPILE_TEST
i2c: imx: allow COMPILE_TEST
i2c: qup: allow COMPILE_TEST
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
usb: gadget: don't release an existing dev->buf
usb: gadget: clear related members when goto fail
exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
exfat: fix i_blocks for files truncated over 4 GiB
tracing: Add test for user space strings when filtering on string pointers
arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL
serial: stm32: prevent TDR register overwrite when sending x_char
ext4: drop ineligible txn start stop APIs
ext4: simplify updating of fast commit stats
ext4: fast commit may not fallback for ineligible commit
ext4: fast commit may miss file actions
sched/fair: Fix fault in reweight_entity
ata: pata_hpt37x: fix PCI clock detection
drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
tracing: Add ustring operation to filtering string pointers
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
NFSD: Fix zero-length NFSv3 WRITEs
io_uring: fix no lock protection for ctx->cq_extra
tools/resolve_btf_ids: Close ELF file on error
mtd: spi-nor: Fix mtd size for s3an flashes
MIPS: fix local_{add,sub}_return on MIPS64
signal: In get_signal test for signal_group_exit every time through the loop
PCI: mediatek-gen3: Disable DVFSRC voltage request
PCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled()
PCI: dwc: Do not remap invalid res
PCI: aardvark: Fix checking for MEM resource type
KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest
KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU
KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration
KVM: X86: Ensure that dirty PDPTRs are loaded
KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg
KVM: x86: Exit to userspace if emulation prepared a completion callback
i3c: fix incorrect address slot lookup on 64-bit
i3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()'
tracing: Do not let synth_events block other dyn_event systems during create
Input: ti_am335x_tsc - set ADCREFM for X configuration
Input: ti_am335x_tsc - fix STEPCONFIG setup for Z2
PCI: mvebu: Check for errors from pci_bridge_emul_init() call
PCI: mvebu: Do not modify PCI IO type bits in conf_write
PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge
PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge
PCI: mvebu: Setup PCIe controller to Root Complex mode
PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge
PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge
PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge
NFSD: Fix verifier returned in stable WRITEs
Revert "nfsd: skip some unnecessary stats in the v4 case"
nfsd: fix crash on COPY_NOTIFY with special stateid
x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
drm/i915: don't call free_mmap_offset when purging
SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get
drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode
ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all
ntb_hw_switchtec: Fix bug with more than 32 partitions
drm/amdkfd: Check for null pointer after calling kmemdup
drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt
i3c: master: dw: check return of dw_i3c_master_get_free_pos()
dma-buf: cma_heap: Fix mutex locking section
tracing/uprobes: Check the return value of kstrdup() for tu->filename
tracing/probes: check the return value of kstrndup() for pbuf
mm: defer kmemleak object creation of module_alloc()
kasan: fix quarantine conflicting with init_on_free
selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting
hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()
drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled
drm/amdgpu: filter out radeon PCI device IDs
drm/amdgpu: filter out radeon secondary ids as well
drm/amd/display: Use adjusted DCN301 watermarks
drm/amd/display: move FPU associated DSC code to DML folder
ethtool: Fix link extended state for big endian
octeontx2-af: Optimize KPU1 processing for variable-length headers
octeontx2-af: Reset PTP config in FLR handler
octeontx2-af: cn10k: RPM hardware timestamp configuration
octeontx2-af: cn10k: Use appropriate register for LMAC enable
octeontx2-af: Adjust LA pointer for cpt parse header
octeontx2-af: Add KPU changes to parse NGIO as separate layer
net/mlx5e: IPsec: Refactor checksum code in tx data path
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
bpf: Use u64_stats_t in struct bpf_prog_stats
bpf: Fix possible race in inc_misses_counter
drm/amd/display: Update watermark values for DCN301
drm: mxsfb: Set fallback bus format when the bridge doesn't provide one
drm: mxsfb: Fix NULL pointer dereference
riscv/mm: Add XIP_FIXUP for phys_ram_base
drm/i915/display: split out dpt out of intel_display.c
drm/i915/display: Move DRRS code its own file
drm/i915: Disable DRRS on IVB/HSW port != A
gve: Recording rx queue before sending to napi
net: dsa: ocelot: seville: utilize of_mdiobus_register
net: dsa: seville: register the mdiobus under devres
ibmvnic: don't release napi in __ibmvnic_open()
of: net: move of_net under net/
net: ethernet: litex: Add the dependency on HAS_IOMEM
drm/mediatek: mtk_dsi: Reset the dsi0 hardware
cifs: protect session channel fields with chan_lock
cifs: fix confusing unneeded warning message on smb2.1 and earlier
drm/amd/display: Fix stream->link_enc unassigned during stream removal
bnxt_en: Fix occasional ethtool -t loopback test failures
drm/amd/display: For vblank_disable_immediate, check PSR is really used
PCI: mvebu: Fix device enumeration regression
net: of: fix stub of_net helpers for CONFIG_NET=n
ALSA: intel_hdmi: Fix reference to PCM buffer address
ucounts: Fix systemd LimitNPROC with private users regression
riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix config KASAN && DEBUG_VIRTUAL
iwlwifi: mvm: check debugfs_dir ptr before use
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
iommu/amd: Recover from event log overflow
drm/i915: s/JSP2/ICP2/ PCH
drm/amd/display: Reduce dmesg error to a debug print
xen/netfront: destroy queues before real_num_tx_queues is zeroed
thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
mac80211: fix EAPoL rekey fail in 802.3 rx path
blktrace: fix use after free for struct blk_trace
ntb: intel: fix port config status offset for SPR
mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
xfrm: fix MTU regression
netfilter: fix use-after-free in __nf_register_net_hook()
bpf, sockmap: Do not ignore orig_len parameter
xfrm: fix the if_id check in changelink
xfrm: enforce validity of offload input flags
e1000e: Correct NVM checksum verification flow
net: fix up skbs delta_truesize in UDP GRO frag_list
netfilter: nf_queue: don't assume sk is full socket
netfilter: nf_queue: fix possible use-after-free
netfilter: nf_queue: handle socket prefetch
batman-adv: Request iflink once in batadv-on-batadv check
batman-adv: Request iflink once in batadv_get_real_netdevice
batman-adv: Don't expect inter-netns unique iflink indices
net: ipv6: ensure we call ipv6_mc_down() at most once
net: dcb: flush lingering app table entries for unregistered devices
net: ipa: add an interconnect dependency
net/smc: fix connection leak
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range
mac80211: fix forwarded mesh frames AC & queue selection
net: stmmac: fix return value of __setup handler
mac80211: treat some SAE auth steps as final
iavf: Fix missing check for running netdev
net: sxgbe: fix return value of __setup handler
ibmvnic: register netdev after init of adapter
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
iavf: Fix deadlock in iavf_reset_task
efivars: Respect "block" flag in efivar_entry_set_safe()
auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
firmware: arm_scmi: Remove space in MODULE_ALIAS name
ASoC: cs4265: Fix the duplicated control name
auxdisplay: lcd2s: Fix memory leak in ->remove()
auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
can: gs_usb: change active_channels's type from atomic_t to u8
iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
igc: igc_read_phy_reg_gpy: drop premature return
ARM: Fix kgdb breakpoint for Thumb2
mips: setup: fix setnocoherentio() boolean setting
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
mptcp: Correctly set DATA_FIN timeout when number of retransmits is large
selftests: mlxsw: tc_police_scale: Make test more robust
pinctrl: sunxi: Use unique lockdep classes for IRQs
igc: igc_write_phy_reg_gpy: drop premature return
ibmvnic: free reset-work-item when flushing
memfd: fix F_SEAL_WRITE after shmem huge page allocated
s390/extable: fix exception table sorting
sched: Fix yet more sched_fork() races
arm64: dts: juno: Remove GICv2m dma-range
iommu/amd: Fix I/O page table memory leak
MIPS: ralink: mt7621: do memory detection on KSEG1
ARM: dts: switch timer config to common devkit8000 devicetree
ARM: dts: Use 32KiHz oscillator on devkit8000
soc: fsl: guts: Revert commit 3c0d64e867
soc: fsl: guts: Add a missing memory allocation failure check
soc: fsl: qe: Check of ioremap return value
netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
ARM: tegra: Move panels to AUX bus
can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
net: stmmac: enhance XDP ZC driver level switching performance
net: stmmac: only enable DMA interrupts when ready
ibmvnic: initialize rc before completing wait
ibmvnic: define flush_reset_queue helper
ibmvnic: complete init_done on transport events
net: chelsio: cxgb3: check the return value of pci_find_capability()
net: sparx5: Fix add vlan when invalid operation
iavf: Refactor iavf state machine tracking
iavf: Add __IAVF_INIT_FAILED state
iavf: Combine init and watchdog state machines
iavf: Add trace while removing device
iavf: Rework mutexes for better synchronisation
iavf: Add helper function to go from pci_dev to adapter
iavf: Fix kernel BUG in free_msi_irqs
iavf: Add waiting so the port is initialized in remove
iavf: Fix init state closure on remove
iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
iavf: Fix race in init state
iavf: Fix __IAVF_RESETTING state usage
drm/i915/guc/slpc: Correct the param count for unset param
drm/bridge: ti-sn65dsi86: Properly undo autosuspend
e1000e: Fix possible HW unit hang after an s0ix exit
MIPS: ralink: mt7621: use bitwise NOT instead of logical
nl80211: Handle nla_memdup failures in handle_nan_filter
drm/amdgpu: fix suspend/resume hang regression
net: dcb: disable softirqs in dcbnl_flush_dev()
selftests: mlxsw: resource_scale: Fix return value
net: stmmac: perserve TX and RX coalesce value during XDP setup
iavf: do not override the adapter state in the watchdog task (again)
iavf: missing unlocks in iavf_watchdog_task()
MAINTAINERS: adjust file entry for of_net.c after movement
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Input: samsung-keypad - properly state IOMEM dependency
HID: add mapping for KEY_DICTATE
HID: add mapping for KEY_ALL_APPLICATIONS
tracing/histogram: Fix sorting on old "cpu" value
tracing: Fix return value of __setup handlers
btrfs: fix lost prealloc extents beyond eof after full fsync
btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
btrfs: do not WARN_ON() if we have PageError set
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
btrfs: add missing run of delayed items after unlink during log replay
btrfs: do not start relocation until in progress drops are done
Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"
proc: fix documentation and description of pagemap
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
hamradio: fix macro redefine warning
Linux 5.15.27
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie338dd23e0eb61feb540b4256b5d1840fee4db84
Need to get the request frequency and target frequency
use it to do frequency check and modify it according to
the platform thermal requirements.
Bug: 208166320
Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Change-Id: I776b43c8f559b8a072abd8d3abcb3528348b2c5d
(cherry picked from commit fc827b344f)
This reverts commit 2566a89b9e which is
commit a2614140dc upstream.
The above change depends on upstream commit 0faf890fc5 ("net: dsa:
drop rtnl_lock from dsa_slave_switchdev_event_work"), which is not
present in linux-5.15.y. Without that change, waiting for the switchdev
workqueue causes deadlocks on the rtnl_mutex.
Backporting the dependency commit isn't trivial/desirable, since it
requires that the following dependencies of the dependency are also
backported:
df405910ab net: dsa: sja1105: wait for dynamic config command completion on writes too
eb016afd83 net: dsa: sja1105: serialize access to the dynamic config interface
2468346c56 net: mscc: ocelot: serialize access to the MAC table
f7eb4a1c08 net: dsa: b53: serialize access to the ARL table
cf231b436f net: dsa: lantiq_gswip: serialize access to the PCE registers
338a3a4745 net: dsa: introduce locking for the address lists on CPU and DSA ports
and then this bugfix on top:
8940e6b669 ("net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held")
Reported-by: Daniel Suchy <danny@danysek.cz>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74583f1b92 upstream.
Commit 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
Cc: <stable@vger.kernel.org>
Fixes: 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e6f55120c upstream.
On TGL/RKL the BIOS likes to use some kind of bogus DBUF layout
that doesn't match what the spec recommends. With a single active
pipe that is not going to be a problem, but with multiple pipes
active skl_commit_modeset_enables() goes into an infinite loop
since it can't figure out any order in which it can commit the
pipes without causing DBUF overlaps between the planes.
We'd need some kind of extra DBUF defrag stage in between to
make the transition possible. But that is clearly way too complex
a solution, so in the name of simplicity let's just sanitize the
DBUF state by simply turning off all planes when we detect a
pipe encroaching on its neighbours' DBUF slices. We only have
to disable the primary planes as all other planes should have
already been disabled (if they somehow were enabled) by
earlier sanitization steps.
And for good measure let's also sanitize in case the DBUF
allocations of the pipes already seem to overlap each other.
Cc: <stable@vger.kernel.org> # v5.14+
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4762
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220204141818.1900-3-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit 15512021eb)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d96b34248c upstream.
We don't allow send and balance/relocation to run in parallel in order
to prevent send failing or silently producing some bad stream. This is
because while send is using an extent (specially metadata) or about to
read a metadata extent and expecting it belongs to a specific parent
node, relocation can run, the transaction used for the relocation is
committed and the extent gets reallocated while send is still using the
extent, so it ends up with a different content than expected. This can
result in just failing to read a metadata extent due to failure of the
validation checks (parent transid, level, etc), failure to find a
backreference for a data extent, and other unexpected failures. Besides
reallocation, there's also a similar problem of an extent getting
discarded when it's unpinned after the transaction used for block group
relocation is committed.
The restriction between balance and send was added in commit 9e967495e0
("Btrfs: prevent send failures and crashes due to concurrent relocation"),
kernel 5.3, while the more general restriction between send and relocation
was added in commit 1cea5cf0e6 ("btrfs: ensure relocation never runs
while we have send operations running"), kernel 5.14.
Both send and relocation can be very long running operations. Relocation
because it has to do a lot of IO and expensive backreference lookups in
case there are many snapshots, and send due to read IO when operating on
very large trees. This makes it inconvenient for users and tools to deal
with scheduling both operations.
For zoned filesystem we also have automatic block group relocation, so
send can fail with -EAGAIN when users least expect it or send can end up
delaying the block group relocation for too long. In the future we might
also get the automatic block group relocation for non zoned filesystems.
This change makes it possible for send and relocation to run in parallel.
This is achieved the following way:
1) For all tree searches, send acquires a read lock on the commit root
semaphore;
2) After each tree search, and before releasing the commit root semaphore,
the leaf is cloned and placed in the search path (struct btrfs_path);
3) After releasing the commit root semaphore, the changed_cb() callback
is invoked, which operates on the leaf and writes commands to the pipe
(or file in case send/receive is not used with a pipe). It's important
here to not hold a lock on the commit root semaphore, because if we did
we could deadlock when sending and receiving to the same filesystem
using a pipe - the send task blocks on the pipe because it's full, the
receive task, which is the only consumer of the pipe, triggers a
transaction commit when attempting to create a subvolume or reserve
space for a write operation for example, but the transaction commit
blocks trying to write lock the commit root semaphore, resulting in a
deadlock;
4) Before moving to the next key, or advancing to the next change in case
of an incremental send, check if a transaction used for relocation was
committed (or is about to finish its commit). If so, release the search
path(s) and restart the search, to where we were before, so that we
don't operate on stale extent buffers. The search restarts are always
possible because both the send and parent roots are RO, and no one can
add, remove of update keys (change their offset) in RO trees - the
only exception is deduplication, but that is still not allowed to run
in parallel with send;
5) Periodically check if there is contention on the commit root semaphore,
which means there is a transaction commit trying to write lock it, and
release the semaphore and reschedule if there is contention, so as to
avoid causing any significant delays to transaction commits.
This leaves some room for optimizations for send to have less path
releases and re searching the trees when there's relocation running, but
for now it's kept simple as it performs quite well (on very large trees
with resulting send streams in the order of a few hundred gigabytes).
Test case btrfs/187, from fstests, stresses relocation, send and
deduplication attempting to run in parallel, but without verifying if send
succeeds and if it produces correct streams. A new test case will be added
that exercises relocation happening in parallel with send and then checks
that send succeeds and the resulting streams are correct.
A final note is that for now this still leaves the mutual exclusion
between send operations and deduplication on files belonging to a root
used by send operations. A solution for that will be slightly more complex
but it will eventually be built on top of this change.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08999b2489 upstream.
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing memory with shmem_truncate_range(), as soon as
a page is faulted back to the EPC. In addition, free the memory for
PCMD pages as soon as all PCMD's in a page have been marked as unused
by zeroing its contents.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7228918b34 upstream.
As documented, the setup_indirect structure is nested inside
the setup_data structures in the setup_data list. The code currently
accesses the fields inside the setup_indirect structure but only
the sizeof(struct setup_data) is being memremapped. No crash
occurred but this is just due to how the area is remapped under the
covers.
Properly memremap both the setup_data and setup_indirect structures
in these cases before accessing them.
Fixes: b3c72fc9a7 ("x86/boot: Introduce setup_indirect")
Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1645668456-22036-2-git-send-email-ross.philipson@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4edc076041 upstream.
watch_queue_clear() has a comment stating that setting ->defunct to true
preventing new additions as well as preventing notifications. Whilst
the latter is true, the first bit is superfluous since at the time this
function is called, the pipe cannot be accessed to add new event
sources.
Remove the "new additions" bit from the comment.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ed147f015 upstream.
There's nothing to synchronise post_one_notification() versus
pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the
reader only takes pipe->mutex which cannot bar notification posting as
that may need to be made from contexts that cannot sleep.
Fix this by setting pipe->head with a barrier in post_one_notification()
and reading pipe->head with a barrier in pipe_read().
If that's not sufficient, the rd_wait.lock will need to be taken,
possibly in a ->confirm() op so that it only applies to notifications.
The lock would, however, have to be dropped before copy_page_to_iter()
is invoked.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b4c037192 upstream.
Currently, watch_queue_set_size() sets the number of notes available in
wqueue->nr_notes according to the number of notes allocated, but sets
the size of the bitmap to the unrounded number of notes originally asked
for.
Fix this by setting the bitmap size to the number of notes we're
actually going to make available (ie. the number allocated).
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96a4d8912b upstream.
The pipe ring size must always be a power of 2 as the head and tail
pointers are masked off by AND'ing with the size of the ring - 1.
watch_queue_set_size(), however, lets you specify any number of notes
between 1 and 511. This number is passed through to pipe_resize_ring()
without checking/forcing its alignment.
Fix this by rounding the number of slots required up to the nearest
power of two. The request is meant to guarantee that at least that many
notifications can be generated before the queue is full, so rounding
down isn't an option, but, alternatively, it may be better to give an
error if we aren't allowed to allocate that much ring space.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1853fbadc upstream.
When a pipe ring descriptor points to a notification message, the
refcount on the backing page is incremented by the generic get function,
but the release function, which marks the bitmap, doesn't drop the page
ref.
Fix this by calling generic_pipe_buf_release() at the end of
watch_queue_pipe_buf_release().
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db8facfc9f upstream.
In free_pipe_info(), free the watchqueue state after clearing the pipe
ring as each pipe ring descriptor has a release function, and in the
case of a notification message, this is watch_queue_pipe_buf_release()
which tries to mark the allocation bitmap that was previously released.
Fix this by moving the put of the pipe's ref on the watch queue to after
the ring has been cleared. We still need to call watch_queue_clear()
before doing that to make sure that the pipe is disconnected from any
notification sources first.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c993ee0f9f upstream.
In watch_queue_set_filter(), there are a couple of places where we check
that the filter type value does not exceed what the type_filter bitmap
can hold. One place calculates the number of bits by:
if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
which is fine, but the second does:
if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
which is not. This can lead to a couple of out-of-bounds writes due to
a too-large type:
(1) __set_bit() on wfilter->type_filter
(2) Writing more elements in wfilter->filters[] than we allocated.
Fix this by just using the proper WATCH_TYPE__NR instead, which is the
number of types we actually know about.
The bug may cause an oops looking something like:
BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740
Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611
...
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x1f/0x150
...
kasan_report.cold+0x7f/0x11b
...
watch_queue_set_filter+0x659/0x740
...
__x64_sys_ioctl+0x127/0x190
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 611:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
watch_queue_set_filter+0x23a/0x740
__x64_sys_ioctl+0x127/0x190
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88800d2c66a0
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 28 bytes inside of
32-byte region [ffff88800d2c66a0, ffff88800d2c66c0)
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c7cb60bff upstream.
When building for Thumb2, the vectors make use of a local label. Sadly,
the Spectre BHB code also uses a local label with the same number which
results in the Thumb2 reference pointing at the wrong place. Fix this
by changing the number used for the Spectre BHB local label.
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39bab83b11 upstream.
Only prio 1 is supported for nic mode when there is no ignore flow level
support in firmware. But for switchdev mode, which supports fixed number
of statically pre-allocated prios, this restriction is not relevant so
it can be relaxed.
Fixes: d671e109bd ("net/mlx5: Fix tc max supported prio for nic mode")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fa59ede95 upstream.
The feature negotiation was designed in a way that
makes it possible for devices to know which config
fields will be accessed by drivers.
This is broken since commit 404123c2db ("virtio: allow drivers to
validate features") with fallout in at least block and net. We have a
partial work-around in commit 2f9a174f91 ("virtio: write back
F_VERSION_1 before validate") which at least lets devices find out which
format should config space have, but this is a partial fix: guests
should not access config space without acknowledging features since
otherwise we'll never be able to change the config space format.
To fix, split finalize_features from virtio_finalize_features and
call finalize_features with all feature bits before validation,
and then - if validation changed any bits - once again after.
Since virtio_finalize_features no longer writes out features
rename it to virtio_features_ok - since that is what it does:
checks that features are ok with the device.
As a side effect, this also reduces the amount of hypervisor accesses -
we now only acknowledge features once unless we are clearing any
features when validating (which is uncommon).
IRC I think that this was more or less always the intent in the spec but
unfortunately the way the spec is worded does not say this explicitly, I
plan to address this at the spec level, too.
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 404123c2db ("virtio: allow drivers to validate features")
Fixes: 2f9a174f91 ("virtio: write back F_VERSION_1 before validate")
Cc: "Halil Pasic" <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa6f8dcbab upstream.
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.
Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: ddbd89deb7 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e2edd6371 upstream.
Commit 18107f8a2d ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, such check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.
Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 18107f8a2d ("arm64: Support execute-only permissions with Enhanced PAN")
Cc: <stable@vger.kernel.org> # 5.13.x
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1cc1697bb upstream.
Legacy and old PCI I/O based cards do not support 32-bit I/O addressing.
Since commit 64f160e19e ("PCI: aardvark: Configure PCIe resources from
'ranges' DT property") kernel can set different PCIe address on CPU and
different on the bus for the one A37xx address mapping without any firmware
support in case the bus address does not conflict with other A37xx mapping.
So remap I/O space to the bus address 0x0 to enable support for old legacy
I/O port based cards which have hardcoded I/O ports in low address space.
Note that DDR on A37xx is mapped to bus address 0x0. And mapping of I/O
space can be set to address 0x0 too because MEM space and I/O space are
separate and so do not conflict.
Remapping IO space on Turris Mox to different address is not possible to
due bootloader bug.
Signed-off-by: Pali Rohár <pali@kernel.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 76f6386b25 ("arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700")
Cc: stable@vger.kernel.org # 64f160e19e ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property")
Cc: stable@vger.kernel.org # 514ef1e62d ("arm64: dts: marvell: armada-37xx: Extend PCIe MEM space")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caf4c86bf1 upstream.
At the moment running osnoise on a nohz_full CPU or uncontested FIFO
priority and a PREEMPT_RCU kernel might have the side effect of
extending grace periods too much. This will entice RCU to force a
context switch on the wayward CPU to end the grace period, all while
introducing unwarranted noise into the tracer. This behaviour is
unavoidable as overly extending grace periods might exhaust the system's
memory.
This same exact problem is what extended quiescent states (EQS) were
created for, conversely, rcu_momentary_dyntick_idle() emulates them by
performing a zero duration EQS. So let's make use of it.
In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
atomically incrementing a local per-CPU counter and doing a store. So it
shouldn't affect osnoise's measurements (which has a 1us granularity),
so we'll call it unanimously.
The uncommon case involve calling rcu_momentary_dyntick_idle() after
having the osnoise process:
- Receive an expedited quiescent state IPI with preemption disabled or
during an RCU critical section. (activates rdp->cpu_no_qs.b.exp
code-path).
- Being preempted within in an RCU critical section and having the
subsequent outermost rcu_read_unlock() called with interrupts
disabled. (t->rcu_read_unlock_special.b.blocked code-path).
Neither of those are possible at the moment, and are unlikely to be in
the future given the osnoise's loop design. On top of this, the noise
generated by the situations described above is unavoidable, and if not
exposed by rcu_momentary_dyntick_idle() will be eventually seen in
subsequent rcu_read_unlock() calls or schedule operations.
Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com
Cc: stable@vger.kernel.org
Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0966d38583 upstream.
RISC-V can do PC-relative jumps with a 32bit range using the following
two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12
jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
are treated as two's-complement signed values. For this reason the
immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12
imm12 = offset & 0xfff
..where offset is the signed offset from the auipc instruction. When
the 11th bit of offset is 0 the addition of 0x800 doesn't change the top
20 bits and imm12 considered positive. When the 11th bit is 1 the carry
of the addition by 0x800 means imm20 is one higher, but since imm12 is
then considered negative the two's complement representation means it
all cancels out nicely.
However, this addition by 0x800 (2^11) means an offset greater than or
equal to 2^31 - 2^11 would overflow so imm20 is considered negative and
result in a backwards jump. Similarly the lower range of offset is also
moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c80ee64a80 upstream.
The alternative mechanism needs runtime code patching, it can't work
on XIP_KERNEL. And the errata workarounds are implemented via the
alternative mechanism. So add !XIP_KERNEL dependency for alternative
and erratas.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 44c9225729 ("RISC-V: enable XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bf476fc36 upstream.
There is an oddity in the way the RSR register flags propagate to the
ISR register (and the actual interrupt output) on this hardware: it
appears that RSR register bits only result in ISR being asserted if the
interrupt was actually enabled at the time, so enabling interrupts with
RSR bits already set doesn't trigger an interrupt to be raised. There
was already a partial fix for this race in the macb_poll function where
it checked for RSR bits being set and re-triggered NAPI receive.
However, there was a still a race window between checking RSR and
actually enabling interrupts, where a lost wakeup could happen. It's
necessary to check again after enabling interrupts to see if RSR was set
just prior to the interrupt being enabled, and re-trigger receive in that
case.
This issue was noticed in a point-to-point UDP request-response protocol
which periodically saw timeouts or abnormally high response times due to
received packets not being processed in a timely fashion. In many
applications, more packets arriving, including TCP retransmissions, would
cause the original packet to be processed, thus masking the issue.
Fixes: 02f7a34f34 ("net: macb: Re-enable RX interrupt only when RX is done")
Cc: stable@vger.kernel.org
Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>