Changes in 5.10.198
NFS: Use the correct commit info in nfs_join_page_group()
NFS/pNFS: Report EINVAL errors from connect() to the server
SUNRPC: Mark the cred for revalidation if the server rejects it
tracing: Increase trace array ref count on enable and filter files
ata: ahci: Drop pointless VPRINTK() calls and convert the remaining ones
ata: libahci: clear pending interrupt status
ext4: remove the 'group' parameter of ext4_trim_extent
ext4: add new helper interface ext4_try_to_trim_range()
ext4: scope ret locally in ext4_try_to_trim_range()
ext4: change s_last_trim_minblks type to unsigned long
ext4: mark group as trimmed only if it was fully scanned
ext4: replace the traditional ternary conditional operator with with max()/min()
ext4: move setting of trimmed bit into ext4_try_to_trim_range()
ext4: do not let fstrim block system suspend
tracing: Have event inject files inc the trace array ref count
netfilter: nf_tables: integrate pipapo into commit protocol
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
netfilter: nf_tables: fix memleak when more than 255 elements expired
ASoC: meson: spdifin: start hw on dai probe
netfilter: nf_tables: disallow element removal on anonymous sets
bpf: Avoid deadlock when using queue and stack maps from NMI
selftests/tls: Add {} to avoid static checker warning
selftests: tls: swap the TX and RX sockets in some tests
ASoC: imx-audmix: Fix return error with devm_clk_get()
i40e: Fix VF VLAN offloading when port VLAN is configured
ipv4: fix null-deref in ipv4_link_failure
powerpc/perf/hv-24x7: Update domain value check
dccp: fix dccp_v4_err()/dccp_v6_err() again
platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command()
platform/x86: intel_scu_ipc: Fail IPC send if still busy
x86/srso: Fix srso_show_state() side effect
x86/srso: Fix SBPB enablement for spec_rstack_overflow=off
net: hns3: only enable unicast promisc when mac table full
net: hns3: add 5ms delay before clear firmware reset irq source
net: bridge: use DEV_STATS_INC()
team: fix null-ptr-deref when team device type is changed
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
seqlock: avoid -Wshadow warnings
seqlock: Rename __seqprop() users
seqlock: Prefix internal seqcount_t-only macros with a "do_"
locking/seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
net: rds: Fix possible NULL-pointer dereference
gpio: tb10x: Fix an error handling path in tb10x_gpio_probe()
i2c: mux: demux-pinctrl: check the return value of devm_kstrdup()
netfilter: nf_tables: unregister flowtable hooks on netns exit
netfilter: nf_tables: double hook unregistration in netns path
Input: i8042 - rename i8042-x86ia64io.h to i8042-acpipnpio.h
Input: i8042 - add quirk for TUXEDO Gemini 17 Gen1/Clevo PD70PN
mmc: renesas_sdhi: probe into TMIO after SCC parameters have been setup
mmc: renesas_sdhi: populate SCC pointer at the proper place
mmc: tmio: support custom irq masks
mmc: renesas_sdhi: register irqs before registering controller
media: venus: core: Add io base variables for each block
media: venus: hfi,pm,firmware: Convert to block relative addressing
media: venus: hfi: Define additional 6xx registers
media: venus: core: Add differentiator IS_V6(core)
media: venus: hfi: Add a 6xx boot logic
media: venus: hfi_venus: Write to VIDC_CTRL_INIT after unmasking interrupts
netfilter: use actual socket sk for REJECT action
netfilter: nft_exthdr: Support SCTP chunks
netfilter: nf_tables: add and use nft_sk helper
netfilter: nf_tables: add and use nft_thoff helper
netfilter: nft_exthdr: break evaluation if setting TCP option fails
netfilter: exthdr: add support for tcp option removal
netfilter: nft_exthdr: Fix non-linear header modification
ata: libata: Rename link flag ATA_LFLAG_NO_DB_DELAY
ata: ahci: Add support for AMD A85 FCH (Hudson D4)
ata: ahci: Rename board_ahci_mobile
ata: ahci: Add Elkhart Lake AHCI controller
btrfs: reset destination buffer when read_extent_buffer() gets invalid range
MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled
bus: ti-sysc: Use fsleep() instead of usleep_range() in sysc_reset()
bus: ti-sysc: Fix missing AM35xx SoC matching
clk: tegra: fix error return case for recalc_rate
ARM: dts: omap: correct indentation
ARM: dts: ti: omap: Fix bandgap thermal cells addressing for omap3/4
ARM: dts: motorola-mapphone: Configure lower temperature passive cooling
ARM: dts: motorola-mapphone: Add 1.2GHz OPP
ARM: dts: motorola-mapphone: Drop second ti,wlcore compatible value
ARM: dts: am335x: Guardian: Update beeper label
ARM: dts: Unify pwm-omap-dmtimer node names
ARM: dts: ti: omap: motorola-mapphone: Fix abe_clkctrl warning on boot
bus: ti-sysc: Fix SYSC_QUIRK_SWSUP_SIDLE_ACT handling for uart wake-up
power: supply: ucs1002: fix error code in ucs1002_get_property()
xtensa: add default definition for XCHAL_HAVE_DIV32
xtensa: iss/network: make functions static
xtensa: boot: don't add include-dirs
xtensa: boot/lib: fix function prototypes
gpio: pmic-eic-sprd: Add can_sleep flag for PMIC EIC chip
i2c: npcm7xx: Fix callback completion ordering
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
parisc: sba: Fix compile warning wrt list of SBA devices
parisc: iosapic.c: Fix sparse warnings
parisc: drivers: Fix sparse warning
parisc: irq: Make irq_stack_union static to avoid sparse warning
scsi: qedf: Add synchronization between I/O completions and abort
selftests/ftrace: Correctly enable event in instance-event.tc
ring-buffer: Avoid softlockup in ring_buffer_resize()
selftests: fix dependency checker script
ring-buffer: Do not attempt to read past "commit"
platform/mellanox: mlxbf-bootctl: add NET dependency into Kconfig
scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
spi: nxp-fspi: reset the FLSHxCR1 registers
bpf: Clarify error expectations from bpf_clone_redirect
media: vb2: frame_vector.c: replace WARN_ONCE with a comment
powerpc/watchpoints: Disable preemption in thread_change_pc()
ncsi: Propagate carrier gain/loss events to the NCSI controller
fbdev/sh7760fb: Depend on FB=y
perf build: Define YYNOMEM as YYNOABORT for bison < 3.81
sched/cpuacct: Fix user/system in shown cpuacct.usage*
sched/cpuacct: Fix charge percpu cpuusage
sched/cpuacct: Optimize away RCU read lock
cgroup: Fix suspicious rcu_dereference_check() usage warning
ACPI: Check StorageD3Enable _DSD property in ACPI code
nvme-pci: factor the iod mempool creation into a helper
nvme-pci: factor out a nvme_pci_alloc_dev helper
nvme-pci: do not set the NUMA node of device if it has none
watchdog: iTCO_wdt: No need to stop the timer in probe
watchdog: iTCO_wdt: Set NO_REBOOT if the watchdog is not already running
netfilter: nft_exthdr: Search chunks in SCTP packets only
netfilter: nft_exthdr: Fix for unsafe packet data read
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
smack: Record transmuting in smk_transmuted
smack: Retrieve transmuting information in smack_inode_getsecurity()
Smack:- Use overlay inode label in smack_inode_copy_up()
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
netfilter: nf_tables: disallow rule removal from chain binding
ALSA: hda: Disable power save for solving pop issue on Lenovo ThinkCentre M70q
ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES
i2c: i801: unregister tco_pdev in i801_probe() error path
Revert "SUNRPC dont update timeout value on connection reset"
proc: nommu: /proc/<pid>/maps: release mmap read lock
ring-buffer: Update "shortest_full" in polling
btrfs: properly report 0 avail for very full file systems
bpf: Fix BTF_ID symbol generation collision
bpf: Fix BTF_ID symbol generation collision in tools/
net: thunderbolt: Fix TCPv6 GSO checksum calculation
ata: libata-core: Fix ata_port_request_pm() locking
ata: libata-core: Fix port and device removal
ata: libata-core: Do not register PM operations for SAS ports
ata: libata-sata: increase PMP SRST timeout to 10s
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
spi: spi-zynqmp-gqspi: Fix runtime PM imbalance in zynqmp_qspi_probe
spi: zynqmp-gqspi: fix clock imbalance on probe failure
NFS: Cleanup unused rpc_clnt variable
NFS: rename nfs_client_kset to nfs_kset
NFSv4: Fix a state manager thread deadlock regression
ring-buffer: remove obsolete comment for free_buffer_page()
ring-buffer: Fix bytes info in per_cpu buffer stats
drm/mediatek: Fix backport issue in mtk_drm_gem_prime_vmap()
rbd: move rbd_dev_refresh() definition
rbd: decouple header read-in from updating rbd_dev->header
rbd: decouple parent info read-in from updating rbd_dev
rbd: take header_rwsem in rbd_dev_refresh() only when updating
block: fix use-after-free of q->q_usage_counter
Revert "clk: imx: pll14xx: dynamically configure PLL for 393216000/361267200Hz"
Revert "PCI: qcom: Disable write access to read only registers for IP v2.3.3"
scsi: zfcp: Fix a double put in zfcp_port_enqueue()
qed/red_ll2: Fix undefined behavior bug in struct qed_ll2_info
wifi: mwifiex: Fix tlv_buf_left calculation
net: replace calls to sock->ops->connect() with kernel_connect()
net: prevent rewrite of msg_name in sock_sendmsg()
arm64: Add Cortex-A520 CPU part definition
ubi: Refuse attaching if mtd's erasesize is 0
wifi: iwlwifi: dbg_ini: fix structure packing
wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet
bpf: Fix tr dereferencing
drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()
wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling
regmap: rbtree: Fix wrong register marked as in-cache when creating new node
ima: Finish deprecation of IMA_TRUSTED_KEYRING Kconfig
scsi: target: core: Fix deadlock due to recursive locking
ima: rework CONFIG_IMA dependency block
NFSv4: Fix a nfs4_state_manager() race
modpost: add missing else to the "of" check
net: fix possible store tearing in neigh_periodic_work()
ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()
net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent
net: usb: smsc75xx: Fix uninit-value access in __smsc75xx_read_reg
net: nfc: llcp: Add lock when modifying device list
net: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns()
netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
net: stmmac: dwmac-stm32: fix resume on STM32 MCU
tipc: fix a potential deadlock on &tx->lock
tcp: fix quick-ack counting to count actual ACKs of new data
tcp: fix delayed ACKs for MSS boundary condition
sctp: update transport state when processing a dupcook packet
sctp: update hb timer immediately after users change hb_interval
cpupower: add Makefile dependencies for install targets
dm zoned: free dmz->ddev array in dmz_put_zoned_devices
RDMA/core: Require admin capabilities to set system parameters
of: dynamic: Fix potential memory leak in of_changeset_action()
IB/mlx4: Fix the size of a buffer in add_port_entries()
gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config()
gpio: pxa: disable pinctrl calls for MMP_GPIO
RDMA/cma: Initialize ib_sa_multicast structure to 0 when join
RDMA/cma: Fix truncation compilation warning in make_cma_ports
RDMA/uverbs: Fix typo of sizeof argument
RDMA/siw: Fix connection failure handling
RDMA/mlx5: Fix NULL string error
parisc: Restore __ldcw_align for PA-RISC 2.0 processors
netfilter: nf_tables: fix kdoc warnings after gc rework
netfilter: nftables: exthdr: fix 4-byte stack OOB write
mmc: renesas_sdhi: only reset SCC when its pointer is populated
xen/events: replace evtchn_rwlock with RCU
Linux 5.10.198
Change-Id: Iabfdf919ae63e41a565e523087d800ebc20e5448
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 5.10.197
autofs: fix memory leak of waitqueues in autofs_catatonic_mode
btrfs: output extra debug info if we failed to find an inline backref
locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock
ACPICA: Add AML_NO_OPERAND_RESOLVE flag to Timer
kernel/fork: beware of __put_task_struct() calling context
rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle()
scftorture: Forgive memory-allocation failure if KASAN
ACPI: video: Add backlight=native DMI quirk for Lenovo Ideapad Z470
perf/smmuv3: Enable HiSilicon Erratum 162001900 quirk for HIP08/09
ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2
hw_breakpoint: fix single-stepping when using bpf_overflow_handler
devlink: remove reload failed checks in params get/set callbacks
crypto: lrw,xts - Replace strlcpy with strscpy
wifi: ath9k: fix fortify warnings
wifi: ath9k: fix printk specifier
wifi: mwifiex: fix fortify warning
wifi: wil6210: fix fortify warnings
crypto: lib/mpi - avoid null pointer deref in mpi_cmp_ui()
tpm_tis: Resend command to recover from data transfer errors
mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450
alx: fix OOB-read compiler warning
netfilter: ebtables: fix fortify warnings in size_entry_mwt()
wifi: mac80211_hwsim: drop short frames
drm/bridge: tc358762: Instruct DSI host to generate HSE packets
samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'
ALSA: hda: intel-dsp-cfg: add LunarLake support
drm/exynos: fix a possible null-pointer dereference due to data race in exynos_drm_crtc_atomic_disable()
bus: ti-sysc: Configure uart quirks for k3 SoC
md: raid1: fix potential OOB in raid1_remove_disk()
ext2: fix datatype of block number in ext2_xattr_set2()
fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount()
jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount
powerpc/pseries: fix possible memory leak in ibmebus_bus_init()
media: dvb-usb-v2: af9035: Fix null-ptr-deref in af9035_i2c_master_xfer
media: dw2102: Fix null-ptr-deref in dw2102_i2c_transfer()
media: af9005: Fix null-ptr-deref in af9005_i2c_xfer
media: anysee: fix null-ptr-deref in anysee_master_xfer
media: az6007: Fix null-ptr-deref in az6007_i2c_xfer()
media: dvb-usb-v2: gl861: Fix null-ptr-deref in gl861_i2c_master_xfer
media: tuners: qt1010: replace BUG_ON with a regular error
media: pci: cx23885: replace BUG with error return
usb: gadget: fsl_qe_udc: validate endpoint index for ch9 udc
scsi: target: iscsi: Fix buffer overflow in lio_target_nacl_info_show()
serial: cpm_uart: Avoid suspicious locking
media: pci: ipu3-cio2: Initialise timing struct to avoid a compiler warning
kobject: Add sanity check for kset->kobj.ktype in kset_register()
mtd: rawnand: brcmnand: Allow SoC to provide I/O operations
mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller
perf jevents: Make build dependency on test JSONs
perf tools: Add an option to build without libbfd
btrfs: move btrfs_pinned_by_swapfile prototype into volumes.h
btrfs: add a helper to read the superblock metadata_uuid
btrfs: compare the correct fsid/metadata_uuid in btrfs_validate_super
drm: gm12u320: Fix the timeout usage for usb_bulk_msg()
scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
selftests: tracing: Fix to unmount tracefs for recovering environment
scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
x86/boot/compressed: Reserve more memory for page tables
samples/hw_breakpoint: fix building without module unloading
md/raid1: fix error: ISO C90 forbids mixed declarations
attr: block mode changes of symlinks
ovl: fix incorrect fdput() on aio completion
btrfs: fix lockdep splat and potential deadlock after failure running delayed items
btrfs: release path before inode lookup during the ino lookup ioctl
drm/amdgpu: fix amdgpu_cs_p1_user_fence
net/sched: Retire rsvp classifier
proc: fix a dentry lock race between release_task and lookup
mm/filemap: fix infinite loop in generic_file_buffered_read()
drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
tracing: Have current_trace inc the trace array ref count
tracing: Have option files inc the trace array ref count
nfsd: fix change_info in NFSv4 RENAME replies
tracefs: Add missing lockdown check to tracefs_create_dir()
i2c: aspeed: Reset the i2c controller when timeout occurs
ata: libata: disallow dev-initiated LPM transitions to unsupported states
scsi: megaraid_sas: Fix deadlock on firmware crashdump
scsi: pm8001: Setup IRQs on resume
ext4: fix rec_len verify error
Linux 5.10.197
Change-Id: Ic8626d7d13ec54d438c4d80efe1f8b6bddeb84a8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 45d99ea451 ]
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of
bytes in cpu buffer that have not been consumed. However, currently
after consuming data by reading file 'trace_pipe', the 'bytes' info
was not changed as expected.
# cat per_cpu/cpu0/stats
entries: 0
overrun: 0
commit overrun: 0
bytes: 568 <--- 'bytes' is problematical !!!
oldest event ts: 8651.371479
now ts: 8653.912224
dropped events: 0
read events: 8
The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it:
1. When stat 'read_bytes', account consumed event in rb_advance_reader();
2. When stat 'entries_bytes', exclude the discarded padding event which
is smaller than minimum size because it is invisible to reader. Then
use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for
page-based read/remove/overrun.
Also correct the comments of ring_buffer_bytes_cpu() in this patch.
Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: c64e148a3b ("trace: Add ring buffer stats to measure rate of events")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e0cb399c7 upstream.
It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit 42fb0a1e84 by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.
The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.
Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.
To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.
Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.
Link: https://lore.kernel.org/linux-trace-kernel/20230929180113.01c2cae3@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 42fb0a1e84 ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd02d4234c upstream.
cpuacct has 2 different ways of accounting and showing user
and system times.
The first one uses cpuacct_account_field() to account times
and cpuacct.stat file to expose them. And this one seems to work ok.
The second one is uses cpuacct_charge() function for accounting and
set of cpuacct.usage* files to show times. Despite some attempts to
fix it in the past it still doesn't work. Sometimes while running KVM
guest the cpuacct_charge() accounts most of the guest time as
system time. This doesn't match with user&system times shown in
cpuacct.stat or proc/<pid>/stat.
Demonstration:
# git clone https://github.com/aryabinin/kvmsample
# make
# mkdir /sys/fs/cgroup/cpuacct/test
# echo $$ > /sys/fs/cgroup/cpuacct/test/tasks
# ./kvmsample &
# for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done
1976535645
2979839428
3979832704
4983603153
5983604157
Use cpustats accounted in cpuacct_account_field() as the source
of user/sys times for cpuacct.usage* files. Make cpuacct_charge()
to account only summary execution time.
Fixes: d740037fac ("sched/cpuacct: Split usage accounting into user_usage and sys_usage")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211115164607.23784-3-arbn@yandex-team.com
[OP: adjusted context for v5.10]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 95a404bd60 ]
When iterating over the ring buffer while the ring buffer is active, the
writer can corrupt the reader. There's barriers to help detect this and
handle it, but that code missed the case where the last event was at the
very end of the page and has only 4 bytes left.
The checks to detect the corruption by the writer to reads needs to see the
length of the event. If the length in the first 4 bytes is zero then the
length is stored in the second 4 bytes. But if the writer is in the process
of updating that code, there's a small window where the length in the first
4 bytes could be zero even though the length is only 4 bytes. That will
cause rb_event_length() to read the next 4 bytes which could happen to be off the
allocated page.
To protect against this, fail immediately if the next event pointer is
less than 8 bytes from the end of the commit (last byte of data), as all
events must be a minimum of 8 bytes anyway.
Link: https://lore.kernel.org/all/20230905141245.26470-1-Tze-nan.Wu@mediatek.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230907122820.0899019c@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reported-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6bd2c9248 ]
When user resize all trace ring buffer through file 'buffer_size_kb',
then in ring_buffer_resize(), kernel allocates buffer pages for each
cpu in a loop.
If the kernel preemption model is PREEMPT_NONE and there are many cpus
and there are many buffer pages to be allocated, it may not give up cpu
for a long time and finally cause a softlockup.
To avoid it, call cond_resched() after each cpu buffer allocation.
Link: https://lore.kernel.org/linux-trace-kernel/20230906081930.3939106-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a34a9f1a19 ]
Sysbot discovered that the queue and stack maps can deadlock if they are
being used from a BPF program that can be called from NMI context (such as
one that is attached to a perf HW counter event). To fix this, add an
in_nmi() check and use raw_spin_trylock() in NMI context, erroring out if
grabbing the lock fails.
Fixes: f1a2e44a3a ("bpf: add queue and stack maps")
Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Tested-by: Hsin-Wei Hung <hsinweih@uci.edu>
Co-developed-by: Hsin-Wei Hung <hsinweih@uci.edu>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230911132815.717240-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 0c0547d2a6 which is
commit c2489bb7e6 upstream.
It breaks the Android kabi and is not needed for Android systems. If it
is needed in the future, it can be brought back in an abi-safe way.
Bug: 161946584
Change-Id: I014d4486d85641031f816da38b00c593dcb8eae6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 5103216b86 which is
commit 3d07fa1dd1 upstream.
The commit it fixes is about to be reverted, so also revert it.
Bug: 161946584
Change-Id: I0c442ffd94cfe75b8d61318d2913de9b818ba7f3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Recently we have discovered many lag issues caused by percpu_rwsem
lock-holding tasks not being scheduled for a long time. we need to
identify them and provide appropriate scheduling protection in our
oem scheduler.
To support this, we add one hook below:
trace_android_vh_percpu_rwsem_wq_add
Bug: 301066838
Change-Id: Id770c1a7978842abfc62d3fa9aeb5ac7a1904972
Signed-off-by: xieliujie <xieliujie@oppo.com>
(cherry picked from commit f451f4a599)
And update the cpu regs when it is soft lock.
Fixes: d851edc401 ("watchdog/hardlockup: add hardlock_notifier_list")
Signed-off-by: Huibin Hong <huibin.hong@rock-chips.com>
Change-Id: Id695ee4e7a80bfde2c15891f19e98af092f74a01
[ Upstream commit 013608cd08 ]
Kernels built with CONFIG_KASAN=y quarantine newly freed memory in order
to better detect use-after-free errors. However, this can exhaust memory
more quickly in allocator-heavy tests, which can result in spurious
scftorture failure. This commit therefore forgives memory-allocation
failure in kernels built with CONFIG_KASAN=y, but continues counting
the errors for use in detailed test-result analyses.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d243b34459 ]
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.
One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:
CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
Call Trace:
dump_stack_lvl+0x57/0x7d
mark_lock_irq.cold+0x33/0xba
mark_lock+0x1e7/0x400
mark_usage+0x11d/0x140
__lock_acquire+0x30d/0x930
lock_acquire.part.0+0x9c/0x210
rt_spin_lock+0x27/0xe0
refill_obj_stock+0x3d/0x3a0
kmem_cache_free+0x357/0x560
inactive_task_timer+0x1ad/0x340
__run_hrtimer+0x8a/0x1a0
__hrtimer_run_queues+0x91/0x130
hrtimer_interrupt+0x10f/0x220
__sysvec_apic_timer_interrupt+0x7b/0xd0
sysvec_apic_timer_interrupt+0x4f/0xd0
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7fff196bf6f5
Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.
The issue is reproducible with stress-ng:
while true; do
stress-ng --sched deadline --sched-period 1000000000 \
--sched-runtime 800000000 --sched-deadline \
1000000000 --mmapfork 23 -t 20
done
Reported-by: Hu Chunyu <chuhu@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230614122323.37957-2-wander@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.195
erofs: ensure that the post-EOF tails are all zeroed
ARM: pxa: remove use of symbol_get()
mmc: au1xmmc: force non-modular build and remove symbol_get usage
net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index
rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
USB: serial: option: add Quectel EM05G variant (0x030e)
USB: serial: option: add FOXCONN T99W368/T99W373 product
usb: dwc3: meson-g12a: do post init to fix broken usb after resumption
usb: chipidea: imx: improve logic if samsung,picophy-* parameter is 0
HID: wacom: remove the battery when the EKR is off
staging: rtl8712: fix race condition
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
configfs: fix a race in configfs_lookup()
serial: qcom-geni: fix opp vote on shutdown
serial: sc16is7xx: fix broken port 0 uart init
serial: sc16is7xx: fix bug when first setting GPIO direction
firmware: stratix10-svc: Fix an NULL vs IS_ERR() bug in probe
fsi: master-ast-cf: Add MODULE_FIRMWARE macro
nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
pinctrl: amd: Don't show `Invalid config param` errors
ASoC: rt5682: Fix a problem with error handling in the io init function of the soundwire
ARM: dts: imx: update sdma node name format
ARM: dts: imx7s: Drop dma-apb interrupt-names
ARM: dts: imx: Adjust dma-apbh node name
ARM: dts: imx: Set default tuning step for imx7d usdhc
phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
media: pulse8-cec: handle possible ping error
media: pci: cx23885: fix error handling for cx23885 ATSC boards
9p: virtio: make sure 'offs' is initialized in zc_request
ASoC: da7219: Flush pending AAD IRQ when suspending
ASoC: da7219: Check for failure reading AAD IRQ events
ethernet: atheros: fix return value check in atl1c_tso_csum()
vxlan: generalize vxlan_parse_gpe_hdr and remove unused args
m68k: Fix invalid .section syntax
s390/dasd: use correct number of retries for ERP requests
s390/dasd: fix hanging device after request requeue
fs/nls: make load_nls() take a const parameter
ASoc: codecs: ES8316: Fix DMIC config
ASoC: atmel: Fix the 8K sample parameter in I2SC master
platform/x86: intel: hid: Always call BTNL ACPI method
platform/x86: huawei-wmi: Silence ambient light sensor
drm/amd/display: Exit idle optimizations before attempt to access PHY
ovl: Always reevaluate the file signature for IMA
ata: pata_arasan_cf: Use dev_err_probe() instead dev_err() in data_xfer()
security: keys: perform capable check only on privileged operations
kprobes: Prohibit probing on CFI preamble symbol
clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM
vmbus_testing: fix wrong python syntax for integer value comparison
net: usb: qmi_wwan: add Quectel EM05GV2
idmaengine: make FSL_EDMA and INTEL_IDMA64 depends on HAS_IOMEM
scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
netlabel: fix shift wrapping bug in netlbl_catmap_setlong()
bnx2x: fix page fault following EEH recovery
sctp: handle invalid error codes without calling BUG()
scsi: storvsc: Always set no_report_opcodes
ALSA: seq: oss: Fix racy open/close of MIDI devices
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications
net: Avoid address overwrite in kernel_connect
udf: Check consistency of Space Bitmap Descriptor
udf: Handle error when adding extent to a file
Revert "net: macsec: preserve ingress frame ordering"
reiserfs: Check the return value from __getblk()
eventfd: Export eventfd_ctx_do_read()
eventfd: prevent underflow for eventfd semaphores
fs: Fix error checking for d_hash_and_lookup()
tmpfs: verify {g,u}id mount options correctly
selftests/harness: Actually report SKIP for signal tests
refscale: Fix uninitalized use of wait_queue_head_t
OPP: Fix passing 0 to PTR_ERR in _opp_attach_genpd()
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
x86/decompressor: Don't rely on upper 32 bits of GPRs being preserved
perf/imx_ddr: don't enable counter0 if none of 4 counters are used
s390/pkey: fix/harmonize internal keyblob headers
s390/paes: fix PKEY_TYPE_EP11_AES handling for secure keyblobs
x86/efistub: Fix PCI ROM preservation in mixed mode
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
bpftool: Use a local bpf_perf_event_value to fix accessing its fields
bpf: Clear the probe_addr for uprobe
tcp: tcp_enter_quickack_mode() should be static
hwrng: nomadik - keep clock enabled while hwrng is registered
regmap: rbtree: Use alloc_flags for memory allocations
udp: re-score reuseport groups when connected sockets are present
bpf: reject unhashed sockets in bpf_sk_assign
wifi: mt76: testmode: add nla_policy for MT76_TM_ATTR_TX_LENGTH
spi: tegra20-sflash: fix to check return value of platform_get_irq() in tegra_sflash_probe()
can: gs_usb: gs_usb_receive_bulk_callback(): count RX overflow errors also in case of OOM
wifi: mwifiex: Fix OOB and integer underflow when rx packets
wifi: mwifiex: fix error recovery in PCIE buffer descriptor management
selftests/bpf: fix static assert compilation issue for test_cls_*.c
crypto: stm32 - Properly handle pm_runtime_get failing
crypto: api - Use work queue in crypto_destroy_instance
Bluetooth: nokia: fix value check in nokia_bluetooth_serdev_probe()
Bluetooth: Fix potential use-after-free when clear keys
net: tcp: fix unexcepted socket die when snd_wnd is 0
selftests/bpf: Clean up fmod_ret in bench_rename test script
ice: ice_aq_check_events: fix off-by-one check when filling buffer
crypto: caam - fix unchecked return value error
hwrng: iproc-rng200 - Implement suspend and resume calls
lwt: Fix return values of BPF xmit ops
lwt: Check LWTUNNEL_XMIT_CONTINUE strictly
fs: ocfs2: namei: check return value of ocfs2_add_entry()
wifi: mwifiex: fix memory leak in mwifiex_histogram_read()
wifi: mwifiex: Fix missed return in oob checks failed path
samples/bpf: fix broken map lookup probe
wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
wifi: ath9k: protect WMI command response buffer replacement with a lock
wifi: mwifiex: avoid possible NULL skb pointer dereference
Bluetooth: btusb: Do not call kfree_skb() under spin_lock_irqsave()
wifi: ath9k: use IS_ERR() with debugfs_create_dir()
net: arcnet: Do not call kfree_skb() under local_irq_disable()
mlxsw: i2c: Fix chunk size setting in output mailbox buffer
mlxsw: i2c: Limit single transaction buffer size
hwmon: (tmp513) Fix the channel number in tmp51x_is_visible()
net/sched: sch_hfsc: Ensure inner classes have fsc curve
netrom: Deny concurrent connect().
drm/bridge: tc358764: Fix debug print parameter order
quota: factor out dquot_write_dquot()
quota: rename dquot_active() to inode_quota_active()
quota: add new helper dquot_active()
quota: fix dqput() to follow the guarantees dquot_srcu should provide
ASoC: stac9766: fix build errors with REGMAP_AC97
soc: qcom: ocmem: Add OCMEM hardware version print
soc: qcom: ocmem: Fix NUM_PORTS & NUM_MACROS macros
arm64: dts: qcom: msm8996: Add missing interrupt to the USB2 controller
drm/amdgpu: avoid integer overflow warning in amdgpu_device_resize_fb_bar()
ARM: dts: BCM5301X: Harmonize EHCI/OHCI DT nodes name
ARM: dts: BCM53573: Describe on-SoC BCM53125 rev 4 switch
ARM: dts: BCM53573: Drop nonexistent #usb-cells
ARM: dts: BCM53573: Add cells sizes to PCIe node
ARM: dts: BCM53573: Use updated "spi-gpio" binding properties
drm/etnaviv: fix dumping of active MMU context
x86/mm: Fix PAT bit missing from page protection modify mask
ARM: dts: s3c64xx: align pinctrl with dtschema
ARM: dts: samsung: s3c6410-mini6410: correct ethernet reg addresses (split)
ARM: dts: s5pv210: adjust node names to DT spec
ARM: dts: s5pv210: add dummy 5V regulator for backlight on SMDKv210
ARM: dts: samsung: s5pv210-smdkv210: correct ethernet reg addresses (split)
drm: adv7511: Fix low refresh rate register for ADV7533/5
ARM: dts: BCM53573: Fix Ethernet info for Luxul devices
arm64: dts: qcom: sdm845: Add missing RPMh power domain to GCC
arm64: dts: qcom: sdm845: Fix the min frequency of "ice_core_clk"
drm/amdgpu: Update min() to min_t() in 'amdgpu_info_ioctl'
md/bitmap: don't set max_write_behind if there is no write mostly device
md/md-bitmap: hold 'reconfig_mutex' in backlog_store()
drm/tegra: Remove superfluous error messages around platform_get_irq()
drm/tegra: dpaux: Fix incorrect return value of platform_get_irq
of: unittest: fix null pointer dereferencing in of_unittest_find_node_by_name()
drm/armada: Fix off-by-one error in armada_overlay_get_property()
drm/panel: simple: Add missing connector type and pixel format for AUO T215HVN01
ima: Remove deprecated IMA_TRUSTED_KEYRING Kconfig
drm: xlnx: zynqmp_dpsub: Add missing check for dma_set_mask
drm/msm/mdp5: Don't leak some plane state
firmware: meson_sm: fix to avoid potential NULL pointer dereference
smackfs: Prevent underflow in smk_set_cipso()
drm/amd/pm: fix variable dereferenced issue in amdgpu_device_attr_create()
drm/msm/a2xx: Call adreno_gpu_init() earlier
audit: fix possible soft lockup in __audit_inode_child()
bus: ti-sysc: Fix build warning for 64-bit build
drm/mediatek: Fix potential memory leak if vmap() fail
bus: ti-sysc: Fix cast to enum warning
of: unittest: Fix overlay type in apply/revert check
ALSA: ac97: Fix possible error value of *rac97
ipmi:ssif: Add check for kstrdup
ipmi:ssif: Fix a memory leak when scanning for an adapter
drivers: clk: keystone: Fix parameter judgment in _of_pll_clk_init()
clk: sunxi-ng: Modify mismatched function name
clk: qcom: gcc-sc7180: use ARRAY_SIZE instead of specifying num_parents
clk: qcom: gcc-sc7180: Fix up gcc_sdcc2_apps_clk_src
ext4: correct grp validation in ext4_mb_good_group
clk: qcom: gcc-sm8250: use ARRAY_SIZE instead of specifying num_parents
clk: qcom: gcc-sm8250: Fix gcc_sdcc2_apps_clk_src
clk: qcom: reset: Use the correct type of sleep/delay based on length
PCI: Mark NVIDIA T4 GPUs to avoid bus reset
pinctrl: mcp23s08: check return value of devm_kasprintf()
PCI: pciehp: Use RMW accessors for changing LNKCTL
PCI/ASPM: Use RMW accessors for changing LNKCTL
clk: imx8mp: fix sai4 clock
clk: imx: composite-8m: fix clock pauses when set_rate would be a no-op
vfio/type1: fix cap_migration information leak
powerpc/fadump: reset dump area size if fadump memory reserve fails
powerpc/perf: Convert fsl_emb notifier to state machine callbacks
drm/amdgpu: Use RMW accessors for changing LNKCTL
drm/radeon: Use RMW accessors for changing LNKCTL
net/mlx5: Use RMW accessors for changing LNKCTL
wifi: ath10k: Use RMW accessors for changing LNKCTL
powerpc: Don't include lppaca.h in paca.h
powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
nfs/blocklayout: Use the passed in gfp flags
powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
jfs: validate max amount of blocks before allocation.
fs: lockd: avoid possible wrong NULL parameter
NFSD: da_addr_body field missing in some GETDEVICEINFO replies
NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
media: ad5820: Drop unsupported ad5823 from i2c_ and of_device_id tables
media: i2c: tvp5150: check return value of devm_kasprintf()
media: v4l2-core: Fix a potential resource leak in v4l2_fwnode_parse_link()
drivers: usb: smsusb: fix error handling code in smsusb_init_device
media: dib7000p: Fix potential division by zero
media: dvb-usb: m920x: Fix a potential memory leak in m920x_i2c_xfer()
media: cx24120: Add retval check for cx24120_message_send()
scsi: hisi_sas: Print SAS address for v3 hw erroneous completion print
scsi: libsas: Introduce more SAM status code aliases in enum exec_status
scsi: hisi_sas: Modify v3 HW SSP underflow error processing
scsi: hisi_sas: Modify v3 HW SATA completion error processing
scsi: hisi_sas: Fix warnings detected by sparse
scsi: hisi_sas: Fix normally completed I/O analysed as failed
media: rkvdec: increase max supported height for H.264
media: mediatek: vcodec: Return NULL if no vdec_fb is found
usb: phy: mxs: fix getting wrong state with mxs_phy_is_otg_host()
scsi: RDMA/srp: Fix residual handling
scsi: iscsi: Rename iscsi_set_param() to iscsi_if_set_param()
scsi: iscsi: Add length check for nlattr payload
scsi: iscsi: Add strlen() check in iscsi_if_set{_host}_param()
scsi: be2iscsi: Add length check when parsing nlattrs
scsi: qla4xxx: Add length check when parsing nlattrs
serial: sprd: Assign sprd_port after initialized to avoid wrong access
serial: sprd: Fix DMA buffer leak issue
x86/APM: drop the duplicate APM_MINOR_DEV macro
scsi: qedf: Do not touch __user pointer in qedf_dbg_stop_io_on_error_cmd_read() directly
scsi: qedf: Do not touch __user pointer in qedf_dbg_debug_cmd_read() directly
scsi: qedf: Do not touch __user pointer in qedf_dbg_fp_int_cmd_read() directly
coresight: tmc: Explicit type conversions to prevent integer overflow
dma-buf/sync_file: Fix docs syntax
driver core: test_async: fix an error code
IB/uverbs: Fix an potential error pointer dereference
fsi: aspeed: Reset master errors after CFAM reset
iommu/qcom: Disable and reset context bank before programming
iommu/vt-d: Fix to flush cache of PASID directory table
media: go7007: Remove redundant if statement
USB: gadget: f_mass_storage: Fix unused variable warning
media: ov5640: Enable MIPI interface in ov5640_set_power_mipi()
media: i2c: ov2680: Set V4L2_CTRL_FLAG_MODIFY_LAYOUT on flips
media: ov2680: Remove auto-gain and auto-exposure controls
media: ov2680: Fix ov2680_bayer_order()
media: ov2680: Fix vflip / hflip set functions
media: ov2680: Fix regulators being left enabled on ov2680_power_on() errors
cgroup:namespace: Remove unused cgroup_namespaces_init()
scsi: core: Use 32-bit hostnum in scsi_host_lookup()
scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock
serial: tegra: handle clk prepare error in tegra_uart_hw_init()
amba: bus: fix refcount leak
Revert "IB/isert: Fix incorrect release of isert connection"
RDMA/siw: Balance the reference of cep->kref in the error path
RDMA/siw: Correct wrong debug message
HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()
HID: multitouch: Correct devm device reference for hidinput input_dev name
x86/speculation: Mark all Skylake CPUs as vulnerable to GDS
tracing: Fix race issue between cpu buffer write and swap
mtd: rawnand: brcmnand: Fix mtd oobsize
phy/rockchip: inno-hdmi: use correct vco_div_5 macro on rk3328
phy/rockchip: inno-hdmi: round fractal pixclock in rk3328 recalc_rate
phy/rockchip: inno-hdmi: do not power on rk3328 post pll on reg write
rpmsg: glink: Add check for kstrdup
mtd: spi-nor: Check bus width while setting QE bit
mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume()
um: Fix hostaudio build errors
dmaengine: ste_dma40: Add missing IRQ check in d40_probe
cpufreq: Fix the race condition while updating the transition_task of policy
virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: xt_u32: validate user space input
netfilter: xt_sctp: validate the flag_info count
skbuff: skb_segment, Call zero copy functions before using skbuff frags
igb: set max size RX buffer when store bad packet is enabled
PM / devfreq: Fix leak in devfreq_dev_release()
ALSA: pcm: Fix missing fixup call in compat hw_refine ioctl
printk: ringbuffer: Fix truncating buffer size min_t cast
scsi: core: Fix the scsi_set_resid() documentation
ipmi_si: fix a memleak in try_smi_init()
ARM: OMAP2+: Fix -Warray-bounds warning in _pwrdm_state_switch()
backlight/gpio_backlight: Compare against struct fb_info.device
backlight/bd6107: Compare against struct fb_info.device
backlight/lv5207lp: Compare against struct fb_info.device
xtensa: PMU: fix base address for the newer hardware
arm64: csum: Fix OoB access in IP checksum code for negative lengths
media: dvb: symbol fixup for dvb_attach()
Revert "scsi: qla2xxx: Fix buffer overrun"
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
ntb: Drop packets when qp link is down
ntb: Clean up tx tail index on link down
ntb: Fix calculation ntb_transport_tx_free_entry()
Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset"
procfs: block chmod on /proc/thread-self/comm
parisc: Fix /proc/cpuinfo output for lscpu
dlm: fix plock lookup when using multiple lockspaces
dccp: Fix out of bounds access in DCCP error handler
X.509: if signature is unsupported skip validation
net: handle ARPHRD_PPP in dev_is_mac_header_xmit()
fsverity: skip PKCS#7 parser when keyring is empty
pstore/ram: Check start of empty przs during init
s390/ipl: add missing secure/has_secure file to ipl type 'unknown'
crypto: stm32 - fix loop iterating through scatterlist for DMA
cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug
usb: typec: bus: verify partner exists in typec_altmode_attention
USB: core: Unite old scheme and new scheme descriptor reads
USB: core: Change usb_get_device_descriptor() API
USB: core: Fix race by not overwriting udev->descriptor in hub_port_init()
USB: core: Fix oversight in SuperSpeed initialization
usb: typec: tcpci: clear the fault status bit
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
md/md-bitmap: remove unnecessary local variable in backlog_store()
udf: initialize newblock to 0
net/ipv6: SKB symmetric hash should incorporate transport ports
io_uring: always lock in io_apoll_task_func
io_uring: break out of iowq iopoll on teardown
io_uring: break iopolling on signal
scsi: qla2xxx: Fix deletion race condition
scsi: qla2xxx: fix inconsistent TMF timeout
scsi: qla2xxx: Fix erroneous link up failure
scsi: qla2xxx: Turn off noisy message log
scsi: qla2xxx: Remove unsupported ql2xenabledif option
fbdev/ep93xx-fb: Do not assign to struct fb_info.dev
drm/ast: Fix DRAM init on AST2200
lib/test_meminit: allocate pages up to order MAX_ORDER
parisc: led: Fix LAN receive and transmit LEDs
parisc: led: Reduce CPU overhead for disk & lan LED computation
pinctrl: cherryview: fix address_space_handler() argument
dt-bindings: clock: xlnx,versal-clk: drop select:false
clk: imx: pll14xx: dynamically configure PLL for 393216000/361267200Hz
clk: qcom: gcc-mdm9615: use proper parent for pll0_vote clock
soc: qcom: qmi_encdec: Restrict string length in decode
NFS: Fix a potential data corruption
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
kconfig: fix possible buffer overflow
backlight: gpio_backlight: Drop output GPIO direction check for initial power state
perf annotate bpf: Don't enclose non-debug code with an assert()
x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()
perf top: Don't pass an ERR_PTR() directly to perf_session__delete()
watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load
pwm: lpc32xx: Remove handling of PWM channels
net/sched: fq_pie: avoid stalls in fq_pie_timer()
sctp: annotate data-races around sk->sk_wmem_queued
ipv4: annotate data-races around fi->fib_dead
net: read sk->sk_family once in sk_mc_loop()
drm/i915/gvt: Save/restore HW status to support GVT suspend/resume
drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
ipv4: ignore dst hint for multipath routes
igb: disable virtualization features on 82580
veth: Fixing transmit return status for dropped packets
net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr
af_unix: Fix data-races around user->unix_inflight.
af_unix: Fix data-race around unix_tot_inflight.
af_unix: Fix data-races around sk->sk_shutdown.
af_unix: Fix data race around sk->sk_err.
net: sched: sch_qfq: Fix UAF in qfq_dequeue()
kcm: Destroy mutex in kcm_exit_net()
igc: Change IGC_MIN to allow set rx/tx value between 64 and 80
igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80
igb: Change IGB_MIN to allow set rx/tx value between 64 and 80
s390/zcrypt: don't leak memory if dev_set_name() fails
idr: fix param name in idr_alloc_cyclic() doc
ip_tunnels: use DEV_STATS_INC()
net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload
net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many times
netfilter: nfnetlink_osf: avoid OOB read
net: hns3: fix the port information display when sfp is absent
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
ext4: add correct group descriptors and reserved GDT blocks to system zone
ata: sata_gemini: Add missing MODULE_DESCRIPTION
ata: pata_ftide010: Add missing MODULE_DESCRIPTION
fuse: nlookup missing decrement in fuse_direntplus_link
btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
btrfs: use the correct superblock to compare fsid in btrfs_validate_super
mtd: rawnand: brcmnand: Fix crash during the panic_write
mtd: rawnand: brcmnand: Fix potential out-of-bounds access in oob write
mtd: rawnand: brcmnand: Fix potential false time out warning
drm/amd/display: prevent potential division by zero errors
perf hists browser: Fix hierarchy mode header
perf tools: Handle old data in PERF_RECORD_ATTR
perf hists browser: Fix the number of entries for 'e' key
ACPI: APEI: explicit init of HEST and GHES in apci_init()
arm64: sdei: abort running SDEI handlers during crash
scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry
scsi: qla2xxx: Consolidate zio threshold setting for both FCP & NVMe
scsi: qla2xxx: Fix crash in PCIe error handling
scsi: qla2xxx: Flush mailbox commands on chip reset
ARM: dts: samsung: exynos4210-i9100: Fix LCD screen's physical size
ARM: dts: BCM5301X: Extend RAM to full 256MB for Linksys EA6500 V2
bus: mhi: host: Skip MHI reset if device is in RDDM
net: ipv4: fix one memleak in __inet_del_ifa()
selftests/kselftest/runner/run_one(): allow running non-executable files
kselftest/runner.sh: Propagate SIGTERM to runner child
net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add
net: ethernet: mvpp2_main: fix possible OOB write in mvpp2_ethtool_get_rxnfc()
net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_hwlro_get_fdir_all()
hsr: Fix uninit-value access in fill_frame_info()
r8152: check budget for r8152_poll()
kcm: Fix memory leak in error path of kcm_sendmsg()
platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors
platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
ipv6: fix ip6_sock_set_addr_preferences() typo
ixgbe: fix timestamp configuration code
kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
drm/amd/display: Fix a bug when searching for insert_above_mpcc
parisc: Drop loops_per_jiffy from per_cpu struct
Linux 5.10.195
Change-Id: I4eef618f573b6d4201e05c9cf56088d77d712d97
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3d07fa1dd1 upstream.
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes: c2489bb7e6 ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3163f635b2 ]
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: f1affcaaa8 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 82b90b6c5b ]
cgroup_namspace_init() just return 0. Therefore, there is no need to
call it during start_kernel. Just remove it.
Fixes: a79a908fd2 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b59bc6e372 ]
Tracefs or debugfs maybe cause hundreds to thousands of PATH records,
too many PATH records maybe cause soft lockup.
For example:
1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
2. auditctl -a exit,always -S open -k key
3. sysctl -w kernel.watchdog_thresh=5
4. mkdir /sys/kernel/debug/tracing/instances/test
There may be a soft lockup as follows:
watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
Kernel panic - not syncing: softlockup: hung tasks
Call trace:
dump_backtrace+0x0/0x30c
show_stack+0x20/0x30
dump_stack+0x11c/0x174
panic+0x27c/0x494
watchdog_timer_fn+0x2bc/0x390
__run_hrtimer+0x148/0x4fc
__hrtimer_run_queues+0x154/0x210
hrtimer_interrupt+0x2c4/0x760
arch_timer_handler_phys+0x48/0x60
handle_percpu_devid_irq+0xe0/0x340
__handle_domain_irq+0xbc/0x130
gic_handle_irq+0x78/0x460
el1_irq+0xb8/0x140
__audit_inode_child+0x240/0x7bc
tracefs_create_file+0x1b8/0x2a0
trace_create_file+0x18/0x50
event_create_dir+0x204/0x30c
__trace_add_new_event+0xac/0x100
event_trace_add_tracer+0xa0/0x130
trace_array_create_dir+0x60/0x140
trace_array_create+0x1e0/0x370
instance_mkdir+0x90/0xd0
tracefs_syscall_mkdir+0x68/0xa0
vfs_mkdir+0x21c/0x34c
do_mkdirat+0x1b4/0x1d4
__arm64_sys_mkdirat+0x4c/0x60
el0_svc_common.constprop.0+0xa8/0x240
do_el0_svc+0x8c/0xc0
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Therefore, we add cond_resched() to __audit_inode_child() to fix it.
Fixes: 5195d8e217 ("audit: dynamically allocate audit_names when not enough space is in the names array")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5063e8948 ]
Running the refscale test occasionally crashes the kernel with the
following error:
[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
:
[ 8569.952940] Call Trace:
[ 8569.952941] <TASK>
[ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959] kthread+0x10e/0x130
[ 8569.952966] ret_from_fork+0x1f/0x30
[ 8569.952973] </TASK>
The likely cause is that init_waitqueue_head() is called after the call to
the torture_create_kthread() function that creates the ref_scale_reader
kthread. Although this init_waitqueue_head() call will very likely
complete before this kthread is created and starts running, it is
possible that the calling kthread will be delayed between the calls to
torture_create_kthread() and init_waitqueue_head(). In this case, the
new kthread will use the waitqueue head before it is properly initialized,
which is not good for the kernel's health and well-being.
The above crash happened here:
static inline void __add_wait_queue(...)
{
:
if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
The offset of flags from list_head entry in wait_queue_entry is
-0x18. If reader_tasks[i].wq.head.next is NULL as allocated reader_task
structure is zero initialized, the instruction will try to access address
0xffffffffffffffe8, which is exactly the fault address listed above.
This commit therefore invokes init_waitqueue_head() before creating
the kthread.
Fixes: 653ed64b01 ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2489bb7e6 ]
There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.
After this patch, users will find that:
- Main trace_pipe can be opened by only one user, and if it is
opened, all per_cpu trace_pipes cannot be opened;
- Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
trace_pipe can only be opened by one user. And if one of them is
opened, main trace_pipe cannot be opened.
Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9011e49d54 upstream.
It has recently come to my attention that nvidia is circumventing the
protection added in 262e6ae708 ("modules: inherit
TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary
modules into an allegedly GPL licensed module and then rexporting them.
Given that symbol_get was only ever intended for tightly cooperating
modules using very internal symbols it is logical to restrict it to
being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA
Circumvention of Access Controls law suites.
All symbols except for four used through symbol_get were already exported
as EXPORT_SYMBOL_GPL, and the remaining four ones were switched over in
the preparation patches.
Fixes: 262e6ae708 ("modules: inherit TAINT_PROPRIETARY_MODULE")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18f08e758f upstream.
Currently, trc_inspect_reader() treats a task exiting its RCU Tasks
Trace read-side critical section the same as being within that critical
section. However, this can fail because that task might have already
checked its .need_qs field, which means that it might never decrement
the all-important trc_n_readers_need_end counter. Of course, for that
to happen, the task would need to never again execute an RCU Tasks Trace
read-side critical section, but this really could happen if the system's
last trampoline was removed. Note that exit from such a critical section
cannot be treated as a quiescent state due to the possibility of nested
critical sections. This means that if trc_inspect_reader() sees a
negative nesting value, it must set up to try again later.
This commit therefore ignores tasks that are exiting their RCU Tasks
Trace read-side critical sections so that they will be rechecked later.
[ paulmck: Apply feedback from Neeraj Upadhyay and Boqun Feng. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbe0d8d914 upstream.
Currently, RCU Tasks Trace initializes the trc_n_readers_need_end counter
to the value one, increments it before each trc_read_check_handler()
IPI, then decrements it within trc_read_check_handler() if the target
task was in a quiescent state (or if the target task moved to some other
CPU while the IPI was in flight), complaining if the new value was zero.
The rationale for complaining is that the initial value of one must be
decremented away before zero can be reached, and this decrement has not
yet happened.
Except that trc_read_check_handler() is initiated with an asynchronous
smp_call_function_single(), which might be significantly delayed. This
can result in false-positive complaints about the counter reaching zero.
This commit therefore waits for in-flight IPI handlers to complete before
decrementing away the initial value of one from the trc_n_readers_need_end
counter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46aa886c48 upstream.
The trc_wait_for_one_reader() function is called at multiple stages
of trace rcu-tasks GP function, rcu_tasks_wait_gp():
- First, it is called as part of per task function -
rcu_tasks_trace_pertask(), for all non-idle tasks. As part of per task
processing, this function add the task in the holdout list and if the
task is currently running on a CPU, it sends IPI to the task's CPU.
The IPI handler takes action depending on whether task is in trace
rcu-tasks read side critical section or not:
- a. If the task is in trace rcu-tasks read side critical section
(t->trc_reader_nesting != 0), the IPI handler sets the task's
->trc_reader_special.b.need_qs, so that this task notifies exit
from its outermost read side critical section (by decrementing
trc_n_readers_need_end) to the GP handling function.
trc_wait_for_one_reader() also increments trc_n_readers_need_end,
so that the trace rcu-tasks GP handler function waits for this
task's read side exit notification. The IPI handler also sets
t->trc_reader_checked to true, and no further IPIs are sent for
this task, for this trace rcu-tasks grace period and this
task can be removed from holdout list.
- b. If the task is in the process of exiting its trace rcu-tasks
read side critical section, (t->trc_reader_nesting < 0), defer
this task's processing to future calls to trc_wait_for_one_reader().
- c. If task is not in rcu-task read side critical section,
t->trc_reader_nesting == 0, ->trc_reader_checked is set for this
task, so that this task is removed from holdout list.
- Second, trc_wait_for_one_reader() is called as part of post scan, in
function rcu_tasks_trace_postscan(), for all idle tasks.
- Third, in function check_all_holdout_tasks_trace(), this function is
called for each task in the holdout list, but only if there isn't
a pending IPI for the task (->trc_ipi_to_cpu == -1). This function
removed the task from holdout list, if IPI handler has completed the
required work, to ensure that the current trace rcu-tasks grace period
either waits for this task, or this task is not in a trace rcu-tasks
read side critical section.
Now, considering the scenario where smp_call_function_single() fails in
first case, inside rcu_tasks_trace_pertask(). In this case,
->trc_ipi_to_cpu is set to the current CPU for that task. This will
result in trc_wait_for_one_reader() getting skipped in third case,
inside check_all_holdout_tasks_trace(), for this task. This further
results in ->trc_reader_checked never getting set for this task,
and the task not getting removed from holdout list. This can cause
the current trace rcu-tasks grace period to stall.
Fix the above problem, by resetting ->trc_ipi_to_cpu to -1, on
smp_call_function_single() failure, so that future IPI calls can
be send for this task.
Note that all three of the trc_wait_for_one_reader() function's
callers (rcu_tasks_trace_pertask(), rcu_tasks_trace_postscan(),
check_all_holdout_tasks_trace()) hold cpu_read_lock(). This means
that smp_call_function_single() cannot race with CPU hotplug, and thus
should never fail. Therefore, also add a warning in order to report
any such failure in case smp_call_function_single() grows some other
reason for failure.
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 147f04b14a upstream.
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS: 0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tick_nohz_dep_set_cpu+0x59/0x70
rcu_exp_wait_wake+0x54e/0x870
? sync_rcu_exp_select_cpus+0x1fc/0x390
process_one_work+0x1ef/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x28/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0x115/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---
This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2abcc4b5a6 upstream.
module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.193
objtool/x86: Fix SRSO mess
NFSv4: fix out path in __nfs4_get_acl_uncached
xprtrdma: Remap Receive buffers after a reconnect
PCI: acpiphp: Reassign resources on bridge if necessary
dlm: improve plock logging if interrupted
dlm: replace usage of found with dedicated list iterator variable
fs: dlm: add pid to debug log
fs: dlm: change plock interrupted message to debug again
fs: dlm: use dlm_plock_info for do_unlock_close
fs: dlm: fix mismatch of plock results from userspace
MIPS: cpu-features: Enable octeon_cache by cpu_type
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
fbdev: Improve performance of sys_imageblit()
fbdev: Fix sys_imageblit() for arbitrary image widths
fbdev: fix potential OOB read in fast_imageblit()
dm integrity: increase RECALC_SECTORS to improve recalculate speed
dm integrity: reduce vmalloc space footprint on 32-bit architectures
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
drm/amd/display: do not wait for mpc idle if tg is disabled
drm/amd/display: check TG is non-null before checking if enabled
libceph, rbd: ignore addr->type while comparing in some cases
rbd: make get_lock_owner_info() return a single locker or NULL
rbd: retrieve and check lock owner twice before blocklisting
rbd: prevent busy loop when requesting exclusive lock
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
tracing: Fix memleak due to race between current_tracer and trace
octeontx2-af: SDP: fix receive link config
sock: annotate data-races around prot->memory_pressure
dccp: annotate data-races in dccp_poll()
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
net: bgmac: Fix return value check for fixed_phy_register()
net: bcmgenet: Fix return value check for fixed_phy_register()
net: validate veth and vxcan peer ifindexes
ice: fix receive buffer size miscalculation
igb: Avoid starting unnecessary workqueues
net/sched: fix a qdisc modification with ambiguous command request
netfilter: nf_tables: fix out of memory error handling
rtnetlink: return ENODEV when ifname does not exist and group is given
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
net: remove bond_slave_has_mac_rcu()
bonding: fix macvlan over alb bond support
ibmveth: Use dcbf rather than dcbfl
NFSv4: Fix dropped lock for racing OPEN and delegation return
clk: Fix slab-out-of-bounds error in devm_clk_release()
mm: add a call to flush_cache_vmap() in vmap_pfn()
NFS: Fix a use after free in nfs_direct_join_group()
nfsd: Fix race to FREE_STATEID and cl_revoked
selinux: set next pointer before attaching to list
batman-adv: Trigger events for auto adjusted MTU
batman-adv: Don't increase MTU when set by user
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Hold rtnl lock during MTU update via netlink
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
radix tree: remove unused variable
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
drm/vmwgfx: Fix shader stage validation
drm/display/dp: Fix the DP DSC Receiver cap size
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
torture: Fix hang during kthread shutdown phase
tick: Detect and fix jiffies update stall
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
drm/i915: Fix premature release of request's reusable memory
ASoC: rt711: add two jack detection modes
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
dma-buf/sw_sync: Avoid recursive lock during fence signal
mm,hwpoison: refactor get_any_page
mm: fix page reference leak in soft_offline_page()
mm: memory-failure: kill soft_offline_free_page()
mm: memory-failure: fix unexpected return value in soft_offline_page()
ASoC: Intel: sof_sdw: include rt711.h for RT711 JD mode
mm,hwpoison: fix printing of page flags
Linux 5.10.193
Change-Id: I7c6ce55cbc73cef27a5cbe8954131a052b67dac2
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 2ef269ef1a upstream.
cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.
If multiple controllers are attached to the cgroup next to the cpuset
controller a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().
Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Fix conflicts in kernel/cgroup/cpuset.c due to new code being applied
that is not applicable on this branch. Reject new code. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85989106fe upstream.
While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.
This approach has the issue of not freeing already allocated DL BW in
the following error cases:
(1) The set of tasks includes multiple DL tasks and DL BW overflow
checking fails for one of the subsequent DL tasks.
(2) Another controller next to the cpuset controller which is attached
to the same cgroup fails in its can_attach().
To address this problem rework dl_cpu_busy():
(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
dedicated dl_bw_free().
(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
a `struct task_struct *p` used in dl_cpu_busy(). This allows to
allocate DL BW for a set of tasks too rather than only for a single
task.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c24849f55 upstream.
Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).
To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.
Reported-by: Qais Yousef (Google) <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Fix conflicts in kernel/cgroup/cpuset.c and kernel/sched/deadline.c
due to pulling new fields and functions. Remove new code and match the
patch diff. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 111cd11bbc upstream.
Turns out percpu_cpuset_rwsem - commit 1243dc518c ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Fix conflict in kernel/cgroup/cpuset.c due to pulling new functions or
comment that don't exist on 5.10 or the usage of different cpu hotplug
lock whenever replacing the rwsem with mutex. Remove BUG_ON() for
rwsem that doesn't exist on mainline. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>