Changes in 5.15.130
ACPI: thermal: Drop nocrt parameter
module: Expose module_init_layout_section()
arm64: module-plts: inline linux/moduleloader.h
arm64: module: Use module_init_layout_section() to spot init sections
ARM: module: Use module_init_layout_section() to spot init sections
rcu: Prevent expedited GP from enabling tick on offline CPU
rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader
rcu-tasks: Wait for trc_read_check_handler() IPIs
rcu-tasks: Add trc_inspect_reader() checks for exiting critical section
Linux 5.15.130
Change-Id: Ibf4ab4fbe70e9099f930096f34094fe8d1ec23bb
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
In commit 49b830d75f ("ring-buffer: Do not swap cpu_buffer during
resize process"), the internal structure of "struct ring_buffer" had an
additional field added.
The ABI tooling has had this structure "leak" to its list of symbols to
verify, but all apis only use a pointer, not the real structure, which
is controlled by the tracing core. So this change is safe to make, so
update the abi defintion with the new signature:
type 'struct trace_buffer' changed
member 'atomic_t resizing' was added
Bug: 161946584
Fixes: 49b830d75f ("ring-buffer: Do not swap cpu_buffer during resize process")
Change-Id: I633c3878d27dae9acd0576659163ca4362a713c9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 18f08e758f upstream.
Currently, trc_inspect_reader() treats a task exiting its RCU Tasks
Trace read-side critical section the same as being within that critical
section. However, this can fail because that task might have already
checked its .need_qs field, which means that it might never decrement
the all-important trc_n_readers_need_end counter. Of course, for that
to happen, the task would need to never again execute an RCU Tasks Trace
read-side critical section, but this really could happen if the system's
last trampoline was removed. Note that exit from such a critical section
cannot be treated as a quiescent state due to the possibility of nested
critical sections. This means that if trc_inspect_reader() sees a
negative nesting value, it must set up to try again later.
This commit therefore ignores tasks that are exiting their RCU Tasks
Trace read-side critical sections so that they will be rechecked later.
[ paulmck: Apply feedback from Neeraj Upadhyay and Boqun Feng. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbe0d8d914 upstream.
Currently, RCU Tasks Trace initializes the trc_n_readers_need_end counter
to the value one, increments it before each trc_read_check_handler()
IPI, then decrements it within trc_read_check_handler() if the target
task was in a quiescent state (or if the target task moved to some other
CPU while the IPI was in flight), complaining if the new value was zero.
The rationale for complaining is that the initial value of one must be
decremented away before zero can be reached, and this decrement has not
yet happened.
Except that trc_read_check_handler() is initiated with an asynchronous
smp_call_function_single(), which might be significantly delayed. This
can result in false-positive complaints about the counter reaching zero.
This commit therefore waits for in-flight IPI handlers to complete before
decrementing away the initial value of one from the trc_n_readers_need_end
counter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46aa886c48 upstream.
The trc_wait_for_one_reader() function is called at multiple stages
of trace rcu-tasks GP function, rcu_tasks_wait_gp():
- First, it is called as part of per task function -
rcu_tasks_trace_pertask(), for all non-idle tasks. As part of per task
processing, this function add the task in the holdout list and if the
task is currently running on a CPU, it sends IPI to the task's CPU.
The IPI handler takes action depending on whether task is in trace
rcu-tasks read side critical section or not:
- a. If the task is in trace rcu-tasks read side critical section
(t->trc_reader_nesting != 0), the IPI handler sets the task's
->trc_reader_special.b.need_qs, so that this task notifies exit
from its outermost read side critical section (by decrementing
trc_n_readers_need_end) to the GP handling function.
trc_wait_for_one_reader() also increments trc_n_readers_need_end,
so that the trace rcu-tasks GP handler function waits for this
task's read side exit notification. The IPI handler also sets
t->trc_reader_checked to true, and no further IPIs are sent for
this task, for this trace rcu-tasks grace period and this
task can be removed from holdout list.
- b. If the task is in the process of exiting its trace rcu-tasks
read side critical section, (t->trc_reader_nesting < 0), defer
this task's processing to future calls to trc_wait_for_one_reader().
- c. If task is not in rcu-task read side critical section,
t->trc_reader_nesting == 0, ->trc_reader_checked is set for this
task, so that this task is removed from holdout list.
- Second, trc_wait_for_one_reader() is called as part of post scan, in
function rcu_tasks_trace_postscan(), for all idle tasks.
- Third, in function check_all_holdout_tasks_trace(), this function is
called for each task in the holdout list, but only if there isn't
a pending IPI for the task (->trc_ipi_to_cpu == -1). This function
removed the task from holdout list, if IPI handler has completed the
required work, to ensure that the current trace rcu-tasks grace period
either waits for this task, or this task is not in a trace rcu-tasks
read side critical section.
Now, considering the scenario where smp_call_function_single() fails in
first case, inside rcu_tasks_trace_pertask(). In this case,
->trc_ipi_to_cpu is set to the current CPU for that task. This will
result in trc_wait_for_one_reader() getting skipped in third case,
inside check_all_holdout_tasks_trace(), for this task. This further
results in ->trc_reader_checked never getting set for this task,
and the task not getting removed from holdout list. This can cause
the current trace rcu-tasks grace period to stall.
Fix the above problem, by resetting ->trc_ipi_to_cpu to -1, on
smp_call_function_single() failure, so that future IPI calls can
be send for this task.
Note that all three of the trc_wait_for_one_reader() function's
callers (rcu_tasks_trace_pertask(), rcu_tasks_trace_postscan(),
check_all_holdout_tasks_trace()) hold cpu_read_lock(). This means
that smp_call_function_single() cannot race with CPU hotplug, and thus
should never fail. Therefore, also add a warning in order to report
any such failure in case smp_call_function_single() grows some other
reason for failure.
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 147f04b14a upstream.
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS: 0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tick_nohz_dep_set_cpu+0x59/0x70
rcu_exp_wait_wake+0x54e/0x870
? sync_rcu_exp_select_cpus+0x1fc/0x390
process_one_work+0x1ef/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x28/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0x115/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---
This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6846234f4 upstream.
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.
get_module_plt() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.
Naturally the logic is different.
module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.
A previous patch exposed module_init_layout_section(), use that so the
logic is the same.
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f928f8b1a2 upstream.
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.
module_emit_plt_entry() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.
Naturally the logic is different.
module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.
This results in the following:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
A previous patch exposed module_init_layout_section(), use that so the
logic is the same.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Tested-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: <stable@vger.kernel.org> # 5.15.x: 60a0aab746 arm64: module-plts: inline linux/moduleloader.h
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2abcc4b5a6 upstream.
module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f641174a1 upstream.
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has same functionality as what nocrt should be
doing drop `nocrt` and associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce30 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b190cf1f27 which is
commit 5ad1ab30ac upstream.
It causes a CRC abi break so revert it for now. If this change is
really needed, it can be brought back in an abi-safe way in the future.
Bug: 161946584
Change-Id: I385d80527d8c3eb121f005b244da1ffac8fb7074
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 5.15.129
objtool/x86: Fix SRSO mess
NFSv4.2: fix error handling in nfs42_proc_getxattr
NFSv4: fix out path in __nfs4_get_acl_uncached
xprtrdma: Remap Receive buffers after a reconnect
PCI: acpiphp: Reassign resources on bridge if necessary
dlm: improve plock logging if interrupted
dlm: replace usage of found with dedicated list iterator variable
fs: dlm: add pid to debug log
fs: dlm: change plock interrupted message to debug again
fs: dlm: use dlm_plock_info for do_unlock_close
fs: dlm: fix mismatch of plock results from userspace
MIPS: cpu-features: Enable octeon_cache by cpu_type
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
fbdev: Improve performance of sys_imageblit()
fbdev: Fix sys_imageblit() for arbitrary image widths
fbdev: fix potential OOB read in fast_imageblit()
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
jbd2: remove t_checkpoint_io_list
jbd2: remove journal_clean_one_cp_list()
jbd2: fix a race when checking checkpoint buffer busy
can: raw: fix receiver memory leak
drm/amd/display: do not wait for mpc idle if tg is disabled
drm/amd/display: check TG is non-null before checking if enabled
can: raw: fix lockdep issue in raw_release()
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
tracing: Fix memleak due to race between current_tracer and trace
octeontx2-af: SDP: fix receive link config
sock: annotate data-races around prot->memory_pressure
dccp: annotate data-races in dccp_poll()
ipvlan: Fix a reference count leak warning in ipvlan_ns_exit()
net: bgmac: Fix return value check for fixed_phy_register()
net: bcmgenet: Fix return value check for fixed_phy_register()
net: validate veth and vxcan peer ifindexes
ice: fix receive buffer size miscalculation
igb: Avoid starting unnecessary workqueues
igc: Fix the typo in the PTM Control macro
net/sched: fix a qdisc modification with ambiguous command request
netfilter: nf_tables: flush pending destroy work before netlink notifier
netfilter: nf_tables: fix out of memory error handling
rtnetlink: return ENODEV when ifname does not exist and group is given
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
net: remove bond_slave_has_mac_rcu()
bonding: fix macvlan over alb bond support
net/ncsi: make one oem_gma function for all mfr id
net/ncsi: change from ndo_set_mac_address to dev_set_mac_address
Revert "KVM: x86: enable TDP MMU by default"
ibmveth: Use dcbf rather than dcbfl
NFSv4: Fix dropped lock for racing OPEN and delegation return
clk: Fix slab-out-of-bounds error in devm_clk_release()
ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
mm: add a call to flush_cache_vmap() in vmap_pfn()
NFS: Fix a use after free in nfs_direct_join_group()
nfsd: Fix race to FREE_STATEID and cl_revoked
selinux: set next pointer before attaching to list
batman-adv: Trigger events for auto adjusted MTU
batman-adv: Don't increase MTU when set by user
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Hold rtnl lock during MTU update via netlink
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
radix tree: remove unused variable
of: unittest: Fix EXPECT for parse_phandle_with_args_map() test
of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
drm/vmwgfx: Fix shader stage validation
drm/display/dp: Fix the DP DSC Receiver cap size
x86/fpu: Invalidate FPU state correctly on exec()
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
torture: Fix hang during kthread shutdown phase
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
drm/i915: Fix premature release of request's reusable memory
can: raw: add missing refcount for memory leak fix
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
dma-buf/sw_sync: Avoid recursive lock during fence signal
mm: memory-failure: kill soft_offline_free_page()
mm: memory-failure: fix unexpected return value in soft_offline_page()
mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
Linux 5.15.129
Change-Id: I196922313560df14873ab83e55b11961989f33de
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 5.15.128
mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
selftests: forwarding: tc_actions: cleanup temporary files when test is aborted
selftests: forwarding: tc_actions: Use ncat instead of nc
macsec: Fix traffic counters/statistics
macsec: use DEV_STATS_INC()
net/tls: Perform immediate device ctx cleanup when possible
net/tls: Multi-threaded calls to TX tls_dev_del
net: tls: avoid discarding data on record close
PCI: tegra194: Fix possible array out of bounds access
ARM: dts: imx6dl: prtrvt, prtvt7, prti6q, prtwd2: fix USB related warnings
iopoll: Call cpu_relax() in busy loops
ASoC: SOF: Intel: fix SoundWire/HDaudio mutual exclusion
dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
HID: logitech-hidpp: Add USB and Bluetooth IDs for the Logitech G915 TKL Keyboard
drm/amdgpu: install stub fence into potential unused fence pointers
HID: add quirk for 03f0:464a HP Elite Presenter Mouse
RDMA/mlx5: Return the firmware result upon destroying QP/RQ
ovl: check type and offset of struct vfsmount in ovl_entry
smb: client: fix warning in cifs_smb3_do_mount()
media: v4l2-mem2mem: add lock to protect parameter num_rdy
usb: gadget: u_serial: Avoid spinlock recursion in __gs_console_push
media: platform: mediatek: vpu: fix NULL ptr dereference
thunderbolt: Read retimer NVM authentication status prior tb_retimer_set_inbound_sbtx()
usb: chipidea: imx: don't request QoS for imx8ulp
usb: chipidea: imx: add missing USB PHY DPDM wakeup setting
gfs2: Fix possible data races in gfs2_show_options()
pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db()
firewire: net: fix use after free in fwnet_finish_incoming_packet()
watchdog: sp5100_tco: support Hygon FCH/SCH (Server Controller Hub)
Bluetooth: L2CAP: Fix use-after-free
Bluetooth: btusb: Add MT7922 bluetooth ID for the Asus Ally
drm/amdgpu: Fix potential fence use-after-free v2
fs/ntfs3: Enhance sanity check while generating attr_list
fs: ntfs3: Fix possible null-pointer dereferences in mi_read()
fs/ntfs3: Mark ntfs dirty when on-disk struct is corrupted
ALSA: hda/realtek: Add quirks for Unis H3C Desktop B760 & Q760
ALSA: hda: fix a possible null-pointer dereference due to data race in snd_hdac_regmap_sync()
powerpc/kasan: Disable KCOV in KASAN code
ring-buffer: Do not swap cpu_buffer during resize process
iio: add addac subdirectory
iio: adc: stx104: Utilize iomap interface
iio: adc: stx104: Implement and utilize register structures
iio: stx104: Move to addac subdirectory
iio: addac: stx104: Fix race condition for stx104_write_raw()
iio: addac: stx104: Fix race condition when converting analog-to-digital
igc: read before write to SRRCTL register
ARM: dts: aspeed: asrock: Correct firmware flash SPI clocks
drm/amd/display: save restore hdcp state when display is unplugged from mst hub
drm/amd/display: phase3 mst hdcp for multiple displays
drm/amd/display: fix access hdcp_workqueue assert
usb: dwc3: gadget: Synchronize IRQ between soft connect/disconnect
usb: dwc3: Remove DWC3 locking during gadget suspend/resume
usb: dwc3: Fix typos in gadget.c
USB: dwc3: gadget: drop dead hibernation code
usb: dwc3: gadget: Improve dwc3_gadget_suspend() and dwc3_gadget_resume()
tty: serial: fsl_lpuart: Add i.MXRT1050 support
tty: serial: fsl_lpuart: make rx_watermark configurable for different platforms
tty: serial: fsl_lpuart: reduce RX watermark to 0 on LS1028A
USB: dwc3: qcom: fix NULL-deref on suspend
USB: dwc3: fix use-after-free on core driver unbind
mmc: bcm2835: fix deferred probing
mmc: sunxi: fix deferred probing
ARM: dts: imx6sll: fixup of operating points
ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
btrfs: move out now unused BG from the reclaim list
virtio-mmio: don't break lifecycle of vm_dev
vduse: Use proper spinlock for IRQ injection
cifs: fix potential oops in cifs_oplock_break
i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue
i2c: hisi: Only handle the interrupt of the driver's transfer
fbdev: mmp: fix value check in mmphw_probe()
powerpc/rtas_flash: allow user copy to flash block cache objects
tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux
tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms
btrfs: fix BUG_ON condition in btrfs_cancel_balance
i2c: designware: Correct length byte validation logic
i2c: designware: Handle invalid SMBus block data response length value
net: xfrm: Fix xfrm_address_filter OOB read
net: af_key: fix sadb_x_filter validation
net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
xfrm: fix slab-use-after-free in decode_session6
ip6_vti: fix slab-use-after-free in decode_session6
ip_vti: fix potential slab-use-after-free in decode_session6
xfrm: add NULL check in xfrm_update_ae_params
xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
net: phy: fix IRQ-based wake-on-lan over hibernate / power off
selftests: mirror_gre_changes: Tighten up the TTL test match
drm/panel: simple: Fix AUO G121EAN01 panel timings according to the docs
netfilter: nf_tables: fix false-positive lockdep splat
netfilter: nf_tables: deactivate catchall elements in next generation
ipvs: fix racy memcpy in proc_do_sync_threshold
netfilter: nft_dynset: disallow object maps
net: phy: broadcom: stub c45 read/write for 54810
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
iavf: fix FDIR rule fields masks validation
i40e: fix misleading debug logs
net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
sock: Fix misuse of sk_under_memory_pressure()
net: do not allow gso_size to be set to GSO_BY_FRAGS
bus: ti-sysc: Flush posted write on enable before reset
arm64: dts: qcom: qrb5165-rb5: fix thermal zone conflict
ARM: dts: imx: Set default tuning step for imx6sx usdhc
ASoC: rt5665: add missed regulator_bulk_disable
ASoC: meson: axg-tdm-formatter: fix channel slot allocation
soc: aspeed: socinfo: Add kfree for kstrdup
ALSA: hda/realtek - Remodified 3k pull low procedure
riscv: uaccess: Return the number of bytes effectively not copied
serial: 8250: Fix oops for port->pm on uart_change_pm()
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
cifs: Release folio lock on fscache read hit.
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
mmc: block: Fix in_flight[issue_type] value error
drm/qxl: fix UAF on handle creation
drm/amd: flush any delayed gfxoff on suspend entry
netfilter: set default timeout to 3 secs for sctp shutdown send and recv state
exfat: check if filename entries exceeds max filename length
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
virtio-net: set queues after driver_ok
net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled
mmc: f-sdh30: fix order of function calls in sdhci_f_sdh30_remove
x86/cpu: Fix __x86_return_thunk symbol type
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
x86/alternative: Make custom return thunk unconditional
objtool: Add frame-pointer-specific function ignore
x86/ibt: Add ANNOTATE_NOENDBR
x86/cpu: Clean up SRSO return thunk mess
x86/cpu: Rename original retbleed methods
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
x86/cpu: Cleanup the untrain mess
x86/srso: Explain the untraining sequences a bit more
x86/static_call: Fix __static_call_fixup()
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
x86/srso: Disable the mitigation on unaffected configurations
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
objtool/x86: Fixup frame-pointer vs rethunk
x86/srso: Correct the mitigation status when SMT is disabled
Linux 5.15.128
Change-Id: Ifcc5fc8c3027b6550eab4996c9458438f7d065b4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 5.15.127
ksmbd: validate command request size
ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()
wireguard: allowedips: expand maximum node depth
mmc: moxart: read scr register without changing byte order
ipv6: adjust ndisc_is_useropt() to also return true for PIO
dmaengine: pl330: Return DMA_PAUSED when transaction is paused
riscv,mmio: Fix readX()-to-delay() ordering
drm/nouveau/gr: enable memory loads on helper invocation on all channels
drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()
drm/amd/display: check attr flag before set cursor degamma on DCN3+
hwmon: (pmbus/bel-pfe) Enable PMBUS_SKIP_STATUS_CHECK for pfe1100
radix tree test suite: fix incorrect allocation size for pthreads
nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
bpf: allow precision tracking for programs with subprogs
bpf: stop setting precise in current state
bpf: aggressively forget precise markings during state checkpointing
selftests/bpf: make test_align selftest more robust
selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code
selftests/bpf: Fix sk_assign on s390x
io_uring: correct check for O_TMPFILE
iio: cros_ec: Fix the allocation size for cros_ec_command
iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
binder: fix memory leak in binder_init()
misc: rtsx: judge ASPM Mode to set PETXCFG Reg
usb-storage: alauda: Fix uninit-value in alauda_check_media()
usb: dwc3: Properly handle processing of pending events
usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
usb: typec: tcpm: Fix response to vsafe0V event
x86/srso: Fix build breakage with the LLVM linker
x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
x86/speculation: Add cpu_show_gds() prototype
x86: Move gds_ucode_mitigated() declaration to header
drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
selftests/rseq: Fix build with undefined __weak
selftests: forwarding: Add a helper to skip test when using veth pairs
selftests: forwarding: ethtool: Skip when using veth pairs
selftests: forwarding: ethtool_extended_state: Skip when using veth pairs
selftests: forwarding: Skip test when no interfaces are specified
selftests: forwarding: Switch off timeout
selftests: forwarding: tc_flower: Relax success criterion
net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail()
bpf, sockmap: Fix map type error in sock_map_del_link
bpf, sockmap: Fix bug that strp_done cannot be called
mISDN: Update parameter type of dsp_cmx_send()
net/packet: annotate data-races around tp->status
tunnels: fix kasan splat when generating ipv4 pmtu error
xsk: fix refcount underflow in error path
bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
dccp: fix data-race around dp->dccps_mss_cache
drivers: net: prevent tun_build_skb() to exceed the packet size limit
iavf: fix potential races for FDIR filters
IB/hfi1: Fix possible panic during hotplug remove
drm/rockchip: Don't spam logs in atomic check
wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
RDMA/umem: Set iova in ODP flow
net: phy: at803x: remove set/get wol callbacks for AR8032
net: hns3: refactor hclge_mac_link_status_wait for interface reuse
net: hns3: add wait until mac link down
nexthop: Fix infinite nexthop dump when using maximum nexthop ID
nexthop: Make nexthop bucket dump more efficient
nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
dmaengine: mcf-edma: Fix a potential un-allocated memory access
net/mlx5: Allow 0 for total host VFs
net/mlx5: Skip clock update work when device is in error state
ibmvnic: Enforce stronger sanity checks on login response
ibmvnic: Unmap DMA login rsp buffer on send login fail
ibmvnic: Handle DMA unmapping of login buffs in release functions
btrfs: don't stop integrity writeback too early
btrfs: exit gracefully if reloc roots don't match
btrfs: reject invalid reloc tree root keys with stack dump
btrfs: set cache_block_group_error if we find an error
nvme-tcp: fix potential unbalanced freeze & unfreeze
nvme-rdma: fix potential unbalanced freeze & unfreeze
netfilter: nf_tables: report use refcount overflow
scsi: core: Fix legacy /proc parsing buffer overflow
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
scsi: 53c700: Check that command slot is not NULL
scsi: snic: Fix possible memory leak if device_add() fails
scsi: core: Fix possible memory leak if device_add() fails
scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
scsi: qedi: Fix firmware halt over suspend and resume
scsi: qedf: Fix firmware halt over suspend and resume
alpha: remove __init annotation from exported page_is_ram()
sch_netem: fix issues in netem_change() vs get_dist_table()
tick: Detect and fix jiffies update stall
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
timers/nohz: Last resort update jiffies on nohz_full IRQ entry
Linux 5.15.127
Change-Id: Ifa62f86a491dace0d7169fe3de65ebb6bbffe6df
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 5.15.126
io_uring: gate iowait schedule on having pending requests
perf: Fix function pointer case
net/mlx5: Free irqs only on shutdown callback
arm64: errata: Add workaround for TSB flush failures
arm64: errata: Add detection for TRBE write to out-of-range
iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
iommu/arm-smmu-v3: Add explicit feature for nesting
iommu/arm-smmu-v3: Document nesting-related errata
arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
word-at-a-time: use the same return type for has_zero regardless of endianness
KVM: s390: fix sthyi error handling
wifi: cfg80211: Fix return value in scan logic
net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing
rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
net: dsa: fix value check in bcm_sf2_sw_probe()
perf test uprobe_from_different_cu: Skip if there is no gcc
net: sched: cls_u32: Fix match key mis-addressing
mISDN: hfcpci: Fix potential deadlock on &hc->lock
qed: Fix kernel-doc warnings
qed: Fix scheduling in a tasklet while getting stats
net: annotate data-races around sk->sk_max_pacing_rate
net: add missing READ_ONCE(sk->sk_rcvlowat) annotation
net: add missing READ_ONCE(sk->sk_sndbuf) annotation
net: add missing READ_ONCE(sk->sk_rcvbuf) annotation
net: add missing data-race annotations around sk->sk_peek_off
net: add missing data-race annotation for sk_ll_usec
net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX.
bpf, cpumap: Handle skb as well when clean up ptr_ring
net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire
net: ll_temac: Switch to use dev_err_probe() helper
net: ll_temac: fix error checking of irq_of_parse_and_map()
net: korina: handle clk prepare error in korina_probe()
net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode
net: dcb: choose correct policy to parse DCB_ATTR_BCN
s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
ip6mr: Fix skb_under_panic in ip6mr_cache_report()
vxlan: Fix nexthop hash size
net/mlx5: fs_core: Make find_closest_ft more generic
net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
prestera: fix fallback to previous version on same major version
tcp_metrics: fix addr_same() helper
tcp_metrics: annotate data-races around tm->tcpm_stamp
tcp_metrics: annotate data-races around tm->tcpm_lock
tcp_metrics: annotate data-races around tm->tcpm_vals[]
tcp_metrics: annotate data-races around tm->tcpm_net
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
scsi: zfcp: Defer fc_rport blocking until after ADISC response
scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
libceph: fix potential hang in ceph_osdc_notify()
USB: zaurus: Add ID for A-300/B-500/C-700
ceph: defer stopping mdsc delayed_work
firmware: arm_scmi: Drop OF node reference in the transport channel setup
x86/CPU/AMD: Do not leak quotient data after a division by 0
exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
exfat: release s_lock before calling dir_emit()
mtd: spinand: toshiba: Fix ecc_get_status
mtd: rawnand: meson: fix OOB available bytes for ECC
arm64: dts: stratix10: fix incorrect I2C property for SCL signal
net: tun_chr_open(): set sk_uid from current_fsuid()
net: tap_open(): set sk_uid from current_fsuid()
wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
rbd: prevent busy loop when requesting exclusive lock
bpf: Disable preemption in bpf_event_output
open: make RESOLVE_CACHED correctly test for O_TMPFILE
drm/ttm: check null pointer before accessing when swapping
bpf, cpumap: Make sure kthread is running before map update returns
file: reinstate f_pos locking optimization for regular files
fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()
fs/sysv: Null check to prevent null-ptr-deref bug
Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
fs: Protect reconfiguration of sb read-write from racing writes
ext2: Drop fragment support
mtd: rawnand: omap_elm: Fix incorrect type in assignment
mtd: rawnand: rockchip: fix oobfree offset and description
mtd: rawnand: rockchip: Align hwecc vs. raw page helper layouts
mtd: rawnand: fsl_upm: Fix an off-by one test in fun_exec_op()
powerpc/mm/altmap: Fix altmap boundary check
drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning
selftests/rseq: check if libc rseq support is registered
selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
soundwire: bus: pm_runtime_request_resume on peripheral attachment
soundwire: fix enumeration completion
PM / wakeirq: support enabling wake-up irq after runtime_suspend called
PM: sleep: wakeirq: fix wake irq arming
Linux 5.15.126
Change-Id: I5d56ea11000b37a22ec3e38dcf4ab58622ca109f
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f0362a2536 upstream.
The code calling ima_free_kexec_buffer runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid
that issue.
Fixes: fee3ff99bc ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Cc: stable@kernel.org
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Mike Rappoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mike Rappoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bd3a76880 upstream.
Commit 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add()
fails") fixed the memory leak caused by dev_set_name() when device_add()
failed. However, it did not consider that 'tgt' has already been released
when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path
to avoid double free of 'tgt' and move put_device(&tgt->dev) after the
removed kfree(tgt) to avoid a use-after-free.
Fixes: 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add() fails")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a337b64f0d upstream.
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading do infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d858 ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 946e047a3d)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef269ef1a upstream.
cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.
If multiple controllers are attached to the cgroup next to the cpuset
controller a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().
Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Conflict in kernel/cgroup/cpuset.c due to pulling extra neighboring
functions that are not applicable on this branch. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85989106fe upstream.
While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.
This approach has the issue of not freeing already allocated DL BW in
the following error cases:
(1) The set of tasks includes multiple DL tasks and DL BW overflow
checking fails for one of the subsequent DL tasks.
(2) Another controller next to the cpuset controller which is attached
to the same cgroup fails in its can_attach().
To address this problem rework dl_cpu_busy():
(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
dedicated dl_bw_free().
(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
a `struct task_struct *p` used in dl_cpu_busy(). This allows to
allocate DL BW for a set of tasks too rather than only for a single
task.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c24849f55 upstream.
Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).
To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.
Reported-by: Qais Yousef (Google) <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Conflict in kernel/cgroup/cpuset.c and kernel/sched/deadline.c due to
pulling new code. Reject new code/fields. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 111cd11bbc upstream.
Turns out percpu_cpuset_rwsem - commit 1243dc518c ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).
Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Conflict in kernel/cgroup/cpuset.c due to pulling changes in functions
or comments that don't exist on this branch. Remove a BUG_ON() for rwsem
that doesn't exist on mainline. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad3a557daf upstream.
rebuild_root_domains() and update_tasks_root_domain() have neutral
names, but actually deal with DEADLINE bandwidth accounting.
Rename them to use 'dl_' prefix so that intent is more clear.
No functional change.
Suggested-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d52d3a2bf4 upstream.
During rcutorture shutdown, the rcu_torture_cleanup() function calls
torture_cleanup_begin(), which sets the fullstop global variable to
FULLSTOP_RMMOD. This causes the rcutorture threads for readers and
fakewriters to exit all of their "while" loops and start shutting down.
They then call torture_kthread_stopping(), which in turn waits for
kthread_stop() to be called. However, rcu_torture_cleanup() has
not yet called kthread_stop() on those threads, and before it gets a
chance to do so, multiple instances of torture_kthread_stopping() invoke
schedule_timeout_interruptible(1) in a tight loop. Tracing confirms that
TIMER_SOFTIRQ can then continuously execute timer callbacks. If that
TIMER_SOFTIRQ preempts the task executing rcu_torture_cleanup(), that
task might never invoke kthread_stop().
This commit improves this situation by increasing the timeout passed to
schedule_timeout_interruptible() from one jiffy to 1/20th of a second.
This change prevents TIMER_SOFTIRQ from monopolizing its CPU, thus
allowing rcu_torture_cleanup() to carry out the needed kthread_stop()
invocations. Testing has shown 100 runs of TREE07 passing reliably,
as oppose to the tens-of-percent failure rates seen beforehand.
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d8ae8c417 upstream.
We've aligned setgid behavior over multiple kernel releases. The details
can be found in commit cf619f8919 ("Merge tag 'fs.ovl.setgid.v6.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") and
commit 426b4ca2d6 ("Merge tag 'fs.setgid.v6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux").
Consistent setgid stripping behavior is now encapsulated in the
setattr_should_drop_sgid() helper which is used by all filesystems that
strip setgid bits outside of vfs proper. Usually ATTR_KILL_SGID is
raised in e.g., chown_common() and is subject to the
setattr_should_drop_sgid() check to determine whether the setgid bit can
be retained. Since nfsd is raising ATTR_KILL_SGID unconditionally it
will cause notify_change() to strip it even if the caller had the
necessary privileges to retain it. Ensure that nfsd only raises
ATR_KILL_SGID if the caller lacks the necessary privileges to retain the
setgid bit.
Without this patch the setgid stripping tests in LTP will fail:
> As you can see, the problem is S_ISGID (0002000) was dropped on a
> non-group-executable file while chown was invoked by super-user, while
[...]
> fchown02.c:66: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
[...]
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
With this patch all tests pass.
Reported-by: Sherry Yang <sherry.yang@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ Harshit: backport to 5.15.y:
Use init_user_ns instead of nop_mnt_idmap as we don't have
commit abf08576af ("fs: port vfs_*() helpers to struct mnt_idmap") ]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f704d9a83 upstream.
We've aligned setgid behavior over multiple kernel releases. The details
can be found in the following two merge messages:
cf619f8919 ("Merge tag 'fs.ovl.setgid.v6.2')
426b4ca2d6 ("Merge tag 'fs.setgid.v6.0')
Consistent setgid stripping behavior is now encapsulated in the
setattr_should_drop_sgid() helper which is used by all filesystems that
strip setgid bits outside of vfs proper. Switch nfs to rely on this
helper as well. Without this patch the setgid stripping tests in
xfstests will fail.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230313-fs-nfs-setgid-v2-1-9a59f436cfc0@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
[ Harshit: backport to 5.15.y]
fs/internal.h -- minor conflcit due to code change differences.
include/linux/fs.h -- Used struct user_namespace *mnt_userns
instead of struct mnt_idmap *idmap
fs/nfs/inode.c -- Used init_user_ns instead of nop_mnt_idmap ]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c66ca3949 upstream.
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes: b81fac906a ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f69383b20 upstream.
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc22522fd5 upstream.
40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
changed acpiphp hotplug to use pci_assign_unassigned_bridge_resources()
which depends on bridge being available, however enable_slot() can be
called without bridge associated:
1. Legitimate case of hotplug on root bus (widely used in virt world)
2. A (misbehaving) firmware, that sends ACPI Bus Check notifications to
non existing root ports (Dell Inspiron 7352/0W6WV0), which end up at
enable_slot(..., bridge = 0) where bus has no bridge assigned to it.
acpihp doesn't know that it's a bridge, and bus specific 'PCI
subsystem' can't augment ACPI context with bridge information since
the PCI device to get this data from is/was not available.
Issue is easy to reproduce with QEMU's 'pc' machine, which supports PCI
hotplug on hostbridge slots. To reproduce, boot kernel at commit
40613da52b in VM started with following CLI (assuming guest root fs is
installed on sda1 partition):
# qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \
-monitor stdio -serial file:serial.log \
-kernel arch/x86/boot/bzImage \
-append "root=/dev/sda1 console=ttyS0" \
guest_disk.img
Once guest OS is fully booted at qemu prompt:
(qemu) device_add e1000
(check serial.log) it will cause NULL pointer dereference at:
void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
{
struct pci_bus *parent = bridge->subordinate;
BUG: kernel NULL pointer dereference, address: 0000000000000018
? pci_assign_unassigned_bridge_resources+0x1f/0x260
enable_slot+0x21f/0x3e0
acpiphp_hotplug_notify+0x13d/0x260
acpi_device_hotplug+0xbc/0x540
acpi_hotplug_work_fn+0x15/0x20
process_one_work+0x1f7/0x370
worker_thread+0x45/0x3b0
The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with following
sequence:
1. Suspend to RAM
2. Wake up with the same backtrace being observed:
3. 2nd suspend to RAM attempt makes laptop freeze
Fix it by using __pci_bus_assign_resources() instead of
pci_assign_unassigned_bridge_resources() as we used to do, but only in case
when bus doesn't have a bridge associated (to cover for the case of ACPI
event on hostbridge or non existing root port).
That lets us keep hotplug on root bus working like it used to and at the
same time keeps resource reassignment usable on root ports (and other 1st
level bridges) that was fixed by 40613da52b.
Fixes: 40613da52b ("PCI: acpiphp: Reassign resources on bridge if necessary")
Link: https://lore.kernel.org/r/20230726123518.2361181-2-imammedo@redhat.com
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7f2e65699 upstream.
variable *nplanes is provided by user via system call argument. The
possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8. The array access by index i can cause array
out-of-bounds.
Fix this bug by checking *nplanes against the array size.
Fixes: 4e855a6efa ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver")
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 914d9d831e upstream.
While originally it was fine to format strings using "%pOF" while
holding devtree_lock, this now causes a deadlock. Lockdep reports:
of_get_parent from of_fwnode_get_parent+0x18/0x24
^^^^^^^^^^^^^
of_fwnode_get_parent from fwnode_count_parents+0xc/0x28
fwnode_count_parents from fwnode_full_name_string+0x18/0xac
fwnode_full_name_string from device_node_string+0x1a0/0x404
device_node_string from pointer+0x3c0/0x534
pointer from vsnprintf+0x248/0x36c
vsnprintf from vprintk_store+0x130/0x3b4
Fix this by moving the printing in __of_changeset_entry_apply() outside
the lock. As the only difference in the multiple prints is the action
name, use the existing "action_names" to refactor the prints into a
single print.
Fixes: a92eb7621b ("lib/vsprintf: Make use of fwnode API to obtain node names and separators")
Cc: stable@vger.kernel.org
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230801-dt-changeset-fixes-v3-2-5f0410e007dd@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 382d4cd184 upstream.
The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.
But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.
This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.
This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:
a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.
b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.
c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).
d) 32-bit kernels aren't used that much any longer, so they are less
tested.
A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:
Here the kernel uses a correct implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat
00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat
0001a000-0003b000 rwxp 00000000 00:00 0 [heap]
f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
and this kernel uses the broken implementation of __clzdi2():
root@debian:~# cat /proc/self/maps
0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat
0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat
000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap]
00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 4df87bb7b6 ("lib: add weak clz/ctz functions")
Cc: Chanho Min <chanho.min@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>