When ufshcd_abort_one is racing with the completion ISR, the completed tag
of the request's mq_hctx pointer will be set to NULL by ISR. Return
success when request is completed by ISR because ufshcd_abort_one does not
need to do anything.
The racing flow is:
Thread A
ufshcd_err_handler step 1
...
ufshcd_abort_one
ufshcd_try_to_abort_task
ufshcd_cmd_inflight(true) step 3
ufshcd_mcq_req_to_hwq
blk_mq_unique_tag
rq->mq_hctx->queue_num step 5
Thread B
ufs_mtk_mcq_intr(cq complete ISR) step 2
scsi_done
...
__blk_mq_free_request
rq->mq_hctx = NULL; step 4
Below is KE back trace.
ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
Aborting tag 41 / CDB 0x28 succeeded
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20
Bug: 361140026
Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode")
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Change-Id: I42f9b93dae33eac8cf41ac3085858b6adf0ee9ee
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20240628070030.30929-3-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 74736103fb4123c71bf11fb7a6abe7c884c5269e)
[ Resolved minor conflict in drivers/ufs/core/ufshcd.c ]
When ufshcd_clear_cmd is racing with the completion ISR, the completed tag
of the request's mq_hctx pointer will be set to NULL by the ISR. And
ufshcd_clear_cmd's call to ufshcd_mcq_req_to_hwq will get NULL pointer KE.
Return success when the request is completed by ISR because sq does not
need cleanup.
The racing flow is:
Thread A
ufshcd_err_handler step 1
ufshcd_try_to_abort_task
ufshcd_cmd_inflight(true) step 3
ufshcd_clear_cmd
...
ufshcd_mcq_req_to_hwq
blk_mq_unique_tag
rq->mq_hctx->queue_num step 5
Thread B
ufs_mtk_mcq_intr(cq complete ISR) step 2
scsi_done
...
__blk_mq_free_request
rq->mq_hctx = NULL; step 4
Below is KE back trace:
ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise]
Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
Call trace:
dump_backtrace+0xf8/0x148
show_stack+0x18/0x24
dump_stack_lvl+0x60/0x7c
dump_stack+0x18/0x3c
mrdump_common_die+0x24c/0x398 [mrdump]
ipanic_die+0x20/0x34 [mrdump]
notify_die+0x80/0xd8
die+0x94/0x2b8
__do_kernel_fault+0x264/0x298
do_page_fault+0xa4/0x4b8
do_translation_fault+0x38/0x54
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20
Bug: 361140026
Fixes: 8d72903489 ("scsi: ufs: mcq: Add supporting functions for MCQ abort")
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Change-Id: I59fc0a8246d2fc2421e38c42b619b2393d2b22e6
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20240628070030.30929-2-peter.wang@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9307a998cb9846a2557fdca286997430bee36a2a)
[ Resolved minor conflict in drivers/ufs/core/ufshcd.c ]
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b4 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
Bug: 359158960
Change-Id: Ib9a564add4cd1c90ec725ab7f0bdc0db505ae29f
(cherry picked from commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
When a CPU goes offline, the interrupts affine to that CPU are
re-configured.
Managed interrupts undergo either migration to other CPUs or shutdown if
all CPUs listed in the affinity are offline. The migration of managed
interrupts is guaranteed on x86 because there are interrupt vectors
reserved.
Regular interrupts are migrated to a still online CPU in the affinity mask
or if there is no online CPU to any online CPU.
This works as long as the still online CPUs in the affinity mask have
interrupt vectors available, but in case that none of those CPUs has a
vector available the migration fails and the device interrupt becomes
stale.
This is not any different from the case where the affinity mask does not
contain any online CPU, but there is no fallback operation for this.
Instead of giving up, retry the migration attempt with the online CPU mask
if the interrupt is not managed, as managed interrupts cannot be affected
by this problem.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423073413.79625-1-dongli.zhang@oracle.com
Bug: 359158960
Change-Id: I3e8acfe0598b0cb94389115d680d2216833d6a0c
(cherry picked from commit 88d724e2301a69c1ab805cd74fc27aa36ae529e0)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
irq_restore_affinity_of_irq() restarts managed interrupts unconditionally
when the first CPU in the affinity mask comes online. That's correct during
normal hotplug operations, but not when resuming from S3 because the
drivers are not resumed yet and interrupt delivery is not expected by them.
Skip the startup of suspended interrupts and let resume_device_irqs() deal
with restoring them. This ensures that irqs are not delivered to drivers
during the noirq phase of resuming from S3, after non-boot CPUs are brought
back online.
Signed-off-by: David Stevens <stevensd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240424090341.72236-1-stevensd@chromium.org
Bug: 359158960
Change-Id: Ie5610f7dffe05a141e6db1a8b6b067845c910b4a
(cherry picked from commit a60dd06af674d3bb76b40da5d722e4a0ecefe650)
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Currently, the UTP_TASK_REQ_LIST_BASE_L/UTP_TASK_REQ_LIST_BASE_H regs are
written to and then completed with an mb().
mb() ensures that the write completes, but completion doesn't mean that it
isn't stored in a buffer somewhere. The recommendation for ensuring these
bits have taken effect on the device is to perform a read back to force it
to make it all the way to the device. This is documented in device-io.rst
and a talk by Will Deacon on this can be seen over here:
https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678
Let's do that to ensure the bits hit the device. Because the mb()'s purpose
wasn't to add extra ordering (on top of the ordering guaranteed by
writel()/readl()), it can safely be removed.
Bug: 254441685
Fixes: 88441a8d35 ("scsi: ufs: core: Add hibernation callbacks")
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-7-181252004586@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 408e28086f1c7a6423efc79926a43d7001902fae)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iadf1f45aebea54393365a5387d35440aa5df3e00
Commit c33c794828 ("mm: ptep_get() conversion") converted all (non-arch)
call sites to use ptep_get() instead of doing a direct dereference of the
pte. Full rationale can be found in that commit's log.
Since then, UFFDIO_MOVE has been implemented which does 7 direct pte
dereferences. Let's fix those up to use ptep_get().
I've asserted in the past that there is no reliable automated mechanism to
catch these; I'm relying on a combination of Coccinelle (which throws up a
lot of false positives) and some compiler magic to force a compiler error
on dereference. But given the frequency with which new issues are coming
up, I'll add it to my todo list to try to find an automated solution.
Bug: 254441685
Link: https://lkml.kernel.org/r/20240123141755.3836179-1-ryan.roberts@arm.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 56ae10cf628b02279980d17439c6241a643959c2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0573208f1c7ad5085e0f7e82d484199074a5f6ef
BUG: kernel NULL pointer dereference, address: 0000000000000014
RIP: 0010:f2fs_submit_page_write+0x6cf/0x780 [f2fs]
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? __die+0x29/0x70
? page_fault_oops+0x154/0x4a0
? prb_read_valid+0x20/0x30
? __irq_work_queue_local+0x39/0xd0
? irq_work_queue+0x36/0x70
? do_user_addr_fault+0x314/0x6c0
? exc_page_fault+0x7d/0x190
? asm_exc_page_fault+0x2b/0x30
? f2fs_submit_page_write+0x6cf/0x780 [f2fs]
? f2fs_submit_page_write+0x736/0x780 [f2fs]
do_write_page+0x50/0x170 [f2fs]
f2fs_outplace_write_data+0x61/0xb0 [f2fs]
f2fs_do_write_data_page+0x3f8/0x660 [f2fs]
f2fs_write_single_data_page+0x5bb/0x7a0 [f2fs]
f2fs_write_cache_pages+0x3da/0xbe0 [f2fs]
...
It is possible that other threads have added this fio to io->bio
and submitted the io->bio before entering f2fs_submit_page_write().
At this point io->bio = NULL.
If is_end_zone_blkaddr(sbi, fio->new_blkaddr) of this fio is true,
then an NULL pointer dereference error occurs at bio_get(io->bio).
The original code for determining zone end was after "out:",
which would have missed some fio who is zone end. I've moved
this code before "skip:" to make sure it's done for each fio.
Bug: 254441685
Fixes: e067dc3c6b ("f2fs: maintain six open zones for zoned devices")
Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit c2034ef6192a65a986a45c2aa2ed05824fdc0e9f)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I98714afbda136a8932eaa417fff129bd4db61b26
It needs to check last zone_pending_bio and wait IO completion before
traverse next fio in io->io_list, otherwise, bio in next zone may be
submitted before all IO completion in current zone.
Bug: 254441685
Fixes: e067dc3c6b ("f2fs: maintain six open zones for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 536af8211586af09c5bea1c15ad28ddec5f66a97)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifefc2c6fbda3aae00cef3650cafe48b496da65ca
When the bootloader/firmware doesn't setup the framebuffers, their
address and size are 0 in "iommu-addresses" property. If IOVA region is
reserved with 0 length, then it ends up corrupting the IOVA rbtree with
an entry which has pfn_hi < pfn_lo.
If we intend to use display driver in kernel without framebuffer then
it's causing the display IOMMU mappings to fail as entire valid IOVA
space is reserved when address and length are passed as 0.
An ideal solution would be firmware removing the "iommu-addresses"
property and corresponding "memory-region" if display is not present.
But the kernel should be able to handle this by checking for size of
IOVA region and skipping the IOVA reservation if size is 0. Also, add
a warning if firmware is requesting 0-length IOVA region reservation.
Bug: 254441685
Fixes: a5bf3cfce8 ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231205065656.9544-1-amhetre@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit bb57f6705960bebeb832142ce9abf43220c3eab1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I2484452356e69ecafc006246313d5c92fe9f1509
The variable phys is defined as (struct resource *) which aligns with
the printk format specifier %pr. Taking the address of it results in a
value of type (struct resource **) which is incompatible with the format
specifier %pr. Therefore, remove the address of operator (&).
Bug: 254441685
Fixes: a5bf3cfce8 ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20231108062226.928985-1-danielmentz@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit c2183b3dcc9dd41b768569ea88bededa58cceebb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I36e8878d09f8cb3ff12280fbb535bfb8df682055
Check if the device is marked as DMA coherent in the DT and if so,
map its reserved memory as cacheable in the IOMMU.
This fixes the recently added IOMMU reserved memory support which
uses IOMMU_RESV_DIRECT without properly building the PROT for the
mapping.
Bug: 254441685
Fixes: a5bf3cfce8 ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230926152600.8749-1-laurentiu.tudor@nxp.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit f1aad9df93f39267e890836a28d22511f23474e1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ic6738822da32e134af9dd3ed5737e611dffe85c9
A previous commit added a trylock for getting the SQPOLL thread info via
fdinfo, but this introduced a regression where we often fail to get it if
the thread is busy. For that case, we end up not printing the current CPU
and PID info.
Rather than rely on this lock, just print the pid we already stored in
the io_sq_data struct, and ensure we update the current CPU every time
we've slept or potentially rescheduled. The latter won't potentially be
100% accurate, but that wasn't the case before either as the task can
get migrated at any time unless it has been pinned at creation time.
We retain keeping the io_sq_data dereference inside the ctx->uring_lock,
as it has always been, as destruction of the thread and data happen below
that. We could make this RCU safe, but there's little point in doing that.
With this, we always print the last valid information we had, rather than
have spurious outputs with missing information.
Bug: 254441685
Fixes: 7644b1a1c9 ("io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit a0d45c3f596be53c1bd8822a1984532d14fdcea9)
[Lee: PF_NO_SETAFFINITY has been removed upstream causing surrounding diff to change]
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iab4aeffb443ed5140b33dd7274fee57c36f40f9e
The warning here shouldn't be done before we even set the
bss field (or should've used the input data). Move the
assignment before the warning to fix it.
We noticed this now because of Wen's bugfix, where the bug
fixed there had previously hidden this other bug.
Bug: 254441685
Fixes: 53ad07e982 ("wifi: cfg80211: support reporting failed links")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit c434b2be2d)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ibc3b59e9c61b74a9dbc25ddb2ee75f220bd0f5ae
expand_upwards() and expand_downwards() will return -EFAULT if VM_GROWSUP
or VM_GROWSDOWN is not correctly set in vma->vm_flags, however in
!CONFIG_STACK_GROWSUP case, expand_stack_locked() returns -EINVAL first if
!(vma->vm_flags & VM_GROWSDOWN) before calling expand_downwards(), to keep
the consistency with CONFIG_STACK_GROWSUP case, remove this check.
The usages of this function are as below:
A:fs/exec.c
ret = expand_stack_locked(vma, stack_base);
if (ret)
ret = -EFAULT;
or
B:mm/memory.c mm/mmap.c
if (expand_stack_locked(vma, addr))
return NULL;
which means the return value will not propagate to other places, so I
believe there is no user-visible effects of this change, and it's
unnecessary to backport to earlier versions.
Bug: 254441685
Link: https://lkml.kernel.org/r/20230906103312.645712-1-xiujianfeng@huaweicloud.com
Fixes: f440fa1ac9 ("mm: make find_extend_vma() fail if write lock not held")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7fa38d0ea00ffe2cd3c95c96c85221b8ae221875)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ibb98909a9f9f07b64e339a55cb583b357f9d964b
During system resume, sd_start_stop_device() submits a START STOP UNIT
command to the SCSI device that is being resumed. That command is not
retried in case of a unit attention and hence may fail. An example:
[16575.983359] sd 0:0:0:3: [sdd] Starting disk
[16575.983693] sd 0:0:0:3: [sdd] Start/Stop Unit failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[16575.983712] sd 0:0:0:3: [sdd] Sense Key : 0x6
[16575.983730] sd 0:0:0:3: [sdd] ASC=0x29 ASCQ=0x0
[16575.983738] sd 0:0:0:3: PM: dpm_run_callback(): scsi_bus_resume+0x0/0xa0 returns -5
[16575.983783] sd 0:0:0:3: PM: failed to resume async: error -5
Make the SCSI core retry the START STOP UNIT command if a retryable
error is encountered.
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Mike Christie <michael.christie@oracle.com>
Change-Id: Ic8e0859c4455d93fcabee42f1598858571f5f3d1
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bug: 348341595
Link: https://lore.kernel.org/linux-scsi/yq17ccp1i4b.fsf@ca-mkp.ca.oracle.com/T/#m52a26a50649b1d537cb129e5653f723509d6bde7
Signed-off-by: Bart Van Assche <bvanassche@google.com>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.
The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:
[ 190.271762] ------------[ cut here ]------------
[ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[ 190.271789] Modules linked in: [...]
[ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[ 190.272023] Code: [...]
[ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 190.272077] Call Trace:
[ 190.272098] <TASK>
[ 190.272189] ring_buffer_resize+0x2ab/0x460
[ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0
[ 190.272206] tracing_resize_ring_buffer+0x65/0x90
[ 190.272216] tracing_entries_write+0x74/0xc0
[ 190.272225] vfs_write+0xf5/0x420
[ 190.272248] ksys_write+0x67/0xe0
[ 190.272256] do_syscall_64+0x82/0x170
[ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 190.272373] RIP: 0033:0x7f1bd657d263
[ 190.272381] Code: [...]
[ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[ 190.272412] </TASK>
[ 190.272414] ---[ end trace 0000000000000000 ]---
Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.
The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():
ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
if (!ret)
goto spin;
for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */
__asm__ __volatile__ ("" : : : "memory");
rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
.. and then run the following commands on the target system:
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
while true; do
echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
done &
while true; do
for i in /sys/kernel/tracing/per_cpu/*; do
timeout 0.1 cat $i/trace_pipe; sleep 0.2
done
done
To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff2 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Bug: 357487733
Change-Id: Ieed52c111e4f4f969a88be1221118eee6e3b4837
(cherry picked from commit c2274b908db05529980ec056359fae916939fdaa)
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
On a system with Perl 5.12.1, commit 5ef6dc08cfde
("lib/build_OID_registry: don't mention the full path of the script in
output") causes the build to fail with the error below.
Bareword found where operator expected at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
syntax error at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
Execution of ./lib/build_OID_registry aborted due to compilation errors.
make[3]: *** [lib/Makefile:352: lib/oid_registry_data.c] Error 255
Ahmad Fatoum analyzed that non-destructive substitution is only supported since
Perl 5.13.2. Instead of dropping `r` and having the side effect of modifying
`$0`, introduce a dedicated variable to support older Perl versions.
Link: https://lkml.kernel.org/r/20240702223512.8329-2-pmenzel@molgen.mpg.de
Link: https://lkml.kernel.org/r/20240701155802.75152-1-pmenzel@molgen.mpg.de
Fixes: 5ef6dc08cfde ("lib/build_OID_registry: don't mention the full path of the script in output")
Link: https://lore.kernel.org/all/259f7a87-2692-480e-9073-1c1c35b52f67@molgen.mpg.de/
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Suggested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 346559442
Bug: 347759457
Change-Id: I367a4d22f3c27c8912703d061360e5aa2c7a5642
(cherry picked from commit 2fe29fe945637b9834c5569fbb1c9d4f881d8263)
Signed-off-by: Giuliano Procida <gprocida@google.com>
This change strips the full path of the script generating
lib/oid_registry_data.c to just lib/build_OID_registry. The motivation
for this change is Yocto emitting a build warning
File /usr/src/debug/linux-lxatac/6.7-r0/lib/oid_registry_data.c in package linux-lxatac-src contains reference to TMPDIR [buildpaths]
So this change brings us one step closer to make the build result
reproducible independent of the build path.
Link: https://lkml.kernel.org/r/20240313211957.884561-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 346559442
Bug: 347759457
Change-Id: I09c19f26fd33ed95106174cd2f2fc7217ba99f34
(cherry picked from commit 5ef6dc08cfde240b8c748733759185646e654570)
Signed-off-by: Giuliano Procida <gprocida@google.com>
netdev_master_upper_dev_get
nla_append
These symbols are required to use WIFI BCM driver.
Following functions are added in STG file through abi_update
1 function symbol(s) added
'struct net_device* netdev_master_upper_dev_get(struct net_device*)'
Bug: 357016601
Change-Id: Iff752b54d88258f2d61fab397da1104145c9297f
Signed-off-by: Ajit Singh Raghav <ajit.raghav@samsung.com>
Not all FUSE servers have implemented canonical_path such as virtiofs.
This patch makes it so they go through the same logic as other
filesystems that don't have canonical path implemented.
Bug: 330136711
Test: ./cts-tradefed run commandAndExit cts -m CtsIncidentHostTestCases -t com.android.server.cts.ErrorsTest#testTombstone
Change-Id: I35f19bd1a12420015128ac9bc2662b9bd252a612
Signed-off-by: Richard Fung <richardfung@google.com>
(cherry picked from commit 4b0be62caf6923eac6acdb5a44eb03688e6f9dc5)
Even though FUSE supports d_canonical_path, the underlying server may
not implement the operation. In that case, follow the same logic for
filesystems that do not have the canonical path operation instead of
returning early from fsnotify_file with an error.
Bug: 326995824
Test: cts-tradefed run commandAndExit cts -m CtsOsTestCases -t android.os.cts.FileObserverTest
Change-Id: Iae618d4159222b06467b9a0bbb67fb67885aa65e
Signed-off-by: Tiffany Yang <ynaffit@google.com>
(cherry picked from commit 61d32e739d27dd35353a22804023f099b383df3b)
Race between the madvise(PAGEOUT) and migration caused by page offline
can make the swp_swapcount()->_swap_info_get emitting the "Bad swap file
entry " message because it is trying to get the info on the migration
entry. Add check if it is a migration entry.
Bug: 356032508
Fixes: aee36dd530 ("ANDROID: mm: add vendor hooks in madvise for swap entry")
Change-Id: Ia209d2a0103ac506ceae8fd71f733c1dc85443a6
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
The driver or external module may need handling due to hibernate failure.
This error cannot be detected outside of the hibernate() function.
so Add a vendor hook to receive and handle errors in hibernate().
Bug: 342523877
Change-Id: I221e26f571d94e5d5c5aae19937945bb8981f85b
Signed-off-by: karam.lee <karam.lee@lge.com>
Android GKI kernel modules should NOT have the ability to control the
system-wide tracing functionality, nor query to determine if it is on or
not. So remove the exports of these functions.
Upstream does not wish to do this, so an Android-only change is
required. See the bug id for details.
Bug: 355584612
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I50f69cd9930ddc6b876c5c1dd86f51cfb2ee1bac
A recent change to add hooks for firmware-based hibernation
added a reference to android_vendor_data1 which breaks
builds that don't enable vendor data.
Fixes: d2cb755a4398 ("ANDROID: vendor hooks: Add hooks to support bootloader based hibernation")
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I449e968f8d0926c8573150553eed0905faeac3f3
caused by rwsem lock contention of reverse mapping during memory
recycling.
Dear Sir, We want to apply the following five trace hook:
1、trace_android_vh_do_folio_trylock()
2、trace_android_vh_get_folio_trylock_result()
3、trace_android_vh_folio_trylock_clear()
4、trace_android_vh_folio_trylock_set()
5、trace_android_vh_handle_trylock_failed_folio().
The optimization objectives are as follows: In the process of memory
recycling,reverse mapping is indispensable. The role of reverse mapping
is to find all vma or page tables mapped to this page through physical
address or page structure. Therefore,the process of reverse mapping will
inevitably bring synchronous operations. Whether it is file page or an
anonymous page,the synchronization operation is realized through the
rwsem read-write lock. Therefore,when kswapd or the user's key thread
enters direct_reclaim,the rwsem lock will cause the entire recycling
link to be blocked,resulting in low memory recycling efficiency and
affecting the user experience. On Xiaomi mobile phones,there is a
situation in which the reverse mapping is stuck in the rwsem read-write
lock during the file page recycling process,causing the entire recycling link to be blocked,so that the memory cannot be recycled in time. The display on the trace shows that there are a large number of D states,and its block function is rmap_walk_file,therefore,for non-forced
recycling scenarios,try_lock and asynchronous recycling are used to
handle related blocked pages.
Bug: 353608806
Change-Id: I3973c9ddc4e25f8b20e763a4a8aa2dd327e3139d
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
The printed reserved memory information uses the non-standard "K"
prefix, while all other printed values use proper binary prefixes.
Fix this by using "Ki" instead.
While at it, drop the superfluous spaces inside the parentheses, to
reduce printed line length.
Bug: 353554778
Fixes: aeb9267eb6 ("of: reserved-mem: print out reserved-mem details during boot")
Change-Id: I1cf499acd46f83b43adf381d19044675983e24bc
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230216083725.1244817-1-geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
(cherry picked from commit 6ee7afbabc)
Signed-off-by: Danesh Petigara <danesh.petigara@broadcom.com>
Signed-off-by: Pierre Couillaud <pierre@broadcom.com>
Some drivers (e.g. gpio-mt7621 and gpio-brcmstb) have multiple
gpiochip banks within a single device. Unfortunately, the
gpio-ranges property of the device node was being applied to
every gpiochip of the device with device relative GPIO offset
values rather than gpiochip relative GPIO offset values.
This commit makes use of the gpio_chip offset value which can be
non-zero for such devices to split the device node gpio-ranges
property into GPIO offset ranges that can be applied to each
of the relevant gpiochips of the device.
Bug: 353554778
Change-Id: I868f0cd8e4dc8003547a34f3cfd57eece731622f
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240424185039.1707812-3-opendmb@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
(cherry picked from commit e818cd3c8a345c046edff00b5ad0be4d39f7e4d4)
[danesh: Resolved minor conflict in drivers/gpio/gpiolib-of.c]
Signed-off-by: Danesh Petigara <danesh.petigara@broadcom.com>
Signed-off-by: Pierre Couillaud <pierre@broadcom.com>
In order to not freeze on corrupt data, we need to turn off
FAULT_FLAG_ALLOW_RETRY. However, this means we no longer retry on EINTR,
so an interrupted read will lead to page faults.
The fault handler does not seem to allow dynamic decisions as to whether
to turn on or off this flag.
To resolve both issues, add a flag to indicate if there are corrupt
pages in a file, and only if there are turn off this flag.
Also fsanitize changed the behavior of mlock - mlock should fail if the
page reads fail, but with fsanitize it returns 0 then page faults on
access. This broke this test, and fsanitize offers little value on test
code, so disable it.
Test: incfs_test passes
Bug: 343532239
Change-Id: Id2ced4be3310109206d65dcc92dea05c05131182
Signed-off-by: Paul Lawrence <paullawrence@google.com>
(cherry picked from commit b7bd4d088751a23209e0636e4d71502ef07b2d33)
After ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is
set to down/up") to not reset from usbnet_open after the reset from
usbnet_probe at initialization stage to speed up this, some issues have
been reported.
It seems to happen that if the initialization is slower, and some time
passes between the probe operation and the open operation, the second reset
from open is necessary too to have the device working. The reason is that
if there is no activity with the phy, this is "disconnected".
In order to improve this, the solution is to detect when the phy is
"disconnected", and we can use the phy status register for this. So we will
only reset the device from reset operation in this situation, that is, only
if necessary.
The same bahavior is happening when the device is stopped (link set to
down) and later is restarted (link set to up), so if the phy keeps working
we only need to enable the mac again, but if enough time passes between the
device stop and restart, reset is necessary, and we can detect the
situation checking the phy status register too.
Bug: 339479352
cc: stable@vger.kernel.org # 6.6+
Fixes: ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up")
Reported-by: Yongqin Liu <yongqin.liu@linaro.org>
Reported-by: Antje Miederhöfer <a.miederhoefer@gmx.de>
Reported-by: Arne Fitzenreiter <arne_f@ipfire.org>
Tested-by: Yongqin Liu <yongqin.liu@linaro.org>
Tested-by: Antje Miederhöfer <a.miederhoefer@gmx.de>
Change-Id: I0b5944bd0bf353281c884b93c15243c2f395dbbf
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7be4cb7189f747b4e5b6977d0e4387bde3204e62)
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
The idea was to keep only one reset at initialization stage in order to
reduce the total delay, or the reset from usbnet_probe or the reset from
usbnet_open.
I have seen that restarting from usbnet_probe is necessary to avoid doing
too complex things. But when the link is set to down/up (for example to
configure a different mac address) the link is not correctly recovered
unless a reset is commanded from usbnet_open.
So, detect the initialization stage (first call) to not reset from
usbnet_open after the reset from usbnet_probe and after this stage, always
reset from usbnet_open too (when the link needs to be rechecked).
Apply to all the possible devices, the behavior now is going to be the same.
Bug: 339479352
cc: stable@vger.kernel.org # 6.6+
Fixes: 56f78615bcb1 ("net: usb: ax88179_178a: avoid writing the mac address before first reading")
Reported-by: Isaac Ganoung <inventor500@vivaldi.net>
Reported-by: Yongqin Liu <yongqin.liu@linaro.org>
Change-Id: I6314fcc4df3be9624b91545f1cf033144e3ae9ec
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510090846.328201-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit ecf848eb934b03959918f5269f64c0e52bc23998)
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
commit 6d3c721e686ea6c59e18289b400cc95c76e927e0 upstream.
Userspace provided string 's' could trivially have the length zero. Left
unchecked this will firstly result in an OOB read in the form
`if (str[0 - 1] == '\n') followed closely by an OOB write in the form
`str[0 - 1] = '\0'`.
There is already a validating check to catch strings that are too long.
Let's supply an additional check for invalid strings that are too short.
Bug: 346754046
Signed-off-by: Lee Jones <lee@kernel.org>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20240705074339.633717-1-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d1205033e9)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Id9a34f3e5495aef0d2a800a1386210f4d9fa8116
There could be a potential use-after-free case in
tcpm_register_source_caps(). This could happen when:
* new (say invalid) source caps are advertised
* the existing source caps are unregistered
* tcpm_register_source_caps() returns with an error as
usb_power_delivery_register_capabilities() fails
This causes port->partner_source_caps to hold on to the now freed source
caps.
Reset port->partner_source_caps value to NULL after unregistering
existing source caps.
Fixes: 230ecdf71a64 ("usb: typec: tcpm: unregister existing source caps before re-registration")
Cc: stable@vger.kernel.org
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Ondrej Jirman <megi@xff.cz>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240514220134.2143181-1-amitsd@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 335453765
(cherry picked from commit e7e921918d905544500ca7a95889f898121ba886)
Change-Id: I655ac689dd559dfe22fdf545ef5a39ae056f2bde
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>