Lockdep uses lock class keys in its analysis. init_rwsem() instantiates
one lock class key with each init_rwsem() user as follows:
#define init_rwsem(sem) \
do { \
static struct lock_class_key __key; \
\
__init_rwsem((sem), #sem, &__key); \
} while (0)
Commit e4544b63a7 ("f2fs: move f2fs to use reader-unfair rwsems") reduced
the number of lock class keys from one per init_rwsem() user to one per
file in which init_f2fs_rwsem() is used. This causes the same lock class key
to be associated with multiple f2fs rwsems and also triggers a number of
false positive lockdep deadlock reports. Fix this by again instantiating one
lock class key with each init_f2fs_rwsem() caller.
Bug: 223346410
Cc: Tim Murray <timmurray@google.com>
Reported-by: syzbot+0b9cadf5fc45a98a5083@syzkaller.appspotmail.com
Fixes: e4544b63a7 ("f2fs: move f2fs to use reader-unfair rwsems")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
(cherry picked from commit c7f91bd410
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Change-Id: I7ece9c98db2e84e54427a2c8155043cb15223ba3
In mac80211_hwsim, the probe_req frame is created and sent while
scanning. It is sent with ieee80211_tx_info which is not initialized.
Uninitialized ieee80211_tx_info can cause problems when using
mac80211_hwsim with wmediumd. wmediumd checks the tx_rates field of
ieee80211_tx_info and doesn't relay probe_req frame to other clients
even if it is a broadcasting message.
Call ieee80211_tx_prepare_skb() to initialize ieee80211_tx_info for
the probe_req that is created by hw_scan_work in mac80211_hwsim.
Signed-off-by: JaeMan Park <jaeman@google.com>
Link: https://lore.kernel.org/r/20220113060235.546107-1-jaeman@google.com
[fix memory leak]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
(cherry picked from commit cacfddf82b)
Bug: 211353765
Change-Id: I7ffd0219b093ab18baf29aa0f9671d78ebd2f278
Signed-off-by: JaeMan Park <jaeman@google.com>
Signed-off-by: Alistair Delva <adelva@google.com>
Commit d483eed85f ("ANDROID: GKI: set vfs-only exports into their own
namespace") moved a bunch of symbols into a vfs-only namespace to make
it possible for some external filesystem modules to be able to use them.
Unfortunately the following two symbols were already being marked used by
external modules, and moving them into a different namespace broke
existing users of these symbols:
kern_path
__sync_dirty_buffer
The ABI checking tools do not take the namespace of the symbol into
consideration when checking, as that is a Linux kernel "add-on" and not
part of the kernel symbol table information directly, which is why this
was not caught earlier.
Bug: 157965270
Bug: 210074446
Bug: 216253405
Bug: 219830266
Fixes: d483eed85f ("ANDROID: GKI: set vfs-only exports into their own namespace")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4a791edb33312da232cb088613bd4eb8f5548239
In speculative fault path, while doing page table lookup, offset
is obtained at each level and value at that offset is read and
checks are perfomed on it, later to get next level offset we read
from previous level offset again. A concurrent page table reclaimation
operation could result in change in value at this offset, and we go
ahead and access it, this would result in reading an invalid entry.
Fix this by reading from previous level offset again and comparing
before performing next level access.
Bug: 221005439
Change-Id: I66b3d24ae79c7ee5ccce4ba7a94f028f4cf3fda0
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
xhci_reset() timeout was increased from 250ms to 10 seconds in order to
give Renesas 720201 xHC enough time to get ready in probe.
xhci_reset() is called with interrupts disabled in other places, and
waiting for 10 seconds there is not acceptable.
Add a timeout parameter to xhci_reset(), and adjust it back to 250ms
when called from xhci_stop() or xhci_shutdown() where interrupts are
disabled, and successful reset isn't that critical.
This solves issues when deactivating host mode on platforms like SM8450.
For now don't change the timeout if xHC is reset in xhci_resume().
No issues are reported for it, and we need the reset to succeed.
Locking around that reset needs to be revisited later.
Additionally change the signed integer timeout parameter in
xhci_handshake() to a u64 to match the timeout value we pass to
readl_poll_timeout_atomic()
Fixes: 22ceac1912 ("xhci: Increase reset timeout for Renesas 720201 host.")
Cc: stable@vger.kernel.org
Reported-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Tested-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220303110903.1662404-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 14073ce951https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ usb-testing)
BUG: 218973924
Change-Id: I2299b2116701d1563d7b9567433162b35488295d
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Provide a vendor hook to allow drain_all_pages to be skipped
during direct reclaim in some cases to avoid delays caused by
it in cases when the benefits of draining pcp lists are known
to be small.
Bug: 220811627
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I0805241f81e0a94afcf62c98e97cff125d4061e2
Document the functionality of disable_dma32 as introduced in commit
c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make
ZONE_DMA32 empty").
Bug: 199917449
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I32ab2969f59fcc49e9ac49e7e6b545f816d120f9
zone_dma32_is_empty() currently lacks the proper validation to ensure
that the NUMA node ID it receives as an argument is valid. This has no
effect on kernels with CONFIG_NUMA=n as NODE_DATA() will return the
same pglist_data on these devices, but on kernels with CONFIG_NUMA=y,
this is not the case, and the node passed to NODE_DATA must be
validated.
Rather than trying to find the node containing ZONE_DMA32, replace
calls of zone_dma32_is_empty() with zone_dma32_are_empty() (which
iterates over all nodes and returns false if one of the nodes holds
DMA32 and it is non-empty).
Bug: 199917449
Fixes: c3c2bb34ac ("ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty")
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: I850fb9213b71a1ef29106728bfda0cc6de46fdbb
commit 54309fde1a upstream.
On reads with MMC_READ_MULTIPLE_BLOCK that fail,
the recovery handler will use MMC_READ_SINGLE_BLOCK for
each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
The logic for this is fixed to never report unsuccessful reads
as success to the block layer.
On command error with retries remaining, blk_update_request was
called with whatever value error was set last to.
In case it was last set to BLK_STS_OK (default), the read will be
reported as success, even though there was no data read from the device.
This could happen on a CRC mismatch for the response,
a card rejecting the command (e.g. again due to a CRC mismatch).
In case it was last set to BLK_STS_IOERR, the error is reported correctly,
but no retries will be attempted.
Fixes: 81196976ed ("mmc: block: Add blk-mq support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/bc706a6ab08c4fe2834ba0c05a804672@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ie5a28313bbbf794fc62132db366f566e70ce80c8
commit 054aa8d439 upstream.
Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with
RCU lookups and lockless reference counting very much part of the
design. Getting that reference count out of order isn't a problem per
se.
But the garbage collector can get confused by seeing this situation of
having seen a file not having any remaining external references and then
seeing it being attached to an fd.
In commit cbcf01128d ("af_unix: fix garbage collect vs MSG_PEEK") the
fix was to serialize the file descriptor install with the garbage
collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are
in the process of looking up a file descriptor, we can instead simply
just re-check that the file hasn't been closed in the meantime, and just
re-do the lookup if we raced with a concurrent close() of the same file
descriptor.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I876f1ab7997e9546b9e13f5f5b681755acd229d6
Provide a vendor hook to allow page_referenced to be skipped
during shrink_active_list to avoid heavy cpuloading caused by
it.
Bug: 220878851
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: Ie0e369f8f8739fea59a95470af20ab0e976869d1
When page allocation in direct reclaim path fails, the system will make
one attempt to shrink per-cpu page lists and free pages from high alloc
reserves. Draining per-cpu pages into buddy allocator can be a very slow
operation because it's done using workqueues and the task in direct
reclaim waits for all of them to finish before proceeding. Currently this
time is not accounted as psi memory stall.
While testing mobile devices under extreme memory pressure, when
allocations are failing during direct reclaim, we notices that psi events
which would be expected in such conditions were not triggered. After
profiling these cases it was determined that the reason for missing psi
events was that a big chunk of time spent in direct reclaim is not
accounted as memory stall, therefore psi would not reach the levels at
which an event is generated. Further investigation revealed that the bulk
of that unaccounted time was spent inside drain_all_pages call.
A typical captured case when drain_all_pages path gets activated:
__alloc_pages_slowpath took 44.644.613ns
__perform_reclaim took 751.668ns (1.7%)
drain_all_pages took 43.887.167ns (98.3%)
PSI in this case records the time spent in __perform_reclaim but ignores
drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath.
Annotate __alloc_pages_direct_reclaim in its entirety so that delays from
handling page allocation failure in the direct reclaim path are accounted
as memory stall.
Link: https://lkml.kernel.org/r/20220223194812.1299646-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Tim Murray <timmurray@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d4f448732857375eb3dc422225a61e64f8257cb1
https://github.com/hnaz/linux-mm.git master)
Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia3a4138f8d5e8ce612bd5c371cfcc0f21e1ebc42
The reverts in commits 07630c8073 (Revert "ANDROID: incremental-fs:
fix mount_fs issue") and 5db3e72c57 (Revert "ANDROID: incremental-fs:
remove index and incomplete dir on umount") were applied out of order,
resulting in a spurious call to kfree() being left over. Remove it.
Bug: 218732047
Signed-off-by: Steve Muckle <smuckle@google.com>
Change-Id: I6ae8d8a9775981a88d28e462b64b259bca905ffb
trace_android_vh_binder_proc_transaction_entry:
We need change binder thread so that this work can be added in
proc->todo, if we found the binder thread, skip native logic.
trace_android_vh_binder_select_worklist_ilocked:
we need this because we can't change list point in ”trace_android_vh_binder_thread_read“,
otherwise, If a work has beed added in our own defined list before,
current may goto retry and loop again and again.
Bug: 219898723
Change-Id: Ifdb3429c9ddac521bc75c1d21740ee7cc4b8f143
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
This reverts commit 6f915dd2af.
This is follow up cleanup after revert of:
"Revert "ANDROID: incremental-fs: fix mount_fs issue"
Bug: 220805927
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I2ff42145dd586ae6ae4c76c3136e1fad14c08952
This reverts commit 93717b608d.
Test: Can now install the same apk twice, and repeated installs are
stable
Bug: 217661925
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I86871c364c17a0d1107b3891a574b72edcf04ea2
(cherry picked from commit d107cd06f2)
Signed-off-by: Steve Muckle <smuckle@google.com>
This reverts commit cb7e10d31b.
The hook android_vh_binder_proc_transaction_finish is not used by any
vendor, so remove it to help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 208910215
Cc: Liujie Xie <xieliujie@oppo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I19c6660c138dc88e554e62d309484d75ec24dc6c
if2fs_fill_super
-> f2fs_build_segment_manager
-> create_discard_cmd_control
-> f2fs_start_discard_thread
It invokes kthread_run to create a thread and run issue_discard_thread.
However, if f2fs_build_node_manager fails, the control flow goes to
free_nm and calls f2fs_destroy_node_manager. This function will free
sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info
after the deallocation, but before the f2fs_stop_discard_thread, it will
cause UAF(Use-after-free).
-> f2fs_destroy_segment_manager
-> destroy_discard_cmd_control
-> f2fs_stop_discard_thread
Fix this by stopping discard thread before f2fs_destroy_node_manager.
Note that, the commit d6d2b491a8 introduces the call of
f2fs_available_free_memory into issue_discard_thread.
Cc: stable@vger.kernel.org
Fixes: d6d2b491a8 ("f2fs: allow to change discard policy based on cached discard cmds")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 5429c9dbc9)
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: If121b453455b11b2aded8ba8a3899faad431dbd3
With the existing logic where clear_ack is true (HW doesn.t support
auto clear for ICR), interrupt clear register reset is not handled
properly. Due to this only the first interrupts get processed properly
and further interrupts are blocked due to not resetting interrupt
clear register.
Example for issue case where Invert_ack is false and clear_ack is true:
Say Default ISR=0x00 & ICR=0x00 and ISR is triggered with 2
interrupts making ISR = 0x11.
Step 1: Say ISR is set 0x11 (store status_buff = ISR). ISR needs to
be cleared with the help of ICR once the Interrupt is processed.
Step 2: Write ICR = 0x11 (status_buff), this will clear the ISR to 0x00.
Step 3: Issue - In the existing code, ICR is written with ICR =
~(status_buff) i.e ICR = 0xEE -> This will block all the interrupts
from raising except for interrupts 0 and 4. So expectation here is to
reset ICR, which will unblock all the interrupts.
if (chip->clear_ack) {
if (chip->ack_invert && !ret)
........
else if (!ret)
ret = regmap_write(map, reg,
~data->status_buf[i]);
So writing 0 and 0xff (when ack_invert is true) should have no effect, other
than clearing the ACKs just set.
Bug: 216238044
Fixes: 3a6f0fb7b8 ("regmap: irq: Add support to clear ack registers")
Change-Id: I42a884f214b3eacd9d9828078ff1a34a5f21a82f
Signed-off-by: Prasad Kumpatla <quic_pkumpatl@quicinc.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20220217085007.30218-1-quic_pkumpatl@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d04ad245d6
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git for-5.17)
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Syzbot found a GPF in reweight_entity. This has been bisected to
commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid
sched_task_group")
There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in
reweight_entity() in CFS. The scenario is that the main process spawns
number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
wait, and exit. For each of the new threads the copy_process() gets
invoked, which adds the new task_struct and calls sched_post_fork()
for it.
In the above scenario there is a possibility that
setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
in the group that is just being created by copy_process(), and for
which the sched_post_fork() has not been executed yet. This will
trigger a null pointer dereference in reweight_entity(), as it will
try to access the run queue pointer, which hasn't been set.
Before the mentioned change the cfs_rq pointer for the task has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group. Now it is done in
the sched_post_fork(), which is called after that. To fix the issue
the remove the update_load param from the update_load param() function
and call reweight_task() only if the task flag doesn't have the
TASK_NEW flag set.
Change-Id: I5324ce174190919cec268c281fb92dfeee830b00
Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220203161846.1160750-1-tadeusz.struk@linaro.org
Bug: 219676849
(cherry picked from commit 13765de814)
[quic_ashayj: Resolved minor compilation failure, replaced __state to state ]
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which
takes two passes over the leaf rcu_node structure's CPUs. The first
pass gathers up the current CPU and CPUs that are in dynticks idle mode.
The workqueue will report a quiescent state on their behalf later.
The second pass sends IPIs to the rest of the CPUs, but excludes the
current CPU, incorrectly assuming it has been included in the first
pass's list of CPUs.
Unfortunately the current CPU may have changed between the first and
second pass, due to the fact that the various rcu_node structures'
->lock fields have been dropped, thus momentarily enabling preemption.
This means that if the second pass's CPU was not on the first pass's
list, it will be ignored completely. There will be no IPI sent to
it, and there will be no reporting of quiescent states on its behalf.
Unfortunately, the expedited grace period will nevertheless be waiting
for that CPU to report a quiescent state, but with that CPU having no
reason to believe that such a report is needed.
The result will be an expedited grace period stall.
Fix this by no longer excluding the current CPU from consideration during
the second pass.
Bug: 216238044
Fixes: b9ad4d6ed1 ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()")
Change-Id: I6b7bab7d72ddc41ca7cbb6bdc0dbe1ad5ecb4f44
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 81f6d49cce)
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
We need pointers to proc and t, the current hooks in binder_proc_transaction
are unable to use.
Bug: 208910215
Change-Id: I730964f965a015e5f5a3e237d9b3bd084b5bd0d0
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Stall the control endpoint in case provided index exceeds array size of
MAX_CONFIG_INTERFACES or when the retrieved function pointer is null.
Bug: 213172319
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 75e5b4849b)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I78f46b6f2140394a6bc6cff9f829c0742d7ad2fc
Cleanup incremental-fs left overs on umount, otherwise incr-fs will
complain as below:
BUG: Dentry {i=47a,n=.incomplete} still in use [unmount of incremental-fs]
This requires vfs_rmdir() of the special index and incomplete dirs.
Also free options.sysfs_name in incfs_mount_fs() instead of in
incfs_free_mount_info() to make it consistent with incfs_remount_fs().
Since set_anon_super() was used in incfs_mount_fs() the incfs_kill_sb()
should use kill_anon_super() instead of generic_shutdown_super()
otherwise it will leak the pseudo dev_t that set_anon_super() allocates.
Bug: 211066171
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Change-Id: I7ea54db63513fc130e1997cbf79121015ee12405
Currently, mte_set_mem_tag_range() and mte_zero_clear_page_tags() use
DC {GVA,GZVA} unconditionally. But, they should make sure that
DCZID_EL0.DZP, which indicates whether or not use of those instructions
is prohibited, is zero when using those instructions.
Use ST{G,ZG,Z2G} instead when DCZID_EL0.DZP == 1.
Fixes: 013bb59dbb ("arm64: mte: handle tags zeroing at page allocation time")
Fixes: 3d0cca0b02 ("kasan: speed up mte_set_mem_tag_range")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20211206004736.1520989-3-reijiw@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 685e2564da)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I1e7615a060d86aa743426334ec79d69b4299c7e8
For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
free flow.
However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
created entries in the DMA adderess space.
So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
Here we should use sg_table.orig_nents to free pages memory, but use the
sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
helper 'for_each_sg' which maybe cause memory leak) is much better.
Fixes: d963ab0f15 ("dma-buf: system_heap: Allocate higher order pages if available")
Signed-off-by: Guangming <Guangming.Cao@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org> # 5.11.*
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211126074904.88388-1-guangming.cao@mediatek.com
(cherry picked from commit 679d94cd7d)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I90e1b7e9ea5d1fbabc86a0c1eddbdd2d8887e086
As Vincent reports in:
https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
The put_user() in schedule_tail() can get stuck in a livelock, similar
to a problem recently fixed on riscv in commit:
285a76bb2c ("riscv: evaluate put_user() arg before enabling user access")
In __raw_put_user() we have a critical section between
uaccess_ttbr0_enable() and uaccess_ttbr0_disable() where we cannot
safely call into the scheduler without having taken an exception, as
schedule() and other scheduling functions will not save/restore the
TTBR0 state. If either of the `x` or `ptr` arguments to __raw_put_user()
contain a blocking call, we may call into the scheduler within the
critical section. This can result in two problems:
1) The access within the critical section will occur without the
required TTBR0 tables installed. This will fault, and where the
required tables permit access, the access will be retried without the
required tables, resulting in a livelock.
2) When TTBR0 SW PAN is in use, check_and_switch_context() does not
modify TTBR0, leaving a stale value installed. The mappings of the
blocked task will erroneously be accessible to regular accesses in
the context of the new task. Additionally, if the tables are
subsequently freed, local TLB maintenance required to reuse the ASID
may be lost, potentially resulting in TLB corruption (e.g. in the
presence of CnP).
The same issue exists for __raw_get_user() in the critical section
between uaccess_ttbr0_enable() and uaccess_ttbr0_disable().
A similar issue exists for __get_kernel_nofault() and
__put_kernel_nofault() for the critical section between
__uaccess_enable_tco_async() and __uaccess_disable_tco_async(), as the
TCO state is not context-switched by direct calls into the scheduler.
Here the TCO state may be lost from the context of the current task,
resulting in unexpected asynchronous tag check faults. It may also be
leaked to another task, suppressing expected tag check faults.
To fix all of these cases, we must ensure that we do not directly call
into the scheduler in their respective critical sections. This patch
reworks __raw_put_user(), __raw_get_user(), __get_kernel_nofault(), and
__put_kernel_nofault(), ensuring that parameters are evaluated outside
of the critical sections. To make this requirement clear, comments are
added describing the problem, and line spaces added to separate the
critical sections from other portions of the macros.
For __raw_get_user() and __raw_put_user() the `err` parameter is
conditionally assigned to, and we must currently evaluate this in the
critical section. This behaviour is relied upon by the signal code,
which uses chains of put_user_error() and get_user_error(), checking the
return value at the end. In all cases, the `err` parameter is a plain
int rather than a more complex expression with a blocking call, so this
is safe.
In future we should try to clean up the `err` usage to remove the
potential for this to be a problem.
Aside from the changes to time of evaluation, there should be no
functional change as a result of this patch.
Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
Fixes: f253d827f3 ("arm64: uaccess: refactor __{get,put}_user")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20211122125820.55286-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 94902d849e)
[connoro: adjust __raw_{get,put}_user comments to reflect 5.10 code]
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I1c7f95b34b3bc53179ddaa7cb1fb38cfd9cc2e77
The id argument of ARM64_FTR_REG_OVERRIDE() is used for two purposes:
one as the system register encoding (used for the sys_id field of
__ftr_reg_entry), and the other as the register name (stringified
and used for the name field of arm64_ftr_reg), which is debug
information. The id argument is supposed to be a macro that
indicates an encoding of the register (eg. SYS_ID_AA64PFR0_EL1, etc).
ARM64_FTR_REG(), which also has the same id argument,
uses ARM64_FTR_REG_OVERRIDE() and passes the id to the macro.
Since the id argument is completely macro-expanded before it is
substituted into a macro body of ARM64_FTR_REG_OVERRIDE(),
the stringified id in the body of ARM64_FTR_REG_OVERRIDE is not
a human-readable register name, but a string of numeric bitwise
operations.
Fix this so that human-readable register names are available as
debug information.
Fixes: 8f266a5d87 ("arm64: cpufeature: Add global feature override facility")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211101045421.2215822-1-reijiw@google.com
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 9dc232a8ab)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I53c28eabd5a7e5499cf62a17e4d4d826130c2562
While commit 097b9146c0 ("net: fix up truesize of cloned
skb in skb_prepare_for_shift()") fixed immediate issues found
when KFENCE was enabled/tested, there are still similar issues,
when tcp_trim_head() hits KFENCE while the master skb
is cloned.
This happens under heavy networking TX workloads,
when the TX completion might be delayed after incoming ACK.
This patch fixes the WARNING in sk_stream_kill_queues
when sk->sk_mem_queued/sk->sk_forward_alloc are not zero.
Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20211102004555.1359210-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit c4777efa75)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I5e456705bd01396c05c79009aeba36e00829e037
In RHEL's gating selftests we've encountered memory corruption in the
uffd event test even with upstream kernel:
# ./userfaultfd anon 128 4
nr_pages: 32768, nr_pages_per_cpu: 32768
bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
testing uffd-wp with pagemap (pgsize=4096): done
testing uffd-wp with pagemap (pgsize=2097152): done
testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
ERROR: faulting process failed (errno=0, line=1117)
It can be easily reproduced when global thp enabled, which is the
default for RHEL.
It's also known as a side effect of commit 0db282ba2c ("selftest: use
mmap instead of posix_memalign to allocate memory", 2021-07-23), which
is imho right itself on using mmap() to make sure the addresses will be
untagged even on arm.
The problem is, for each test we allocate buffers using two
allocate_area() calls. We assumed these two buffers won't affect each
other, however they could, because mmap() could have found that the two
buffers are near each other and having the same VMA flags, so they got
merged into one VMA.
It won't be a big problem if thp is not enabled, but when thp is
agressively enabled it means when initializing the src buffer it could
accidentally setup part of the dest buffer too when there's a shared THP
that overlaps the two regions. Then some of the dest buffer won't be
able to be trapped by userfaultfd missing mode, then it'll cause memory
corruption as described.
To fix it, do release_pages() after initializing the src buffer.
Since the previous two release_pages() calls are after
uffd_test_ctx_clear() which will unmap all the buffers anyway (which is
stronger than release pages; as unmap() also tear town pgtables), drop
them as they shouldn't really be anything useful.
We can mark the Fixes tag upon 0db282ba2c as it's reported to only
happen there, however the real "Fixes" IMHO should be 8ba6e86408, as
before that commit we'll always do explicit release_pages() before
registration of uffd, and 8ba6e86408 changed that logic by adding
extra unmap/map and we didn't release the pages at the right place.
Meanwhile I don't have a solid glue anyway on whether posix_memalign()
could always avoid triggering this bug, hence it's safer to attach this
fix to commit 8ba6e86408.
Link: https://lkml.kernel.org/r/20210923232512.210092-1-peterx@redhat.com
Fixes: 8ba6e86408 ("userfaultfd/selftests: reinitialize test context in each test")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Li Wang <liwan@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8913970c19)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I5f5c06c9e3be4a6e521415edb785ddcc0be81b21
The KVM page-table library refcounts the pages of concatenated stage-2
PGDs individually. However, when running KVM in protected mode, the
host's stage-2 PGD is currently managed by EL2 as a single high-order
compound page, which can cause the refcount of the tail pages to reach 0
when they shouldn't, hence corrupting the page-table.
Fix this by introducing a new hyp_split_page() helper in the EL2 page
allocator (matching the kernel's split_page() function), and make use of
it from host_s2_zalloc_pages_exact().
Fixes: 1025c8c0c6 ("KVM: arm64: Wrap the host with a stage 2")
Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-5-qperret@google.com
(cherry picked from commit 1d58a17ef5)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: Idf0d8328710517aee822f77a2097295734c78e7a
tx-fifo-resize is now added by default by the dwc3-qcom driver
to the SNPS DWC3 child node.
So, lets drop the tx-fifo-resize property from dwc3-qcom nodes
as having it there will cause the dwc3-qcom driver to error and
abort probe with:
[ 1.362938] dwc3-qcom 8af8800.usb: unable to add property
[ 1.368405] dwc3-qcom 8af8800.usb: failed to register DWC3 Core, err=-17
Fixes: cefdd52fa0 ("usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default")
Signed-off-by: Robert Marko <robimarko@gmail.com>
Link: https://lore.kernel.org/r/20210902220325.1783567-1-robimarko@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit da546d6b74)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I0ffb00d40b43772af3c226141183ca47325e748f
This reverts commit cefdd52fa0.
On sc7180-trogdor class devices with 'fw_devlink=permissive' and KASAN
enabled, you'll see a Use-After-Free reported at bootup.
The root of the problem is that dwc3_qcom_of_register_core() is adding
a devm-allocated "tx-fifo-resize" property to its device tree node
using of_add_property().
The issue is that of_add_property() makes a _permanent_ addition to
the device tree that lasts until reboot. That means allocating memory
for the property using "devm" managed memory is a terrible idea since
that memory will be freed upon probe deferral or device unbinding.
Let's revert the patch since the system is still functional without
it. The fact that of_add_property() makes a permanent change is extra
fodder for those folks who were aruging that the device tree isn't
really the right way to pass information between parts of the
driver. It is an exercise left to the reader to submit a patch
re-adding the new feature in a way that makes everyone happier.
Fixes: cefdd52fa0 ("usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default")
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20211207094327.1.Ie3cde3443039342e2963262a4c3ac36dc2c08b30@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6a97cee39d)
Bug: 187129171
Signed-off-by: Connor O'Brien <connoro@google.com>
Change-Id: I69a18d3518e8cc9c43e93225f7427642b42a931b