Export css_task_iter_start() and css_task_iter_next() and
css_task_iter_end() inorder to support task iteration in a cgroup in
vendor modules.
Bug: 336967294
Change-Id: Id93963ddd30ab02c7a4d5086f19d15310e4eda14
Signed-off-by: seanwang1 <seanwang1@lenovo.com>
[ Upstream commit 47d8ac011fe1c9251070e1bd64cb10b48193ec51 ]
Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.
sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped
connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc()
---------------- ------------------------- -----------
NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
// V count=1 inflight=0
NS = unix_peer(S)
skb2 = sock_alloc()
skb_queue_tail(NS, skb2[V])
// V became in-flight
// V count=2 inflight=1
close(V)
// V count=1 inflight=1
// GC candidate condition met
for u in gc_inflight_list:
if (total_refs == inflight_refs)
add u to gc_candidates
// gc_candidates={L, V}
for u in gc_candidates:
scan_children(u, dec_inflight)
// embryo (skb1) was not
// reachable from L yet, so V's
// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
for u in gc_candidates:
if (u.inflight)
scan_children(u, inc_inflight_move_tail)
// V count=1 inflight=2 (!)
If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.
Bug: 336226035
Fixes: 1fd05ba5a2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 507cc232ff)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If321f78b8b3220f5a1caea4b5e9450f1235b0770
[ Upstream commit 97af84a6bba2ab2b9c704c08e67de3b5ea551bb2 ]
When touching unix_sk(sk)->inflight, we are always under
spin_lock(&unix_gc_lock).
Let's convert unix_sk(sk)->inflight to the normal unsigned long.
Bug: 336226035
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240123170856.41348-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 301fdbaa0b)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0d965d5f2a863d798c06de9f21d0467f256b538e
If ufshcd_abort() returns SUCCESS for an already completed command then
that command is completed twice. This results in a crash. Prevent this by
checking whether a command has completed without completion interrupt from
the timeout handler. This CL fixes the following kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
dma_direct_map_sg+0x70/0x274
scsi_dma_map+0x84/0x124
ufshcd_queuecommand+0x3fc/0x880
scsi_queue_rq+0x7d0/0x111c
blk_mq_dispatch_rq_list+0x440/0xebc
blk_mq_do_dispatch_sched+0x5a4/0x6b8
__blk_mq_sched_dispatch_requests+0x150/0x220
__blk_mq_run_hw_queue+0xf0/0x218
__blk_mq_delay_run_hw_queue+0x8c/0x18c
blk_mq_run_hw_queue+0x1a4/0x360
blk_mq_sched_insert_requests+0x130/0x334
blk_mq_flush_plug_list+0x138/0x234
blk_flush_plug_list+0x118/0x164
blk_finish_plug()
read_pages+0x38c/0x408
page_cache_ra_unbounded+0x230/0x2f8
do_sync_mmap_readahead+0x1a4/0x208
filemap_fault+0x27c/0x8f4
f2fs_filemap_fault+0x28/0xfc
__do_fault+0xc4/0x208
handle_pte_fault+0x290/0xe04
do_handle_mm_fault+0x52c/0x858
do_page_fault+0x5dc/0x798
do_translation_fault+0x40/0x54
do_mem_abort+0x60/0x134
el0_da+0x40/0xb8
el0t_64_sync_handler+0xc4/0xe4
el0t_64_sync+0x1b4/0x1b8
Bug: 312786487
Bug: 326329246
Bug: 333069246
Bug: 333317508
Link: https://lore.kernel.org/linux-scsi/20240416171357.1062583-1-bvanassche@acm.org/T/#mbfa6b7a56e07c792ddca7801fb8900f8370d4731
Change-Id: I48e93516d2aae3b2ad62b0b51144e8e2e39d7476
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Unexport this function because it is not used outside the UFSHCI core
driver and because it is not possible to use this function from outside
the UFSHCI core driver without triggering a race condition.
Bug: 312786487
Bug: 326329246
Bug: 333069246
Bug: 333317508
Change-Id: I1bb504b0310c3618db94e9401ff4f7e13633d6a0
Signed-off-by: Bart Van Assche <bvanassche@google.com>
In a20b68c396127cd6387f37845c5bc05e44e2fd0e, SPF is supported for swap
fault. But in __lock_page_or_retry(), it will unlock mmap_lock
unconditionally. That will cause unpaired lock release in handling SPF.
Bug: 333508035
Change-Id: Ia1da66c85e0d58883cf518f10cd33fc5cad387b8
Signed-off-by: Oven <liyangouwen1@oppo.com>
(cherry picked from commit 63070883166ae63620a87d958319deba86f236ae)
Userspace apps often analyze memory consumption by the use of mm
rss_stat counters -- via the kmem/rss_stat trace event or from
/proc/<pid>/statm.
rss_stat counters are only updated when the PTEs are updated. What this
means is that pages can be present in the page cache from readahead but
not visible to userspace (not attributed to the app) as there is no
corresponding VMA (PTEs) for the respective page cache pages.
A side effect of the loader now extending ELF LOAD segments to be
contiguously mapped in the virtual address space, means that the VMA is
extended to cover the padding pages.
When filesystems, such as f2fs and ext4, that implement
vm_ops->map_pages() attempt to perform a do_fault_around() the extent of
the fault around is restricted by the area of the enclosing VMA. Since
the loader extends LOAD segment VMAs to be contiguously mapped, the extent
of the fault around is also increased. The result of which, is that the
PTEs corresponding to the padding pages are updated and reflected in the
rss_stat counters.
It is not common that userspace application developers be aware of this
nuance in the kernel's memory accounting. To avoid apparent regressions
in memory usage to userspace, restrict the fault around range to only
valid data pages (i.e. exclude the padding pages at the end of the VMA).
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I2c7a39ec1b040be2b9fb47801f95042f5dbf869d
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
When performing LOAD segment extension, the dynamic linker knows what
portion of the VMA is padding. In order for the kernel to implement
mitigations that ensure app compatibility, the extent of the padding
must be made available to the kernel.
To achieve this, reuse MADV_DONTNEED on single VMAs to hint the padding
range to the kernel. This information is then stored in vm_flag bits.
This allows userspace (dynamic linker) to set the padding pages on the
VMA without a need for new out-of-tree UAPI.
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I3421de32ab38ad3cb0fbce73ecbd8f7314287cde
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
The dynamic linker may extend ELF LOAD segment mappings to be contiguous
in memory when loading a 16kB compatible ELF on a 4kB page-size system.
This is done to reduce the use of unreclaimable VMA slab memory for the
otherwise necessary "gap" VMAs. The extended portion of the mapping
(VMA) can be viewed as "padding", meaning that the mapping in that range
corresponds to an area of the file that does not contain contents of the
respective segments (maybe zero's depending on how the ELF is built).
For some compatibility mitigations, the region of a VMA corresponding to
these padding sections need to be known.
In order to represent such regions without adding addtional overhead or
breaking ABI, some upper bits of vm_flags are used.
Add the VMA padding pages representation and the necessary APIs to
manipulate it.
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: Ieb9fa98e30ec9b0bec62256624f14e3ed6062a75
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Migrating from 4kB to 16kB page-size in Android requires first making
the platform page-agnostic, which involves increasing Android-ELFs'
max-page-size (p_align) from 4kB to 16kB.
Increasing the ELF max-page-size was found to cause compatibility issues
in apps that use obfuscation or depend on the ELF segments being mapped
based on 4kB-alignment.
Working around these compatibility issues involves both kernel and
userspace (dynamic linker) changes.
Introduce a knob for userspace (dynamic linker) to determine whether the
kernel supports the mitigations needed for page-size migration compatibility.
The knob also allows for userspace to turn on or off these mitigations
by writing 1 or 0 to /sys/kernel/mm/pgsize_miration/enabled:
echo 1 > /sys/kernel/mm//pgsize_miration/enabled # Enable
echo 0 > /sys/kernel/mm//pgsize_miration/enabled # Disable
Bug: 330117029
Bug: 327600007
Bug: 330767927
Bug: 328266487
Bug: 329803029
Change-Id: I9ac1d15d397b8226b27827ecffa30502da91e10e
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
commit 0d459e2ffb541841714839e8228b845458ed3b27 upstream.
The commit mutex should not be released during the critical section
between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
worker could collect expired objects and get the released commit lock
within the same GC sequence.
nf_tables_module_autoload() temporarily releases the mutex to load
module dependencies, then it goes back to replay the transaction again.
Move it at the end of the abort phase after nft_gc_seq_end() is called.
Bug: 332996726
Cc: stable@vger.kernel.org
Fixes: 720344340f ("netfilter: nf_tables: GC transaction race with abort path")
Reported-by: Kuan-Ting Chen <hexrabbit@devco.re>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8038ee3c3e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I637389421d8eca5ab59a41bd1a4b70432440034c
commit a45e6889575c2067d3c0212b6bc1022891e65b91 upstream.
Unlike early commit path stage which triggers a call to abort, an
explicit release of the batch is required on abort, otherwise mutex is
released and commit_list remains in place.
Add WARN_ON_ONCE to ensure commit_list is empty from the abort path
before releasing the mutex.
After this patch, commit_list is always assumed to be empty before
grabbing the mutex, therefore
03c1f1ef15 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")
only needs to release the pending modules for registration.
Bug: 332996726
Cc: stable@vger.kernel.org
Fixes: c0391b6ab8 ("netfilter: nf_tables: missing validation from the abort path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b0b36dcbe0)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I38f9b05ac4eadd1d2b7b306cccaf0aeacb61b57a
commit 552705a3650bbf46a22b1adedc1b04181490fc36 upstream.
While the rhashtable set gc runs asynchronously, a race allows it to
collect elements from anonymous sets with timeouts while it is being
released from the commit path.
Mingi Cho originally reported this issue in a different path in 6.1.x
with a pipapo set with low timeouts which is not possible upstream since
7395dfacfff6 ("netfilter: nf_tables: use timestamp to check for set
element timeout").
Fix this by setting on the dead flag for anonymous sets to skip async gc
in this case.
According to 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on
transaction abort"), Florian plans to accelerate abort path by releasing
objects via workqueue, therefore, this sets on the dead flag for abort
path too.
Bug: 329205787
Cc: stable@vger.kernel.org
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 406b0241d0)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I6170493c267e020c50a739150f8c421deb635b35
commit 01acb2e8666a6529697141a6017edbf206921913 upstream.
Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER
event is reported, otherwise a stale reference to netdevice remains in
the hook list.
Bug: 332803585
Fixes: 60a3815da7 ("netfilter: add inet ingress support")
Cc: stable@vger.kernel.org
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 70f17b48c8)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I28482dca416b61dcf2e722ba0aef62d2d41a8f23
Newer DualSense firmware supports a revised classic rumble mode,
which feels more similar to rumble as supported on previous PlayStation
controllers. It has been made the default on PlayStation and non-PlayStation
devices now (e.g. iOS and Windows). Default to this new mode when
supported.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20221010212313.78275-4-roderick.colenbrander@sony.com
Bug: 260685629
(cherry picked from commit 9fecab247e)
Change-Id: Icd330111a4d1b1e76a04cd11c623d0982ce3d66f
Signed-off-by: Farid Chahla <farid.chahla@sony.com>
(cherry picked from commit cf8edf192858c5997cae10fa2c028ee9e2a9db6b)
Signed-off-by: Lee Jones <joneslee@google.com>
brightness_set_blocking() callback expects function returning int. This fixes
the follwoing build failure:
drivers/hid/hid-playstation.c: In function ‘dualsense_player_led_set_brightness’:
drivers/hid/hid-playstation.c:885:1: error: no return statement in function returning non-void [-Werror=return-type]
}
^
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Bug: 260685629
(cherry picked from commit 3c92cb4cb6)
Change-Id: Id16b960826a26ac22c1a14572444f9af29689ed6
Signed-off-by: Farid Chahla <farid.chahla@sony.com>
(cherry picked from commit 4281e236100d7ca198bca4e0e7e74410dc3fe751)
Signed-off-by: Lee Jones <joneslee@google.com>
The DualSense player LEDs were so far not adjustable from user-space.
This patch exposes each LED individually through the LED class. Each
LED uses the new 'player' function resulting in a name like:
'inputX:white:player-1' for the first LED.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Bug: 260685629
(cherry picked from commit 8c0ab553b0)
Change-Id: I49c699a99b0b8a7bb7980560e3ea7a12faf646aa
Signed-off-by: Farid Chahla <farid.chahla@sony.com>
(cherry picked from commit 1c2aceb8d7ca297ec5b485163361d40a93023347)
Signed-off-by: Lee Jones <joneslee@google.com>
Player LEDs are commonly found on game controllers from Nintendo and Sony
to indicate a player ID across a number of LEDs. For example, "Player 2"
might be indicated as "-x--" on a device with 4 LEDs where "x" means on.
This patch introduces LED_FUNCTION_PLAYER1-5 defines to properly indicate
player LEDs from the kernel. Until now there was no good standard, which
resulted in inconsistent behavior across xpad, hid-sony, hid-wiimote and
other drivers. Moving forward new drivers should use LED_FUNCTION_PLAYERx.
Note: management of Player IDs is left to user space, though a kernel
driver may pick a default value.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Bug: 260685629
(cherry picked from commit 61177c088a)
Change-Id: Ie1de4d66304bb25fc2c9fcdb1ec9b7589ad9e7ac
Signed-off-by: Farid Chahla <farid.chahla@sony.com>
(cherry picked from commit 8abc9ed234b1b10e4949720e056c294dab4552d7)
Signed-off-by: Lee Jones <joneslee@google.com>
The DualSense lightbar has so far been supported, but it was not yet
adjustable from user space. This patch exposes it through a multi-color
LED.
Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Bug: 260685629
(cherry picked from commit fc97b4d6a1)
Change-Id: I48204113da804b13ad5bed2f651a5826ab5a86f7
Signed-off-by: Farid Chahla <farid.chahla@sony.com>
(cherry picked from commit 392b327fe02113aaaa332ca4cf06e4edb36f5566)
Signed-off-by: Lee Jones <joneslee@google.com>
The current implementation of the mark_victim tracepoint provides only the
process ID (pid) of the victim process. This limitation poses challenges
for userspace tools requiring real-time OOM analysis and intervention.
Although this information is available from the kernel logs, it’s not
the appropriate format to provide OOM notifications. In Android, BPF
programs are used with the mark_victim trace events to notify userspace of
an OOM kill. For consistency, update the trace event to include the same
information about the OOMed victim as the kernel logs.
- UID
In Android each installed application has a unique UID. Including
the `uid` assists in correlating OOM events with specific apps.
- Process Name (comm)
Enables identification of the affected process.
- OOM Score
Will allow userspace to get additional insight of the relative kill
priority of the OOM victim. In Android, the oom_score_adj is used to
categorize app state (foreground, background, etc.), which aids in
analyzing user-perceptible impacts of OOM events [1].
- Total VM, RSS Stats, and pgtables
Amount of memory used by the victim that will, potentially, be freed up
by killing it.
[1] 246dc8fc95:frameworks/base/services/core/java/com/android/server/am/ProcessList.java;l=188-283
Signed-off-by: Carlos Galo <carlosgalo@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 331214192
(cherry picked from commit 72ba14deb40a9e9668ec5e66a341ed657e5215c2)
Link: https://lore.kernel.org/all/20240223173258.174828-1-carlosgalo@google.com/
Change-Id: I24f503ceca04b83f8abf42fcd04a3409e17be6b5
This reverts commit b9e9a2c009.
Reason for revert: b/331214192
Signed-off-by: Carlos Galo <carlosgalo@google.com>
Change-Id: I5895d3b8a0577f7aa67a8fbab81991ced49f8eab
[ Upstream commit b0e256f3dd2ba6532f37c5c22e07cb07a36031ee ]
Clone already always provides a current view of the lookup table, use it
to destroy the set, otherwise it is possible to destroy elements twice.
This fix requires:
212ed75dc5 ("netfilter: nf_tables: integrate pipapo into commit protocol")
which came after:
9827a0e6e2 ("netfilter: nft_set_pipapo: release elements in clone from abort path").
Bug: 330876672
Fixes: 9827a0e6e2 ("netfilter: nft_set_pipapo: release elements in clone from abort path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ff90050771)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8c0811e69f82681c7fcfdca1111f1702e27bb80e
Commit 6d98eb95b4 ("binder: avoid potential data leakage when copying
txn") introduced changes to how binder objects are copied. In doing so,
it unintentionally removed an offset alignment check done through calls
to binder_alloc_copy_from_buffer() -> check_buffer().
These calls were replaced in binder_get_object() with copy_from_user(),
so now an explicit offset alignment check is needed here. This avoids
later complications when unwinding the objects gets harder.
It is worth noting this check existed prior to commit 7a67a39320
("binder: add function to copy binder object from buffer"), likely
removed due to redundancy at the time.
Fixes: 6d98eb95b4 ("binder: avoid potential data leakage when copying txn")
Cc: <stable@vger.kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Bug: 320661088
Link: https://lore.kernel.org/all/20240330190115.1877819-1-cmllamas@google.com/
Change-Id: Iaddabaa28de7ba7b7d35dbb639d38ca79dbc5077
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Update the pixel_watch symbol list.
Bug: 330275264
Change-Id: I843394f80d93a3f3d1a33846d1af4f189803b829
Signed-off-by: Ben Fennema <fennema@google.com>
Wnen coalescing a table into a block, the break-before-make sequence
must invalidate the whole range of addresses translated by the entry in
order to avoid the possibility of a TLB conflict.
Fix the coalescing post-table walker so that the whole range of the old
table is invalidated, rather than just the first address, since a
refcount of 1 on the child page is not sufficient to ensure the absence
of any valid mappings.
Cc: Sebastian Ene <sebastianene@google.com>
Reported-by: Mostafa Saleh <smostafa@google.com>
Fixes: 6b38102053 ("ANDROID: KVM: arm64: Coalesce host stage2 entries on ownership reclaim")
Bug: 331232642
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: I4c94f552e4385599ad88b1be50b69ffbafa64a9b
commit e26d3009ef upstream.
Never used from userspace, disallow these parameters.
Bug: 329205828
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b7be6c737a)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I3d8358a6dee3246e3ac56697dbb2be8fdc5f716f
Code patching for the dynamically enabled shadow call stack comes down
to finding PACIASP and AUTIASP instructions -which behave as NOPs on
cores that do not implement pointer authentication- and converting them
into shadow call stack pushes and pops, respectively.
Due to past bad experiences with the highly complex and overengineered
DWARF standard that describes the unwind metadata that we are using to
locate these instructions, let's make this patching logic a little bit
more robust so that any issues with the unwind metadata detected at boot
time can de dealt with gracefully.
The DWARF annotations that are used for this are emitted at function
granularity, and due to the fact that the instructions we are patching
will simply behave as NOPs if left unpatched, we can abort on errors as
long as we don't leave any functions in a half-patched state.
So do a dry run of each FDE frame (covering a single function) before
performing the actual patching, and give up if the DWARF metadata cannot
be understood.
Change-Id: Iea167b37a4d84e2b444189c7af939cf58d6dc9cf
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221213142849.1629026-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 54c968bec3)
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Check if the mmap_lock is contended when looping over the pages that
are requested to be filled. When it is observed, we rely on the already
existing mechanism to return bytes copied/filled and -EAGAIN as error.
This helps by avoiding contention of mmap_lock for long running
userfaultfd operations. The userspace can perform other tasks before
retrying the operation for the remaining pages.
Bug: 320478828
Change-Id: I6d485fd03c96a826956ee3962e58058be3cf81c1
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
In case mmap_lock is contended, it is possible that userspace can spend
time performing other tasks rather than waiting in uninterruptible-sleep
state for the lock to become available. Even if no other task is
available, it is better to yield or sleep rather than adding contention
to already contended lock.
We introduce MMAP_TRYLOCK mode so that when possible, userspace can
request to use mmap_read_trylock(), returning -EAGAIN if and when it
fails.
Bug: 320478828
Change-Id: I2d196fd317e054af03dbd35ac1b0c7634cb370dc
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
When [1] was cherry-picked from 5.10 into 5.15 kernel, it modified
the variable used to store isolate_migratepages_block() return value
like it was done in 5.10. However in 5.15 the variable used to store
the return value is different. As a result, failure to isolate a block
is not reported back to the caller. Fix by restoring the original
code and using the right variable to store the return value.
[1] ANDROID: mm: do not allow file-backed pages from CMA
Bug: 326556976
Change-Id: I06900eb43de356584ff63acfe6e994f11610b494
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This reverts commit d217ccf7c8.
Debug drivers should not be included in the GKI kernel configuration.
Hence this revert.
Bug: 326456248
Change-Id: I18db9d07ad49b22f09b6b3414d39e6ed0a728d73
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Since the kernel now has dynamic Shadow Call Stack (SCS) enabled, on
CPUs that don't support Pointer Authentication Codes (PAC) the kernel
runtime-patches paciasp and autiasp instructions into instructions that
push and pop from the shadow call stack. This includes instructions in
loaded modules. This broke the fips140 integrity check which needs to
know how to undo all text changes made by the module loader in order to
re-create the original text.
Fix this by updating fips140.ko to undo the dynamic SCS patching.
Bug: 188620248
Change-Id: I992bcd6c34b3340c6489b40a125715e1304cb445
Signed-off-by: Eric Biggers <ebiggers@google.com>