Try to mitigate potential future driver core api changes by adding a
padding to struct pci_sriov, struct pci_dev, struct pci_bus, and struct
pci_driver.
Based on a change made to the RHEL/CENTOS 8 kernel.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I236df60165b25a33b06fc81f76014162401ba742
There are a lot of different structures that need to have a "frozen" abi
for the next 5+ years. Add padding to a lot of them in order to be able
to handle any future changes that might be needed due to LTS and
security fixes that might come up.
It's a best guess, based on what has happened in the past from the
5.10.0..5.10.110 release (1 1/2 years). Yes, past changes do not mean
that future changes will also be needed in the same area, but that is a
hint that those areas are both well maintained and looked after, and
there have been previous problems found in them.
Also the list of structures that are being required based on OEM usage
in the android/ symbol lists were consulted as that's a larger list than
what has been changed in the past.
Hopefully we caught everything we need to worry about, only time will
tell...
Bug: 151154716
Change-Id: I880bbcda0628a7459988eeb49d18655522697664
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Change-Id: I168bca9e37561b255268414e2fdfac18cd08381c
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 3632679d9e)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Users who want to share a single public IP address for outgoing connections
between several hosts traditionally reach for SNAT. However, SNAT requires
state keeping on the node(s) performing the NAT.
A stateless alternative exists, where a single IP address used for egress
can be shared between several hosts by partitioning the available ephemeral
port range. In such a setup:
1. Each host gets assigned a disjoint range of ephemeral ports.
2. Applications open connections from the host-assigned port range.
3. Return traffic gets routed to the host based on both, the destination IP
and the destination port.
An application which wants to open an outgoing connection (connect) from a
given port range today can choose between two solutions:
1. Manually pick the source port by bind()'ing to it before connect()'ing
the socket.
This approach has a couple of downsides:
a) Search for a free port has to be implemented in the user-space. If
the chosen 4-tuple happens to be busy, the application needs to retry
from a different local port number.
Detecting if 4-tuple is busy can be either easy (TCP) or hard
(UDP). In TCP case, the application simply has to check if connect()
returned an error (EADDRNOTAVAIL). That is assuming that the local
port sharing was enabled (REUSEADDR) by all the sockets.
# Assume desired local port range is 60_000-60_511
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
s.bind(("192.0.2.1", 60_000))
s.connect(("1.1.1.1", 53))
# Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy
# Application must retry with another local port
In case of UDP, the network stack allows binding more than one socket
to the same 4-tuple, when local port sharing is enabled
(REUSEADDR). Hence detecting the conflict is much harder and involves
querying sock_diag and toggling the REUSEADDR flag [1].
b) For TCP, bind()-ing to a port within the ephemeral port range means
that no connecting sockets, that is those which leave it to the
network stack to find a free local port at connect() time, can use
the this port.
IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port
will be skipped during the free port search at connect() time.
2. Isolate the app in a dedicated netns and use the use the per-netns
ip_local_port_range sysctl to adjust the ephemeral port range bounds.
The per-netns setting affects all sockets, so this approach can be used
only if:
- there is just one egress IP address, or
- the desired egress port range is the same for all egress IP addresses
used by the application.
For TCP, this approach avoids the downsides of (1). Free port search and
4-tuple conflict detection is done by the network stack:
system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'")
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("192.0.2.1", 0))
s.connect(("1.1.1.1", 53))
# Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy
For UDP this approach has limited applicability. Setting the
IP_BIND_ADDRESS_NO_PORT socket option does not result in local source
port being shared with other connected UDP sockets.
Hence relying on the network stack to find a free source port, limits the
number of outgoing UDP flows from a single IP address down to the number
of available ephemeral ports.
To put it another way, partitioning the ephemeral port range between hosts
using the existing Linux networking API is cumbersome.
To address this use case, add a new socket option at the SOL_IP level,
named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the
ephemeral port range for each socket individually.
The option can be used only to narrow down the per-netns local port
range. If the per-socket range lies outside of the per-netns range, the
latter takes precedence.
UAPI-wise, the low and high range bounds are passed to the kernel as a pair
of u16 values in host byte order packed into a u32. This avoids pointer
passing.
PORT_LO = 40_000
PORT_HI = 40_511
s = socket(AF_INET, SOCK_STREAM)
v = struct.pack("I", PORT_HI << 16 | PORT_LO)
s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)
s.bind(("127.0.0.1", 0))
s.getsockname()
# Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511),
# if there is a free port. EADDRINUSE otherwise.
[1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116
Reviewed-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Change-Id: I06e1860472cd2f90bf030076be0c87b9b775a3df
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 91d0b78c51)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit: aa69c36f31 upstream.
We do consider thermal pressure in util_fits_cpu() for uclamp_min only.
With the exception of the biggest cores which by definition are the max
performance point of the system and all tasks by definition should fit.
Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches capacity inversion point, then this is no longer true.
We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). Which not only addresses the problem above, but also
ensure uclamp_max now considers the inverted capacity. Force fitting
a task when a CPU is in this adverse state will contribute to making the
thermal throttling last longer.
Change-Id: I3c1a4758eb387c5c8e0ba68751840517fff4ae64
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
(cherry picked from commit aa69c36f31)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 799c7301de)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit: 44c7b80bff upstream.
Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.
We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c
We also assume that thermal pressure impacts all CPUs in a performance
domain equally.
If there're multiple performance domains with the same capacity_orig, we
will trigger a capacity inversion if the domain is under thermal
pressure.
The new cpu_in_capacity_inversion() should help users to know when
information about capacity_orig are not reliable and can opt in to use
the inverted capacity as the 'actual' capacity_orig.
Change-Id: I23a98594e6b24697eb33d9bbfb8dd46a12bfa050
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bff)
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fe1c982958)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
With FEAT_FGT, most bits in hfgwtr_el2 must be set to 1 to enable
trapping of MSR writes of certain registers. However, there is a
notable (and arguably curious) exception for nSMPRI_EL1 and
nTPIDR2_EL0 which must be set to 1 to _disable_ trapping of the
corresponding SME registers.
Make sure to initialize hfgwtr_el2 in the pKVM init params accordingly
to avoid accidentally enabling certain traps on hardware that supports
FEAT_FGT and FEAT_SME.
Bug: 282917063
Bug: 282993310
Change-Id: Ia96fa6856b4e7ef98b3cea4f03fcbc0ee03f10c5
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
commit 7e01c7f704 upstream.
Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than
the calculated "min" value, but greater than zero, the logic sets
tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in
cdc_ncm_fill_tx_frame() where all the data is handled.
For small values of dwNtbOutMaxSize the memory allocated during
alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to
how size is aligned at alloc time:
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
Thus we hit the same bug that we tried to squash with
commit 2be6d4d16a ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero")
Low values of dwNtbOutMaxSize do not cause an issue presently because at
alloc_skb() time more memory (512b) is allocated than required for the
SKB headers alone (320b), leaving some space (512b - 320b = 192b)
for CDC data (172b).
However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).
Consequently the SKB bounds checking semantics fails and panics:
skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:113!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic net/core/skbuff.c:113 [inline]
RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118
[snip]
Call Trace:
<TASK>
skb_put+0x151/0x210 net/core/skbuff.c:2047
skb_put_zero include/linux/skbuff.h:2422 [inline]
cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline]
cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308
cdc_ncm_tx_fixup+0xa3/0x100
Deal with too low values of dwNtbOutMaxSize, clamp it in the range
[USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure
enough data space is allocated to handle CDC data by making sure
dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE.
Fixes: 289507d336 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
Cc: stable@vger.kernel.org
Reported-by: syzbot+9f575a1f15fc0c01ed69@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d
Link: https://lore.kernel.org/all/20211202143437.1411410-1-lee.jones@linaro.org/
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230517133808.1873695-2-tudor.ambarus@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 281604646
Bug: 281606231
Change-Id: Ic1d912e7bf2ba53620eb8293b68ec6046422e047
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
update symbol list related to meminfo for galaxy
3 variable symbol(s) added
'struct tracepoint __tracepoint_android_vh_meminfo_cache_adjust'
'struct tracepoint __tracepoint_android_vh_si_mem_available_adjust'
'struct tracepoint __tracepoint_android_vh_si_meminfo_adjust'
Bug: 283896254
Change-Id: Ia6081e553312781a37c5eae8b0f6751909fad4bf
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Add vendor hooks for extra memory. If there is extra memory, this can
be accounted like other memory stats. One of the usecases could be
cleancache. If some of ram memory is used for cleancache, its free,
cache, and total size could be added through these vendor hooks.
Bug: 283896254
Change-Id: Iad7330310528581f09842f45860f05dc84823f41
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
When the IO pressure increases or the system performs dirty
page balancing, the frame rate of the foreground application
may become unstable. Therefore, a hook point is added to limit
the buffer IO rate from the source.
Bug: 262189942
Change-Id: I5214d611a388c5e8d87dc44ffde86ead1834ddff
Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>
cleancache is enabled for only ext4 and btrfs. Since f2fs is one
of the main fs in Android, this patch enables cleancache for f2fs.
Bug: 236887352
Change-Id: Ib6e6e832234f72681058da848e9cff36c5a496f2
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 32dfefb4c8)
Android reported a performance regression in the userfaultfd unmap path.
A closer inspection on the userfaultfd_unmap_prep() change showed that a
second tree walk would be necessary in the reworked code.
Fix the regression by passing each VMA that will be unmapped through to
the userfaultfd_unmap_prep() function as they are added to the unmap list,
instead of re-walking the tree for the VMA.
Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com
Fixes: 69dbe6daf1 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit de53cc0be1c8b47d595682932beb3c11be9e4e5a
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 274059236
Change-Id: Ia189a5e98ffe86c4ca5ac3b686ada5f51826f2ed
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Sometimes the user needs to revert to the previous slot, regardless of if
it is empty or not. Add an interface to go to the previous slot.
Since there can't be two consecutive NULLs in the tree, the mas_prev()
function can be implemented by calling mas_prev_slot() a maximum of 2
times. Change the underlying interface to use mas_prev_slot() to align
the code.
Link: https://lkml.kernel.org/r/20230518145544.1722059-31-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 20ea986f0b783cf38e7fbcc6887b24c28b02c189
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 274059236
Change-Id: Ib5a677ee01f5d9b88cd9dede943d70c14f0ffb96
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Sometimes, during a tree walk, the user needs the next slot regardless of
if it is empty or not. Add an interface to get the next slot.
Since there are no consecutive NULLs allowed in the tree, the mas_next()
function can only advance two slots at most. So use the new
mas_next_slot() interface to align both implementations. Use this method
for mas_find() as well.
Link: https://lkml.kernel.org/r/20230518145544.1722059-28-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d0e70747bdb8f9cfc9ff48474e83171587ad94c1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 274059236
Change-Id: I9efc69267859c26aae1b776d103dff584f1961ea
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Keep a reference to the node when possible with mas_prev(). This will
avoid re-walking the tree. In keeping a reference to the node, keep the
last/index accurate to the range being referenced. This means the limit
may be within the range, but the range may extend outside of the limit.
Also fix the single entry tree to respect the range (of 0), or set the
node to MAS_NONE in the case of shifting beyond 0.
Link: https://lkml.kernel.org/r/20230518145544.1722059-25-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 20e9433710317ab0278c1d76821e213fb2d11e19
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 274059236
Change-Id: If0b40925884dac6e334474249098d03175ba6dd6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Calculate the number of nodes based on the pending write action instead
of assuming the worst case.
This addresses a performance regression introduced in platforms that
have longer allocation timing.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/lkml/20230601021605.2823123-14-Liam.Howlett@oracle.com/
[surenb: adjust node_size calculation, allow to store a slot when
possible]
Bug: 274059236
Change-Id: I1db402fb463ee1e391081d2d81c34619f15713ac
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Create a new function to get the size of the mas_wr_node_size() since it
will be used elsewhere soon.
Drop the incrementing of the node size if this is the left-most node.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: d781921bc8
[surenb: due to the differences with upstream kernel where the node size
can be obtained using mas_wr_new_end() function, this patch is not
applicable upstream. The patch was obtained from the author's tree]
Bug: 274059236
Change-Id: I9f0b5238294d0842b4c2717437ed7288b17c7617
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Relocate it and call mas_wr_extend_null() from within mas_wr_end_piv().
Extending the NULL may affect the end pivot value so call
mas_wr_endtend_null() from within mas_wr_end_piv() to keep it all
together.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/lkml/20230601021605.2823123-12-Liam.Howlett@oracle.com/
[surenb: moved additional wr_mas->end_piv assignment missing in later
kernel versions]
Bug: 274059236
Change-Id: I55c5843273e7a679aef918e66d4b4ed034d493da
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Only write when necessary to the maple tree. This should only occur
when the VMA changes. In the __vma_adjust() case, it is either the vma
when it is expanded, the next vma when the boundary expands into 'vma',
writing the 'insert', or when vma expands/shrinks for shift_arg_pages().
The mas_preallocate() setup should track the intended write to ensure
the correct number of nodes are preallocated for the pending write.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: 61b337f650
[surenb: __vma_adjust was removed in 6.3, therefore these fixes are
not applicable upstream anymore. The patch was obtained from the
author's tree]
Bug: 274059236
Change-Id: I69d68a5b4ff11c40985f7b03b31eec4bb24dcbb6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The maple tree node limits are implied by the parent. When walking up the
tree, the limit may not be known until a slot that does not have implied
limits are encountered. However, if the node is the left-most or
right-most node, the walking up to find that limit can be skipped.
This commit also fixes the debug/testing code that was not setting the
limit on walking down the tree as that optimization is not compatible with
this change.
Link: https://lkml.kernel.org/r/20230518145544.1722059-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0f4e7f5fc2122534ae0573b37224ddfa367fa7ac
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 274059236
Change-Id: I4a5e852906692b27ea598fdf38eba8e1a69355d9
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The majority of the calls to munmap a VMA is for a single vma. The
maple tree is able to store a single entry at 0, with a size of 1 as a
pointer and avoid any allocations. Change do_vmi_align_munmap() to
store the VMAs being munmap()'ed into a tree indexed by the count. This
will leverage the ability to store the first entry without a node
allocation.
Storing the entries into a tree by the count and not the vma start and
end means changing the functions which iterate over the entries. Update
unmap_vmas() and free_pgtables() to take a maple state and a tree end
address to support this functionality.
Passing through the same maple state to unmap_vmas() and free_pgtables()
means the state needs to be reset between calls. This happens in the
static unmap_region() and exit_mmap().
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/lkml/20230601021605.2823123-5-Liam.Howlett@oracle.com/
[surenb: skip changes passing maple state to unmap_vmas() and
free_pgtables()]
Bug: 274059236
Change-Id: If38cfecd51da884bcfdbdfdfbf955a0b338d3d60
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
In preparation of passing the vma state through split, the pre-allocation
that occurs before the split has to be moved to after. Since the
preallocation would then live right next to the store, just call store
instead of preallocating. This effectively restores the potential error
path of splitting and not munmap'ing which pre-dates the maple tree.
Link: https://lkml.kernel.org/r/20230120162650.984577-12-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0378c0a0e9)
Bug: 274059236
Change-Id: I3539fb3a08043dae1bc8aaa6c7f285711a0b5548
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add vendor hook in smaps_pte_entry for swap entry
- android_vh_smaps_pte_entry
- android_vh_show_smap
This vendor hook is to show more information for
swap entries of a process based on the
characteristics, such as written-back, same-filled
or huge (uncompressed).
Bug: 284059805
Change-Id: Ie4a48ae42212c056992d34a10b026b60439d0012
Signed-off-by: Sooyong Suk <s.suk@samsung.com>