Needed for controlling BC1.2 detection logic specific to the chip architecture.
Also needed to implement additional logic so that debug accessories
designed specifically for Pixel work. These are outside the purview of
the Type-C spec.
OOT_bug:
Bug: 169213252
Bug: 168245874
Bug: 173252019
Bug: 271294543
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Change-Id: I36fe75dddd8cd4e2054db01ed4fee7ea08dd8702
(cherry picked from commit 08879ea0d6)
This change fixes a bug where inbound packets to nested IPsec tunnels
fail to pass policy checks due to the inner tunnel's policy checks
not having a reference to the outer policy/template. This causes the
policy check to fail, since the first entries in the secpath correlate
to the outer tunnel, while the templates being verified are for the
inner tunnel.
In order to ensure that the appropriate policy and template context is
searchable, the policy checks must be done incrementally between each
decryption step. As such, this marks secpath entries as having been
successfully matched, skipping them (treating as optional) on subsequent
policy checks.
By skipping the immediate error return in the case where the secpath
entry had previously been validated, this change allows secpath entries
that matched a policy/template previously to be skipped, while still
requiring that each searched template find a match in the secpath.
For security:
- All templates must have matching secpath entries
- Unchanged by current patch; templates that do not match any secpath
entry still return -1. This patch simply allows skipping earlier
blocks of verified secpath entries
- All entries (except trailing transport mode entries) must have a
matching template
- Unvalidated entries, including transport-mode entries, still return
the errored index if they do not match the correct template.
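As a rough illustration of the skipping rule, here is a simplified
userspace model. It is not the kernel code: struct sec_entry, struct tmpl,
match_template() and the integer ids are hypothetical stand-ins for secpath
entries, xfrm templates and the state comparison, and the model ignores
transport mode and optional templates entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one secpath entry. */
struct sec_entry {
	int id;         /* which SA decapsulated this layer */
	bool verified;  /* set once the entry has matched a template */
};

/* Hypothetical stand-in for one policy template. */
struct tmpl {
	int id;
};

/*
 * Look for an entry in entries[start..n) that satisfies template t.
 * Returns the index of the matching entry, -1 if no entry matches, or
 * -(2 + i) if unverified entry i fails to match.
 */
int match_template(struct sec_entry *entries, size_t n, size_t start,
		   const struct tmpl *t)
{
	for (size_t i = start; i < n; i++) {
		if (entries[i].id == t->id) {
			entries[i].verified = true;
			return (int)i;
		}
		/* Previously verified entries are treated as optional. */
		if (entries[i].verified)
			continue;
		/* An unverified mismatch is still reported as an error. */
		return -(2 + (int)i);
	}
	return -1;
}
```

In this model, checking the inner tunnel's template against a secpath of
{outer, inner} fails at index 0 until the outer entry has been verified by
its own policy check, after which the same search reaches the inner entry.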
Bug: 236423446
Bug: 277711867
Test: Tested against Android Kernel Unit Tests
Link: https://lore.kernel.org/netdev/20220824221252.4130836-2-benedictwong@google.com/
[benedictwong: fixed minor style issues]
Signed-off-by: Benedict Wong <benedictwong@google.com>
Change-Id: Ic32831cb00151d0de2e465f18ec37d5f7b680e54
(cherry picked from commit 970e02667c)
This reverts commit 0b892d8fe9
After manual bisection, I found that 0b892d8fe9 is the culprit behind the failing android.net.cts.IpSecManagerTunnelTest.
Bug: 277711867
Signed-off-by: Kelvin Zhang <zhangkelvin@google.com>
Change-Id: Ife350047225fb5d825ec92c5d087313c70965acf
This change ensures that all nested XFRM packets have their policy
checked before decryption of the next layer, so that policies are
verified at each intermediate step of the decryption process.
Notably, raw ESP/AH packets do not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.
In order to prevent double-checking packets both here and in the
encapsulation layers, this check is currently limited to nested
tunnel-mode transforms and checked prior to decapsulation of inner
tunnel layers (prior to hitting a nested tunnel's xfrm_input, there
is no great way to detect a nested tunnel). This is primarily a
performance consideration, as a general blanket check at the end of
xfrm_input would suffice, but may result in multiple policy checks.
Bug: 236423446
Bug: 277711867
Test: Tested against Android Kernel Unit Tests
Link: https://lore.kernel.org/netdev/20220824221252.4130836-3-benedictwong@google.com/
Signed-off-by: Benedict Wong <benedictwong@google.com>
Change-Id: I20c5abf39512d7f6cf438c0921a78a84e281b4e9
(cherry picked from commit b5bf2997c3)
By default, Android places modules into /lib/modules/ instead of using
the default path /lib/modules/<uname>.
Bug: 254835242
Change-Id: I49ed4be25c29302fc9b99a9f2ef5f1c84df3adc9
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Fall back to the default module path (/lib/modules/<uname>) if module
loading fails for the path selected in CONFIG_PKVM_MODULE_PATH. This
is intended to follow the same mechanism as Android init.
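The fallback order can be sketched as follows. This is only an
illustration under assumed names: load_fn, load_with_fallback() and
demo_load() are hypothetical and do not correspond to functions in the
patch, and the directory strings are placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Loader callback: returns 0 on success, non-zero on failure. */
typedef int (*load_fn)(const char *dir, const char *module);

/*
 * Try the configured directory first. If it is unset or loading from it
 * fails, retry with the default directory.
 */
int load_with_fallback(const char *configured_dir, const char *default_dir,
		       const char *module, load_fn load)
{
	if (configured_dir && configured_dir[0] != '\0' &&
	    load(configured_dir, module) == 0)
		return 0;
	return load(default_dir, module);
}

/* Demo loader: pretends modules exist only under one directory. */
int demo_load(const char *dir, const char *module)
{
	(void)module;
	return strcmp(dir, "/lib/modules/demo-uname") == 0 ? 0 : -1;
}
```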
Bug: 254835242
Change-Id: Ia7764d57fe71521e4a1fe6d2c85ba057790069a8
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Currently, no module path is given to modprobe when loading a pKVM
module, so the module must be found in /lib/modules/<uname>. Add
CONFIG_PKVM_MODULE_PATH to allow setting a different path from the
kernel config.
Bug: 254835242
Change-Id: I4f355518628b44ac03de2cee3d7a90e1ad5bf1e2
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Add android_vh_ufs_clock_scaling to MTK symbol list
Bug: 277668337
Change-Id: Ieda17e4daff8ce5ee699fbfbf5d4238303b81ef2
Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>
Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer, if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE.
This causes a measurable regression from the LRU lock contention during a
microbenchmark. But a tiny regression is not worth the complexity.
Link: https://lkml.kernel.org/r/20230118001827.1040870-8-talumbau@google.com
Change-Id: I9ce18b4f4062e6c1c13c98ece9422478eb8e1846
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit abf086721a)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
This patch adds POSIX_FADV_NOREUSE to vma_has_recency() so that the LRU
algorithm can ignore access to mapped files marked by this flag.
The advantages of POSIX_FADV_NOREUSE are:
1. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not alter the
default readahead behavior.
2. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not split VMAs and
therefore does not take mmap_lock.
3. Unlike MADV_COLD, setting it has a negligible cost, regardless of
how many pages it affects.
Its limitations are:
1. Like POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL, it currently does
not support range. IOW, its scope is the entire file.
2. It currently does not ignore access through file descriptors.
Specifically, for the active/inactive LRU, given a file page shared
by two users and one of them having set POSIX_FADV_NOREUSE on the
file, this page will be activated upon the second user accessing
it. This corner case can be covered by checking POSIX_FADV_NOREUSE
before calling folio_mark_accessed() on the read path. But it is
considered not worth the effort.
There have been a few attempts to support POSIX_FADV_NOREUSE, e.g., [1].
This time the goal is to fill a niche: a few desktop applications, e.g.,
large file transferring and video encoding/decoding, want fast file
streaming with mmap() rather than direct IO. Among those applications, an
SVT-AV1 regression was reported when running with MGLRU [2]. The
following test can reproduce that regression.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fallocate -l 8G /mnt/swapfile
mkswap /mnt/swapfile
swapon /mnt/swapfile
wget http://ultravideo.cs.tut.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z e -o/mnt/ Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp --preset 12 -w 3840 -h 2160 \
-i /mnt/Bosphorus_3840x2160.y4m
For MGLRU, the following change showed a [9-11]% increase in FPS,
which makes it on par with the active/inactive LRU.
patch Source/App/EncApp/EbAppMain.c <<EOF
31a32
> #include <fcntl.h>
35d35
< #include <fcntl.h> /* _O_BINARY */
117a118
> posix_fadvise(config->mmap.fd, 0, 0, POSIX_FADV_NOREUSE);
EOF
[1] https://lore.kernel.org/r/1308923350-7932-1-git-send-email-andrea@betterlinux.com/
[2] https://openbenchmarking.org/result/2209259-PTS-MGLRU8GB57
Link: https://lkml.kernel.org/r/20221230215252.2628425-2-yuzhao@google.com
Change-Id: I0b7f5f971d78014ea1ba44cee6a8ec902a4330d0
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17e810229c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Add vma_has_recency() to indicate whether a VMA may exhibit temporal
locality that the LRU algorithm relies on.
This function returns false for VMAs marked by VM_SEQ_READ or
VM_RAND_READ. While the former flag indicates linear access, i.e., a
special case of spatial locality, both flags indicate a lack of temporal
locality, i.e., the reuse of an area within a relatively small duration.
"Recency" is chosen over "locality" to avoid confusion between temporal
and spatial localities.
Before this patch, the active/inactive LRU only ignored the accessed bit
from VMAs marked by VM_SEQ_READ. After this patch, the active/inactive
LRU and MGLRU share the same logic: they both ignore the accessed bit if
vma_has_recency() returns false.
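The shared logic described above can be modelled in userspace roughly as
follows. This is a sketch, not the kernel implementation: struct demo_vma,
the DEMO_* flag values and demo_count_access() are made up for the
illustration, and only the flag test itself follows the description in
this message.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel defines its own values. */
#define DEMO_VM_SEQ_READ  0x1UL
#define DEMO_VM_RAND_READ 0x2UL

/* Hypothetical stand-in for a VMA. */
struct demo_vma {
	unsigned long vm_flags;
};

/* False when the VMA is marked as lacking temporal locality. */
bool demo_vma_has_recency(const struct demo_vma *vma)
{
	return !(vma->vm_flags & (DEMO_VM_SEQ_READ | DEMO_VM_RAND_READ));
}

/* The accessed bit only counts for VMAs that have recency. */
bool demo_count_access(const struct demo_vma *vma, bool accessed_bit)
{
	return accessed_bit && demo_vma_has_recency(vma);
}
```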
For the active/inactive LRU, the following fio test showed a [6, 8]%
increase in IOPS when randomly accessing mapped files under memory
pressure.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fio --name=test --directory=/mnt/ --ioengine=mmap --numjobs=8 \
--size=8G --rw=randrw --time_based --runtime=10m \
--group_reporting
The discussion that led to this patch is here [1]. Additional test
results are available in that thread.
[1] https://lore.kernel.org/r/Y31s%2FK8T85jh05wH@google.com/
Link: https://lkml.kernel.org/r/20221230215252.2628425-1-yuzhao@google.com
Change-Id: I291dcb795197659e40e46539cd32b857677c34ad
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8788f67814)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Set KMI_GENERATION=4 for 4/12 KMI update
1 function symbol(s) removed
'int of_mdiobus_register(struct mii_bus*, struct device_node*)'
2 function symbol(s) added
'void* memremap_pages(struct dev_pagemap*, int)'
'void memunmap_pages(struct dev_pagemap*)'
function symbol changed from 'bool cfg80211_rx_control_port(struct net_device*, struct sk_buff*, bool)' to 'bool cfg80211_rx_control_port(struct net_device*, struct sk_buff*, bool, int)'
CRC changed from 0x19c30d56 to 0x70d8333f
type changed from 'bool(struct net_device*, struct sk_buff*, bool)' to 'bool(struct net_device*, struct sk_buff*, bool, int)'
parameter 4 of type 'int' was added
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0xc79e45c3 to 0xbf847796
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0x4cf602fa to 0xd312e35b
function symbol 'void __SetPageMovable(struct page*, const struct movable_operations*)' changed
CRC changed from 0x60f5778b to 0x9c92af65
... 3672 omitted; 3675 symbols have only CRC changes
type 'struct pglist_data' changed
byte size changed from 7168 to 9088
member changed from 'struct zone node_zones[3]' to 'struct zone node_zones[4]'
type changed from 'struct zone[3]' to 'struct zone[4]'
number of elements changed from 3 to 4
member 'struct zonelist node_zonelists[1]' changed
offset changed by 12800
22 members ('int nr_zones' .. 'unsigned long totalreserve_pages') changed
offset changed by 12928
3 members ('struct cacheline_padding _pad1_' .. 'struct lruvec __lruvec') changed
offset changed by 13312
2 members ('unsigned long flags' .. 'struct lru_gen_mm_walk mm_walk') changed
offset changed by 14848
member 'struct lru_gen_memcg memcg_lru' changed
offset changed by 15104
3 members ('struct cacheline_padding _pad2_' .. 'atomic_long_t vm_stat[42]') changed
offset changed by 15360
type 'struct iommu_group' changed
byte size changed from 208 to 224
member 'struct xarray pasid_array' was added
11 members ('struct mutex mutex' .. 'void* owner') changed
offset changed by 128
type 'struct iommu_domain' changed
byte size changed from 72 to 88
member 'iommu_fault_handler_t handler' was removed
member 'void* handler_token' was removed
2 members ('struct iommu_domain_geometry geometry' .. 'struct iommu_dma_cookie* iova_cookie') changed
offset changed by -128
member 'enum iommu_page_response_code(* iopf_handler)(struct iommu_fault*, void*)' was added
member 'void* fault_data' was added
member 'union { struct { iommu_fault_handler_t handler; void* handler_token; }; struct { struct mm_struct* mm; int users; }; }' was added
type 'struct iommu_device' changed
byte size changed from 40 to 48
member 'u32 max_pasids' was added
type 'struct iommu_ops' changed
byte size changed from 152 to 136
member 'struct iommu_sva*(* sva_bind)(struct device*, struct mm_struct*, void*)' was removed
member 'void(* sva_unbind)(struct iommu_sva*)' was removed
member 'u32(* sva_get_pasid)(struct iommu_sva*)' was removed
2 members ('int(* page_response)(struct device*, struct iommu_fault_event*, struct iommu_page_response*)' .. 'int(* def_domain_type)(struct device*)') changed
offset changed by -192
member 'void(* remove_dev_pasid)(struct device*, ioasid_t)' was added
3 members ('const struct iommu_domain_ops* default_domain_ops' .. 'struct module* owner') changed
offset changed by -128
type 'struct vm_event_state' changed
byte size changed from 728 to 752
member changed from 'unsigned long event[91]' to 'unsigned long event[94]'
type changed from 'unsigned long[91]' to 'unsigned long[94]'
number of elements changed from 91 to 94
type 'struct dev_iommu' changed
byte size changed from 72 to 80
member 'u32 max_pasids' was added
type 'struct io_uring_cmd' changed
member changed from 'union { void(* task_work_cb)(struct io_uring_cmd*); void* cookie; }' to 'union { void(* task_work_cb)(struct io_uring_cmd*, unsigned int); void* cookie; }'
type changed from 'union { void(* task_work_cb)(struct io_uring_cmd*); void* cookie; }' to 'union { void(* task_work_cb)(struct io_uring_cmd*, unsigned int); void* cookie; }'
member changed from 'void(* task_work_cb)(struct io_uring_cmd*)' to 'void(* task_work_cb)(struct io_uring_cmd*, unsigned int)'
type changed from 'void(*)(struct io_uring_cmd*)' to 'void(*)(struct io_uring_cmd*, unsigned int)'
pointed-to type changed from 'void(struct io_uring_cmd*)' to 'void(struct io_uring_cmd*, unsigned int)'
parameter 2 of type 'unsigned int' was added
type 'struct dentry_operations' changed
member changed from 'void(* d_canonical_path)(const struct path*, struct path*)' to 'int(* d_canonical_path)(const struct path*, struct path*)'
type changed from 'void(*)(const struct path*, struct path*)' to 'int(*)(const struct path*, struct path*)'
pointed-to type changed from 'void(const struct path*, struct path*)' to 'int(const struct path*, struct path*)'
return type changed from 'void' to 'int'
type 'struct fscrypt_operations' changed
byte size changed from 72 to 104
member 'u64 android_kabi_reserved1' was added
member 'u64 android_kabi_reserved2' was added
member 'u64 android_kabi_reserved3' was added
member 'u64 android_kabi_reserved4' was added
type 'struct zone' changed
member changed from 'long lowmem_reserve[3]' to 'long lowmem_reserve[4]'
type changed from 'long[3]' to 'long[4]'
number of elements changed from 3 to 4
15 members ('struct pglist_data* zone_pgdat' .. 'int initialized') changed
offset changed by 64
type 'struct zonelist' changed
byte size changed from 64 to 80
member changed from 'struct zoneref _zonerefs[4]' to 'struct zoneref _zonerefs[5]'
type changed from 'struct zoneref[4]' to 'struct zoneref[5]'
number of elements changed from 4 to 5
type 'enum zone_type' changed
enumerator 'ZONE_DEVICE' (3) was added
enumerator '__MAX_NR_ZONES' value changed from 3 to 4
type 'struct lruvec' changed
byte size changed from 1224 to 1416
2 members ('struct lru_gen_mm_state mm_state' .. 'struct pglist_data* pgdat') changed
offset changed by 1536
type 'struct lru_gen_mm_walk' changed
byte size changed from 152 to 184
member changed from 'int nr_pages[4][2][3]' to 'int nr_pages[4][2][4]'
type changed from 'int[4][2][3]' to 'int[4][2][4]'
element type changed from 'int[2][3]' to 'int[2][4]'
element type changed from 'int[3]' to 'int[4]'
number of elements changed from 3 to 4
4 members ('int mm_stats[6]' .. 'bool force_scan') changed
offset changed by 256
type 'struct iommu_domain_ops' changed
byte size changed from 112 to 120
member 'int(* set_dev_pasid)(struct iommu_domain*, struct device*, ioasid_t)' was added
12 members ('int(* map)(struct iommu_domain*, unsigned long, phys_addr_t, size_t, int, gfp_t)' .. 'void(* free)(struct iommu_domain*)') changed
offset changed by 64
type 'struct mem_cgroup_per_node' changed
byte size changed from 2096 to 2328
2 members ('struct lruvec_stats_percpu* lruvec_stats_percpu' .. 'struct lruvec_stats lruvec_stats') changed
offset changed by 1536
member changed from 'unsigned long lru_zone_size[3][5]' to 'unsigned long lru_zone_size[4][5]'
offset changed from 15232 to 16768
type changed from 'unsigned long[3][5]' to 'unsigned long[4][5]'
number of elements changed from 3 to 4
6 members ('struct mem_cgroup_reclaim_iter iter' .. 'struct mem_cgroup* memcg') changed
offset changed by 1856
type 'struct lru_gen_folio' changed
byte size changed from 960 to 1152
member changed from 'struct list_head folios[4][2][3]' to 'struct list_head folios[4][2][4]'
type changed from 'struct list_head[4][2][3]' to 'struct list_head[4][2][4]'
element type changed from 'struct list_head[2][3]' to 'struct list_head[2][4]'
element type changed from 'struct list_head[3]' to 'struct list_head[4]'
number of elements changed from 3 to 4
member changed from 'long nr_pages[4][2][3]' to 'long nr_pages[4][2][4]'
offset changed from 3520 to 4544
type changed from 'long[4][2][3]' to 'long[4][2][4]'
element type changed from 'long[2][3]' to 'long[2][4]'
element type changed from 'long[3]' to 'long[4]'
number of elements changed from 3 to 4
9 members ('unsigned long avg_refaulted[2][4]' .. 'struct hlist_nulls_node list') changed
offset changed by 1536
Bug: 277759776
Change-Id: I31065f7aa7589d55cf402ed8e00da061cffe1246
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
The following symbol was removed by commit c2b6e1a440 ("net: mdio: fix
owner field for mdio buses registered using device-tree"). It also needs
to be removed from this symbol list to reflect this update.
- of_mdiobus_register
Bug: 277759776
Change-Id: I4ab79a86f13404c2d0b2e423154aaa8b512bc1c4
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Previously errors from the daemon in FUSE_CANONICAL_PATH were simply
ignored. In order to block inotifys, it is useful to be able to return
errors from this opcode.
Bug: 238619640
Test: inotify no longer works on /storage/emulated/0/Android/media but
does on child folders
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: Icb15c090c6286c174338471a787712f8388de316
2 function symbol(s) added
'void* memremap_pages(struct dev_pagemap*, int)'
'void memunmap_pages(struct dev_pagemap*)'
Add the memremap_pages() and memunmap_pages() functions exposed by
CONFIG_ZONE_DEVICE, in order to allow drivers to map device memory in
the logical mapping using memremap_pages().
Bug: 274657829
Change-Id: I4dfcbdbb1d2493f4137c356ba1d1a9679156cfed
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Enable CONFIG_ZONE_DEVICE to allow drivers to map device memory in the
logical mapping using memremap_pages().
Bug: 274657829
Change-Id: Ie4ac78b7667ddb5ea20c7f4ed2b0df127012008a
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
This reverts commit d608563925.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I4198b98bd5c4012501237c4498de164e65f1a1c3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit d9f36cae1c.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I9f883317631792227d5cc365cfd84fb5e745c434
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This makes it possible for the I/O page fault handling framework to serve
more usage scenarios as long as they have an IOMMU domain and install a page
fault handler in it. Some unused functions are also removed to avoid
dead code.
The iommu_get_domain_for_dev_pasid() which retrieves attached domain
for a {device, PASID} pair is used. It will be used by the page fault
handling framework which knows {device, PASID} reported from the iommu
driver. We have a guarantee that the SVA domain doesn't go away during
IOPF handling, because unbind() won't free the domain until all the
pending page requests have been flushed from the pipeline. The drivers
either call iopf_queue_flush_dev() explicitly, or in stall case, the
device driver is required to flush all DMAs including stalled
transactions before calling unbind().
This also renames iopf_handle_group() to iopf_handler() to avoid
confusion.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-13-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 4bb4211e48)
Bug: 271394577
Change-Id: I5f8762d04f43b64fd76cdbbbcddb9740c7449746
Signed-off-by: Michael Shavit <mshavit@google.com>
This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework could route a page fault to the domain and
call the fault handler from it.
Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data
as a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain could install a handler and its
private parameter so that the page fault could be further routed and
handled.
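A minimal userspace sketch of the per-domain routing follows. The member
names iopf_handler and fault_data follow the fields this series adds to
struct iommu_domain; everything else (struct demo_domain, struct
demo_fault, the response enum, demo_route_fault() and
demo_count_handler()) is hypothetical and only illustrates the callback
pattern.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reported I/O page fault. */
struct demo_fault {
	unsigned long addr;
};

enum demo_response { DEMO_RESP_SUCCESS, DEMO_RESP_INVALID };

/* Hypothetical stand-in for a domain carrying a handler and its data. */
struct demo_domain {
	enum demo_response (*iopf_handler)(struct demo_fault *fault, void *data);
	void *fault_data;
};

/* Route a fault to the domain and call the handler installed in it. */
enum demo_response demo_route_fault(struct demo_domain *domain,
				    struct demo_fault *fault)
{
	if (!domain || !domain->iopf_handler)
		return DEMO_RESP_INVALID;
	return domain->iopf_handler(fault, domain->fault_data);
}

/* Example handler: counts faults in the owner's private data. */
enum demo_response demo_count_handler(struct demo_fault *fault, void *data)
{
	(void)fault;
	++*(unsigned int *)data;
	return DEMO_RESP_SUCCESS;
}
```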
This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model. The I/O page fault handler
for SVA is copied to the SVA file with mmget_not_zero() added before
mmap_read_lock().
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 8cc93159f9)
Bug: 271394577
Change-Id: Ibf0c080875760a2ad789770885eb6c8db7170cbe
Signed-off-by: Michael Shavit <mshavit@google.com>
The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in iommu_ops vector anymore as we can achieve
this through the generic attach/detach_dev_pasid domain ops.
This refactors the IOMMU SVA interfaces implementation by using the
iommu_attach/detach_device_pasid interfaces and align them with the
concept of the SVA iommu domain. Put the new SVA code in the SVA
related file in order to make it self-contained.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit be51b1d6bb)
Bug: 271394577
Change-Id: I4f17688c8659a6ed2e0433ab5afd95f6f7860d3b
Signed-off-by: Michael Shavit <mshavit@google.com>
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.
The VT-d driver will also need to support setting a DMA domain to a
PASID of a device. The current SVA implementation uses different data
structures to track the domain and device PASID relationship. That's
the reason why we need to check the domain type in remove_dev_pasid
callback. Eventually we'll consolidate the data structures and remove
the need for the domain type check.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit eaca8889a1)
Bug: 271394577
Change-Id: I09fcc5817810f3f351dd5fa6e22e3b5f304005ce
Signed-off-by: Michael Shavit <mshavit@google.com>
The SVA iommu_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructures
to support SVA domain in the iommu core. It includes:
- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
type. The IOMMU drivers that support allocation of the SVA domain
should provide their own SVA domain specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
is still used to free an SVA domain.
The report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Leave the existing fault handler with the
existing users; the newly added SVA members exclude it.
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 136467962e)
Bug: 271394577
Change-Id: I0c6ce7f05f76d7cdcaab5ecd3ad0cf72bbff7d03
Signed-off-by: Michael Shavit <mshavit@google.com>
Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):
- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device
This adds the set_dev_pasid domain ops for setting the domain onto a
PASID of a device and remove_dev_pasid iommu ops for removing any setup
on a PASID of a device. This also adds interfaces for device drivers to
attach/detach/retrieve a domain for a PASID of a device.
If multiple devices share a single group, it's fine as long as the fabric
always routes every TLP marked with a PASID to the host bridge and only
the host bridge. For example, ACS achieves this universally and has been
checked when pci_enable_pasid() is called. As we can't reliably tell the
source apart in a group, all the devices in a group have to be considered
as the same source, and mapped to the same PASID table.
The DMA ownership is about the whole device (more precisely, iommu group),
including the RID and PASIDs. When the ownership is converted, the pasid
array must be empty. This also adds necessary checks in the DMA ownership
interfaces.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 1660370455)
Bug: 271394577
Change-Id: I8057b72c3db7b83cd26c1b0ceeb81eeeff97c6f3
Signed-off-by: Michael Shavit <mshavit@google.com>
The Requester ID/Process Address Space ID (PASID) combination
identifies an address space distinct from the PCI bus address space,
e.g., an address space defined by an IOMMU.
But the PCIe fabric routes Memory Requests based on the TLP address,
ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
SHOULD go upstream to the IOMMU may instead be routed as a P2P
Request if its address falls in a bridge window.
To ensure that all Memory Requests with PASID are routed upstream,
only enable PASID if ACS P2P Request Redirect and Upstream Forwarding
are enabled for the path leading to the device.
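The rule can be pictured with the simplified model below. It is a sketch
only: struct demo_pci_dev, the DEMO_ACS_* bits, demo_acs_path_enabled()
and demo_enable_pasid() are hypothetical and merely mimic walking from a
device towards the root, requiring both capabilities at every hop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative capability bits for this sketch only. */
#define DEMO_ACS_RR 0x1u	/* P2P Request Redirect */
#define DEMO_ACS_UF 0x2u	/* Upstream Forwarding */

/* Hypothetical stand-in for a device or bridge on the path. */
struct demo_pci_dev {
	struct demo_pci_dev *upstream;	/* NULL at the root */
	unsigned int acs_enabled;
};

/* True only if every hop from dev to the root has all of flags enabled. */
bool demo_acs_path_enabled(const struct demo_pci_dev *dev, unsigned int flags)
{
	for (; dev; dev = dev->upstream)
		if ((dev->acs_enabled & flags) != flags)
			return false;
	return true;
}

/* Refuse to enable PASID unless requests are always routed upstream. */
int demo_enable_pasid(const struct demo_pci_dev *dev)
{
	if (!demo_acs_path_enabled(dev, DEMO_ACS_RR | DEMO_ACS_UF))
		return -1;
	return 0;
}
```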
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 201007ef70)
Bug: 271394577
Change-Id: I302e3c26cd5d3a0122f4ef52a0191959aef574c0
Signed-off-by: Michael Shavit <mshavit@google.com>
The current kernel DMA with PASID support is based on the SVA with a flag
SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address
space to a PASID of the device. The device driver programs the device with
kernel virtual address (KVA) for DMA access. There have been security and
functional issues with this approach:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slightly more protection, using kernel virtual address (KVA)
has little advantage over physical address. There are also no use
cases yet where DMA engines need kernel virtual addresses for in-kernel
DMA.
This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface.
The device drivers are suggested to handle kernel DMA with PASID through
the kernel DMA APIs.
The drvdata parameter in iommu_sva_bind_device() and all callbacks is not
needed anymore. Clean them up as well.
Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 942fd5435d)
Bug: 271394577
Change-Id: I568663921ccb4af3898806d576e3d7b605157d32
Signed-off-by: Michael Shavit <mshavit@google.com>
Use this field to keep the number of PASIDs that the IOMMU
hardware is able to support. This is a generic attribute of an IOMMU
and lifting it into the per-IOMMU device structure makes it possible
to allocate a PASID for a device without calls into the IOMMU drivers.
Any iommu driver that supports PASID related features should set this
field before enabling them on the devices.
In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU
enclave so that the pasid_supported() helper could be used in dmar.c
without compilation errors.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Link: https://lore.kernel.org/r/20221031005917.45690-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 1adf3cc20d)
Bug: 271394577
Change-Id: I64f01079ffca23d28eb1d8c8d4e72afcbf197430
Signed-off-by: Michael Shavit <mshavit@google.com>
'struct fscrypt_operations' shouldn't really be part of the KMI, as
there's no reason for loadable modules to use it. However, due to the
way MODVERSIONS calculates symbol CRCs by recursively dereferencing
structures, changes to 'struct fscrypt_operations' affect the CRCs of
KMI functions exported from certain core kernel files such as
fs/dcache.c. That brings it in-scope for the KMI freeze.
Therefore, add some reserved fields to this struct for LTS updates.
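To illustrate why reserved fields help, the sketch below shows a struct
whose size and member offsets stay the same when a later update consumes
one spare slot. The struct and member names are hypothetical and this is
not the actual definition of 'struct fscrypt_operations'; it only
demonstrates the layout property.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout at freeze time, with spare 64-bit slots at the end. */
struct demo_ops_frozen {
	int (*get_context)(void *ctx);
	uint64_t reserved1;
	uint64_t reserved2;
};

/* A later update uses one slot; the union keeps the slot 64 bits wide. */
struct demo_ops_updated {
	int (*get_context)(void *ctx);
	union {
		int (*new_hook)(void *ctx);
		uint64_t reserved1;
	};
	uint64_t reserved2;
};

/* The size and the offsets of existing members are unchanged. */
_Static_assert(sizeof(struct demo_ops_frozen) ==
	       sizeof(struct demo_ops_updated), "size must not change");
_Static_assert(offsetof(struct demo_ops_frozen, reserved2) ==
	       offsetof(struct demo_ops_updated, reserved2),
	       "offsets must not change");
```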
Bug: 151154716
Change-Id: Ic3bf66c93a9be167a0a5b257bd55e2719d99a1b4
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jung Jinwoo <j7093.jung@samsung.com>
Add the symbol sock_gen_put which is needed by rmnet modules.
Symbols added:
sock_gen_put
Bug: 277377865
Change-Id: Ie98c2269ae7f1f4022dcf84973d9d00d5fa927c5
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
SoCs featuring peripherals that can issue non-coherent DMA traffic
beyond the point of coherency (PoC) present multiple challenges for the
DMA-API implementation in Linux. Many of these challenges can be
overcome by suitable configuration of the interconnect, however the
presence of a cacheable alias for non-cacheable buffers can still lead
to coherence issues arising when stale clean lines are back-snooped from
the cache hierarchy to satisfy a non-cacheable transaction at the PoC.
Removing all cacheable aliases on a case-by-case basis is both
error-prone and expensive. Instead, leverage the stage-2 identity
mapping installed by pKVM to enforce consistent cacheability for all
stage-1 aliases.
Bug: 240786634
Change-Id: I78b0aa51fe3e23811bbd25481173086aa957c4bf
Signed-off-by: Will Deacon <willdeacon@google.com>