Gunyah is an open-source Type-1 hypervisor developed by Qualcomm. It
does not depend on any lower-privileged OS/kernel code for its core
functionality. This increases its security and allows a smaller
trusted computing base than Type-2 hypervisors.
Add documentation describing the Gunyah hypervisor and the main
components of the Gunyah hypervisor which are of interest to Linux
virtualization development.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Change-Id: I471b5ad02732f25e35efe033cd281025c84d0f09
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Bug: 268234781
Link: https://lore.kernel.org/all/20230304010632.2127470-2-quic_eberman@quicinc.com/
The CRC for shmem_file_setup changed after
https://r.android.com/c/2512924/5 rebased on commit due to the bi-weekly
KMI update happening in tandem.
The KMI changes include:
function symbol 'struct file* shmem_file_setup(const char*,
loff_t, unsigned long)' changed
CRC changed from 0x5979e157 to 0xe9ef458b
Also, abi_gki_protected_exports_aarch64 needed an update due to
the hashtag 'kmi-changes-for-2023-03-29'.
Bug: 273448633
Change-Id: Ie174f036ddfbac12e454bbaf927c4a15cf9020f0
Signed-off-by: Will McVicker <willmcvicker@google.com>
Add a trace hook whose callback can be used to fill the folio that is
used for shmem fs. This vendor hook (VH) also takes the
'shmem_inode_info', which can contain vendor-specific data.
Bug: 273448633
Change-Id: Ia48480bba6dba1ee37a3297b69fd61877dae8dc9
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Add vendor-specific data in the 'struct shmem_inode_info'.
Bug: 273448633
Change-Id: I83a3ac822275d2464af7eb25b869b816fdb7276e
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Set KMI_GENERATION=3 for 3/29 KMI update
5 function symbol(s) added
'struct page* dmabuf_page_pool_alloc(struct dmabuf_page_pool*)'
'struct dmabuf_page_pool* dmabuf_page_pool_create(gfp_t, unsigned int)'
'void dmabuf_page_pool_destroy(struct dmabuf_page_pool*)'
'void dmabuf_page_pool_free(struct dmabuf_page_pool*, struct page*)'
'unsigned long dmabuf_page_pool_get_size(struct dmabuf_page_pool*)'
function symbol changed from 'void __wake_up(struct wait_queue_head*, unsigned int, int, void*)' to 'int __wake_up(struct wait_queue_head*, unsigned int, int, void*)'
CRC changed from 0x3eeb2322 to 0xe2964344
type changed from 'void(struct wait_queue_head*, unsigned int, int, void*)' to 'int(struct wait_queue_head*, unsigned int, int, void*)'
return type changed from 'void' to 'int'
function symbol changed from 'void cfg80211_ch_switch_notify(struct net_device*, struct cfg80211_chan_def*, unsigned int)' to 'void cfg80211_ch_switch_notify(struct net_device*, struct cfg80211_chan_def*, unsigned int, u16)'
CRC changed from 0xe8432c8b to 0xdcde54a6
type changed from 'void(struct net_device*, struct cfg80211_chan_def*, unsigned int)' to 'void(struct net_device*, struct cfg80211_chan_def*, unsigned int, u16)'
parameter 4 of type 'u16' was added
function symbol changed from 'void cfg80211_ch_switch_started_notify(struct net_device*, struct cfg80211_chan_def*, unsigned int, u8, bool)' to 'void cfg80211_ch_switch_started_notify(struct net_device*, struct cfg80211_chan_def*, unsigned int, u8, bool, u16)'
CRC changed from 0xe086a1f0 to 0x86eba6c4
type changed from 'void(struct net_device*, struct cfg80211_chan_def*, unsigned int, u8, bool)' to 'void(struct net_device*, struct cfg80211_chan_def*, unsigned int, u8, bool, u16)'
parameter 6 of type 'u16' was added
function symbol changed from 'void cfg80211_port_authorized(struct net_device*, const u8*, gfp_t)' to 'void cfg80211_port_authorized(struct net_device*, const u8*, const u8*, u8, gfp_t)'
CRC changed from 0x2ce6ed68 to 0x8ba3e8b9
type changed from 'void(struct net_device*, const u8*, gfp_t)' to 'void(struct net_device*, const u8*, const u8*, u8, gfp_t)'
parameter 3 type changed from 'gfp_t' = 'unsigned int' to 'const u8*'
resolved type changed from 'unsigned int' to 'const u8*'
parameter 4 of type 'u8' was added
parameter 5 of type 'gfp_t' was added
function symbol changed from 'unsigned long drm_gem_lru_scan(struct drm_gem_lru*, unsigned int, bool(*)(struct drm_gem_object*))' to 'unsigned long drm_gem_lru_scan(struct drm_gem_lru*, unsigned int, unsigned long*, bool(*)(struct drm_gem_object*))'
CRC changed from 0x98fde9d to 0x9acfaf20
type changed from 'unsigned long(struct drm_gem_lru*, unsigned int, bool(*)(struct drm_gem_object*))' to 'unsigned long(struct drm_gem_lru*, unsigned int, unsigned long*, bool(*)(struct drm_gem_object*))'
parameter 3 type changed from 'bool(*)(struct drm_gem_object*)' to 'unsigned long*'
pointed-to type changed from 'bool(struct drm_gem_object*)' to 'unsigned long'
parameter 4 of type 'bool(*)(struct drm_gem_object*)' was added
function symbol changed from 'char* kobject_get_path(struct kobject*, gfp_t)' to 'char* kobject_get_path(const struct kobject*, gfp_t)'
CRC changed from 0x6d2bc3a7 to 0x62b056f6
type changed from 'char*(struct kobject*, gfp_t)' to 'char*(const struct kobject*, gfp_t)'
parameter 1 type changed from 'struct kobject*' to 'const struct kobject*'
pointed-to type changed from 'struct kobject' to 'const struct kobject'
qualifier const added
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0x66b14c8d to 0xc79e45c3
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0xbf6e946f to 0x4cf602fa
function symbol 'void __SetPageMovable(struct page*, const struct movable_operations*)' changed
CRC changed from 0x8c770d3 to 0x60f5778b
... 1724 omitted; 1727 symbols have only CRC changes
type 'enum nl80211_attrs' changed
enumerator 'NL80211_ATTR_TD_BITMAP' (321) was added
enumerator 'NL80211_ATTR_PUNCT_BITMAP' (322) was added
enumerator '__NL80211_ATTR_AFTER_LAST' value changed from 321 to 323
enumerator 'NUM_NL80211_ATTR' value changed from 321 to 323
enumerator 'NL80211_ATTR_MAX' value changed from 320 to 322
type 'struct scsi_device' changed
member 'unsigned int no_vpd_size : 1' was added
type 'struct dma_buf' changed
byte size changed from 264 to 272
member 'struct dma_buf_sysfs_entry* sysfs_entry' was added
type 'struct ufs_hba' changed
member 'unsigned int android_quirks' was added
member 'unsigned int dev_quirks' changed
offset changed by 32
type 'struct cfg80211_connect_resp_params' changed
byte size changed from 472 to 592
member changed from 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; } links[15]' to 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; u16 status; } links[15]'
type changed from 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; }[15]' to 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; u16 status; }[15]'
element type changed from 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; }' to 'struct { const u8* addr; const u8* bssid; struct cfg80211_bss* bss; u16 status; }'
byte size changed from 24 to 32
member 'u16 status' was added
type 'struct station_info' changed
byte size changed from 232 to 256
member 'bool mlo_params_valid' was added
member 'u8 assoc_link_id' was added
member 'u8 mld_addr[6]' was added
member 'const u8* assoc_resp_ies' was added
member 'size_t assoc_resp_ies_len' was added
type 'struct cfg80211_external_auth_params' changed
byte size changed from 64 to 72
member 'u8 mld_addr[6]' was added
type 'struct cfg80211_rx_assoc_resp' changed
byte size changed from 288 to 408
member changed from 'struct { const u8* addr; struct cfg80211_bss* bss; } links[15]' to 'struct { const u8* addr; struct cfg80211_bss* bss; u16 status; } links[15]'
type changed from 'struct { const u8* addr; struct cfg80211_bss* bss; }[15]' to 'struct { const u8* addr; struct cfg80211_bss* bss; u16 status; }[15]'
element type changed from 'struct { const u8* addr; struct cfg80211_bss* bss; }' to 'struct { const u8* addr; struct cfg80211_bss* bss; u16 status; }'
byte size changed from 16 to 24
member 'u16 status' was added
type 'struct cfg80211_update_owe_info' changed
byte size changed from 24 to 40
member 'int assoc_link_id' was added
member 'u8 peer_mld_addr[6]' was added
type 'struct pglist_data' changed
byte size changed from 6976 to 7168
2 members ('unsigned long flags' .. 'struct lru_gen_mm_walk mm_walk') changed
offset changed by 128
member 'struct lru_gen_memcg memcg_lru' was added
3 members ('struct cacheline_padding _pad2_' .. 'atomic_long_t vm_stat[42]') changed
offset changed by 1536
type 'struct pci_host_bridge' changed
member 'unsigned int no_inc_mrrs : 1' was added
9 members ('unsigned int native_aer : 1' .. 'unsigned int msi_domain : 1') changed
offset changed by 1
type 'struct hid_device' changed
member 'unsigned int initial_quirks' was added
member 'bool io_started' changed
offset changed by 32
type 'struct tcpm_port' changed
member 'bool potential_contaminant' was added
type 'struct tcpci' changed
byte size changed from 224 to 232
member 'struct tcpci_data* data' changed
offset changed by 64
type 'struct tcpci_data' changed
byte size changed from 64 to 72
member 'void(* check_contaminant)(struct tcpci*, struct tcpci_data*)' was added
type 'struct blk_mq_tags' changed
byte size changed from 168 to 184
member 'struct sbitmap_queue breserved_tags' changed
offset changed by 64
4 members ('struct request** rqs' .. 'spinlock_t lock') changed
offset changed by 128
type 'struct netns_ct' changed
member 'u8 ctnetlink_has_listener' was removed
6 members ('bool ecache_dwork_pending' .. 'u8 sysctl_checksum') changed
offset changed by -8
type 'struct lruvec' changed
byte size changed from 1208 to 1224
member changed from 'struct lru_gen_struct lrugen' to 'struct lru_gen_folio lrugen'
type changed from 'struct lru_gen_struct' to 'struct lru_gen_folio'
2 members ('struct lru_gen_mm_state mm_state' .. 'struct pglist_data* pgdat') changed
offset changed by 128
type 'struct tcpc_dev' changed
byte size changed from 184 to 192
member 'void(* check_contaminant)(struct tcpc_dev*)' was added
type 'enum tcpm_state' changed
enumerator 'CHECK_CONTAMINANT' (2) was added
enumerator 'SRC_UNATTACHED' value changed from 2 to 3
enumerator 'SRC_ATTACH_WAIT' value changed from 3 to 4
enumerator 'SRC_ATTACHED' value changed from 4 to 5
enumerator 'SRC_STARTUP' value changed from 5 to 6
enumerator 'SRC_SEND_CAPABILITIES' value changed from 6 to 7
enumerator 'SRC_SEND_CAPABILITIES_TIMEOUT' value changed from 7 to 8
enumerator 'SRC_NEGOTIATE_CAPABILITIES' value changed from 8 to 9
enumerator 'SRC_TRANSITION_SUPPLY' value changed from 9 to 10
enumerator 'SRC_READY' value changed from 10 to 11
enumerator 'SRC_WAIT_NEW_CAPABILITIES' value changed from 11 to 12
enumerator 'SNK_UNATTACHED' value changed from 12 to 13
enumerator 'SNK_ATTACH_WAIT' value changed from 13 to 14
enumerator 'SNK_DEBOUNCED' value changed from 14 to 15
enumerator 'SNK_ATTACHED' value changed from 15 to 16
enumerator 'SNK_STARTUP' value changed from 16 to 17
enumerator 'SNK_DISCOVERY' value changed from 17 to 18
enumerator 'SNK_DISCOVERY_DEBOUNCE' value changed from 18 to 19
enumerator 'SNK_DISCOVERY_DEBOUNCE_DONE' value changed from 19 to 20
enumerator 'SNK_WAIT_CAPABILITIES' value changed from 20 to 21
enumerator 'SNK_NEGOTIATE_CAPABILITIES' value changed from 21 to 22
enumerator 'SNK_NEGOTIATE_PPS_CAPABILITIES' value changed from 22 to 23
enumerator 'SNK_TRANSITION_SINK' value changed from 23 to 24
enumerator 'SNK_TRANSITION_SINK_VBUS' value changed from 24 to 25
enumerator 'SNK_READY' value changed from 25 to 26
enumerator 'ACC_UNATTACHED' value changed from 26 to 27
enumerator 'DEBUG_ACC_ATTACHED' value changed from 27 to 28
enumerator 'AUDIO_ACC_ATTACHED' value changed from 28 to 29
enumerator 'AUDIO_ACC_DEBOUNCE' value changed from 29 to 30
enumerator 'HARD_RESET_SEND' value changed from 30 to 31
enumerator 'HARD_RESET_START' value changed from 31 to 32
enumerator 'SRC_HARD_RESET_VBUS_OFF' value changed from 32 to 33
enumerator 'SRC_HARD_RESET_VBUS_ON' value changed from 33 to 34
enumerator 'SNK_HARD_RESET_SINK_OFF' value changed from 34 to 35
enumerator 'SNK_HARD_RESET_WAIT_VBUS' value changed from 35 to 36
enumerator 'SNK_HARD_RESET_SINK_ON' value changed from 36 to 37
enumerator 'SOFT_RESET' value changed from 37 to 38
enumerator 'SRC_SOFT_RESET_WAIT_SNK_TX' value changed from 38 to 39
enumerator 'SNK_SOFT_RESET' value changed from 39 to 40
enumerator 'SOFT_RESET_SEND' value changed from 40 to 41
enumerator 'DR_SWAP_ACCEPT' value changed from 41 to 42
enumerator 'DR_SWAP_SEND' value changed from 42 to 43
enumerator 'DR_SWAP_SEND_TIMEOUT' value changed from 43 to 44
enumerator 'DR_SWAP_CANCEL' value changed from 44 to 45
enumerator 'DR_SWAP_CHANGE_DR' value changed from 45 to 46
enumerator 'PR_SWAP_ACCEPT' value changed from 46 to 47
enumerator 'PR_SWAP_SEND' value changed from 47 to 48
enumerator 'PR_SWAP_SEND_TIMEOUT' value changed from 48 to 49
enumerator 'PR_SWAP_CANCEL' value changed from 49 to 50
enumerator 'PR_SWAP_START' value changed from 50 to 51
enumerator 'PR_SWAP_SRC_SNK_TRANSITION_OFF' value changed from 51 to 52
enumerator 'PR_SWAP_SRC_SNK_SOURCE_OFF' value changed from 52 to 53
enumerator 'PR_SWAP_SRC_SNK_SOURCE_OFF_CC_DEBOUNCED' value changed from 53 to 54
enumerator 'PR_SWAP_SRC_SNK_SINK_ON' value changed from 54 to 55
enumerator 'PR_SWAP_SNK_SRC_SINK_OFF' value changed from 55 to 56
enumerator 'PR_SWAP_SNK_SRC_SOURCE_ON' value changed from 56 to 57
enumerator 'PR_SWAP_SNK_SRC_SOURCE_ON_VBUS_RAMPED_UP' value changed from 57 to 58
enumerator 'VCONN_SWAP_ACCEPT' value changed from 58 to 59
enumerator 'VCONN_SWAP_SEND' value changed from 59 to 60
enumerator 'VCONN_SWAP_SEND_TIMEOUT' value changed from 60 to 61
enumerator 'VCONN_SWAP_CANCEL' value changed from 61 to 62
enumerator 'VCONN_SWAP_START' value changed from 62 to 63
enumerator 'VCONN_SWAP_WAIT_FOR_VCONN' value changed from 63 to 64
enumerator 'VCONN_SWAP_TURN_ON_VCONN' value changed from 64 to 65
enumerator 'VCONN_SWAP_TURN_OFF_VCONN' value changed from 65 to 66
enumerator 'FR_SWAP_SEND' value changed from 66 to 67
enumerator 'FR_SWAP_SEND_TIMEOUT' value changed from 67 to 68
enumerator 'FR_SWAP_SNK_SRC_TRANSITION_TO_OFF' value changed from 68 to 69
enumerator 'FR_SWAP_SNK_SRC_NEW_SINK_READY' value changed from 69 to 70
enumerator 'FR_SWAP_SNK_SRC_SOURCE_VBUS_APPLIED' value changed from 70 to 71
enumerator 'FR_SWAP_CANCEL' value changed from 71 to 72
enumerator 'SNK_TRY' value changed from 72 to 73
enumerator 'SNK_TRY_WAIT' value changed from 73 to 74
enumerator 'SNK_TRY_WAIT_DEBOUNCE' value changed from 74 to 75
enumerator 'SNK_TRY_WAIT_DEBOUNCE_CHECK_VBUS' value changed from 75 to 76
enumerator 'SRC_TRYWAIT' value changed from 76 to 77
enumerator 'SRC_TRYWAIT_DEBOUNCE' value changed from 77 to 78
enumerator 'SRC_TRYWAIT_UNATTACHED' value changed from 78 to 79
enumerator 'SRC_TRY' value changed from 79 to 80
enumerator 'SRC_TRY_WAIT' value changed from 80 to 81
enumerator 'SRC_TRY_DEBOUNCE' value changed from 81 to 82
enumerator 'SNK_TRYWAIT' value changed from 82 to 83
enumerator 'SNK_TRYWAIT_DEBOUNCE' value changed from 83 to 84
enumerator 'SNK_TRYWAIT_VBUS' value changed from 84 to 85
enumerator 'BIST_RX' value changed from 85 to 86
enumerator 'GET_STATUS_SEND' value changed from 86 to 87
enumerator 'GET_STATUS_SEND_TIMEOUT' value changed from 87 to 88
enumerator 'GET_PPS_STATUS_SEND' value changed from 88 to 89
enumerator 'GET_PPS_STATUS_SEND_TIMEOUT' value changed from 89 to 90
enumerator 'GET_SINK_CAP' value changed from 90 to 91
enumerator 'GET_SINK_CAP_TIMEOUT' value changed from 91 to 92
enumerator 'ERROR_RECOVERY' value changed from 92 to 93
enumerator 'PORT_RESET' value changed from 93 to 94
enumerator 'PORT_RESET_WAIT_OFF' value changed from 94 to 95
enumerator 'AMS_START' value changed from 95 to 96
enumerator 'CHUNK_NOT_SUPP' value changed from 96 to 97
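The long run of value changes above has a single cause: CHECK_CONTAMINANT was inserted at value 2, so every later enumerator moved up by one. A minimal sketch of that effect follows, using abbreviated copies of the enum. The OLD_/NEW_ prefixes, the EX_STATE_0/EX_STATE_1 placeholders and the helper name are hypothetical and exist only for illustration.

```c
#include <assert.h>

/*
 * Abbreviated stand-ins for enum tcpm_state before and after the update.
 * EX_STATE_0 / EX_STATE_1 are placeholders for the two enumerators that
 * precede the insertion point; the OLD_ and NEW_ prefixes are hypothetical
 * and only let both layouts coexist in one file.
 */
enum old_tcpm_state {
	OLD_EX_STATE_0,        /* 0 */
	OLD_EX_STATE_1,        /* 1 */
	OLD_SRC_UNATTACHED,    /* 2 */
	OLD_SRC_ATTACH_WAIT,   /* 3 */
	OLD_SRC_ATTACHED,      /* 4 */
};

enum new_tcpm_state {
	NEW_EX_STATE_0,        /* 0 */
	NEW_EX_STATE_1,        /* 1 */
	NEW_CHECK_CONTAMINANT, /* 2: newly inserted */
	NEW_SRC_UNATTACHED,    /* 3 */
	NEW_SRC_ATTACH_WAIT,   /* 4 */
	NEW_SRC_ATTACHED,      /* 5 */
};

/* Map a value from the old numbering to the new numbering. */
static int tcpm_state_old_to_new(int old_value)
{
	assert(old_value >= 0);
	/* Everything at or after the insertion point moves up by one. */
	return old_value >= NEW_CHECK_CONTAMINANT ? old_value + 1 : old_value;
}
```

A module built against the old numbering would read the new value 2 as SRC_UNATTACHED instead of CHECK_CONTAMINANT, which is why the report treats the insertion as a type change.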
type 'struct cfg80211_ap_settings' changed
byte size changed from 904 to 912
member 'u16 punct_bitmap' was added
type 'struct bss_parameters' changed
member 'int link_id' was added
3 members ('int use_cts_prot' .. 'int use_short_slot_time') changed
offset changed by 32
type 'struct cfg80211_csa_settings' changed
member 'u16 punct_bitmap' was added
type 'struct sbitmap_queue' changed
byte size changed from 56 to 64
member 'atomic_t completion_cnt' was added
member 'atomic_t wakeup_cnt' was added
type 'struct mem_cgroup_per_node' changed
byte size changed from 2080 to 2096
9 members ('struct lruvec_stats_percpu* lruvec_stats_percpu' .. 'struct mem_cgroup* memcg') changed
offset changed by 128
type 'struct sbq_wait_state' changed
member 'atomic_t wait_cnt' was removed
member 'wait_queue_head_t wait' changed
offset changed by -64
type 'struct pkvm_module_ops' changed
byte size changed from 208 to 496
member 'int(* host_share_hyp)(u64)' was added
member 'int(* host_unshare_hyp)(u64)' was added
member 'int(* pin_shared_mem)(void*, void*)' was added
member 'void(* unpin_shared_mem)(void*, void*)' was added
5 members ('void*(* memcpy)(void*, const void*, size_t)' .. 'unsigned long(* kern_hyp_va)(unsigned long)') changed
offset changed by 256
member 'u64 android_kabi_reserved1' was added
member 'u64 android_kabi_reserved2' was added
member 'u64 android_kabi_reserved3' was added
member 'u64 android_kabi_reserved4' was added
member 'u64 android_kabi_reserved5' was added
member 'u64 android_kabi_reserved6' was added
member 'u64 android_kabi_reserved7' was added
member 'u64 android_kabi_reserved8' was added
member 'u64 android_kabi_reserved9' was added
member 'u64 android_kabi_reserved10' was added
member 'u64 android_kabi_reserved11' was added
member 'u64 android_kabi_reserved12' was added
member 'u64 android_kabi_reserved13' was added
member 'u64 android_kabi_reserved14' was added
member 'u64 android_kabi_reserved15' was added
member 'u64 android_kabi_reserved16' was added
member 'u64 android_kabi_reserved17' was added
member 'u64 android_kabi_reserved18' was added
member 'u64 android_kabi_reserved19' was added
member 'u64 android_kabi_reserved20' was added
member 'u64 android_kabi_reserved21' was added
member 'u64 android_kabi_reserved22' was added
member 'u64 android_kabi_reserved23' was added
member 'u64 android_kabi_reserved24' was added
member 'u64 android_kabi_reserved25' was added
member 'u64 android_kabi_reserved26' was added
member 'u64 android_kabi_reserved27' was added
member 'u64 android_kabi_reserved28' was added
member 'u64 android_kabi_reserved29' was added
member 'u64 android_kabi_reserved30' was added
member 'u64 android_kabi_reserved31' was added
member 'u64 android_kabi_reserved32' was added
type 'struct kvm_vcpu' changed
byte size changed from 9696 to 9680
5 members ('struct kvm_vcpu_stat stat' .. 'u64 last_used_slot_gen') changed
offset changed by -128
type 'struct kvm_vcpu_arch' changed
byte size changed from 8464 to 8448
member 'struct task_struct* parent_task' was removed
12 members ('struct { struct kvm_guest_debug_arch regs; u64 pmscr_el1; u64 trfcr_el1; } host_debug_state' .. 'struct { u64 last_steal; gpa_t base; } steal') changed
offset changed by -64
Bug: 273751441
Change-Id: I7a5d2599515e67b55871f17eafd239c6cbf136bd
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Prevent the imminent collision between the upstream quirk bits (now up
to '1 << 19') and the Android quirk bits (starting at '1 << 20') by
moving the Android quirk bits into their own field in struct ufs_hba.
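A minimal sketch of why a separate field removes the collision, assuming simplified stand-ins: struct ex_hba, the EX_ macros and the helpers are hypothetical and are not the real UFS definitions. Only the android_quirks field name and the bit positions come from this log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical quirk bits, not the real UFS definitions. */
#define EX_UPSTREAM_QUIRK_HIGHEST (1u << 19) /* highest upstream bit per the text */
#define EX_ANDROID_QUIRK_A        (1u << 20) /* first Android bit per the text */

/* Stand-in for the two relevant fields of struct ufs_hba. */
struct ex_hba {
	unsigned int quirks;         /* upstream quirk bits */
	unsigned int android_quirks; /* Android-only quirk bits */
};

/* Test an upstream quirk bit. */
static bool ex_has_quirk(const struct ex_hba *hba, unsigned int bit)
{
	return (hba->quirks & bit) != 0;
}

/* Test an Android quirk bit; it lives in its own field. */
static bool ex_has_android_quirk(const struct ex_hba *hba, unsigned int bit)
{
	return (hba->android_quirks & bit) != 0;
}
```

With two fields, a future upstream quirk at '1 << 20' and an Android quirk at the same bit position can both be set or cleared without aliasing each other.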
Bug: 162257402
Change-Id: I5373c092734d16f693300d9bd73c7c9063ac921e
Signed-off-by: Eric Biggers <ebiggers@google.com>
Non-protected mode relies on the host to restore its SVE state if
necessary. However, protected VMs shouldn't reveal any
information to the host, including whether they have potentially
dirtied the host's SVE state. Therefore, save and restore the
host's SVE state at hyp in protected mode.
Currently this behavior applies to protected and non-protected
VMs in protected mode. It could be optimised for non-protected
VMs by applying the same behavior as non-protected mode, which is
to inform the host that it should restore its SVE state. But for
now it's kept this way to maintain the same behavior for all VMs
in protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ifbcc64b387c3f821a6c1047e8c843f6250a3f690
The code for deactivating traps, to be able to update the fpsimd
registers, is the only code in this file that is n/vhe specific.
Move it to specialized functions.
This is also needed for the subsequent patch, since the logic for
deciding which traps to enable/disable will get more complex.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ia0477450aa9319a46a91b3c31c1910ad02fbe246
In subsequent patches, vhe/pKVM(nvhe) will diverge significantly
on saving the host fpsimd/sve state when taking a guest fpsimd
trap. Add a specialized helper to handle that.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ib6b13cafad8bf568694804e3b55e0a5a4fcd70a4
Allocate memory and donate it to hyp at setup time for tracking
the host sve state at hyp in protected mode. This memory is used
in the subsequent patch.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: If07eec9ea9c7b216d02e2d1ea69bd62d99f08081
The code to determine the maximum SVE vector length supported by the
system isn't trivial. In subsequent patches, hyp needs to know it for
allocating memory for the host SVE state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I2561af67722a99d8a989b26cb47d073eba3869ff
Subsequent patches will augment this state to allocate space for
tracking the host sve state. SVE state size is not static, and
there isn't support for dynamic per_cpu allocation in hyp.
This is done as a first step in allowing us to allocate SVE state
under the same umbrella.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I0902623a5ab81a80105f5b00a26765d257bc1ceb
The state will be augmented in future patches and accessed in
more than one location. It makes it easier to reason about the
code.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: If3a3a9266c201f63c126860b61da9698be9b9faa
Subsequent patches will change how the fpsimd state is allocated,
and add tracking of sve state. Moving this to a helper makes
future code cleaner and patches easier to reason about.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: Ic46b8889c1fe11f0cfdd7b5f3d2b98bf412183f0
Before the conversion of the various booleans into an enum
representing the state, this helper clarified things. Since the
introduction of the enum, the helper obfuscates rather than
helps.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I83c870146ed2d910bf10d625d1048b95c8b23736
pKVM maintains its own state for tracking the host fpsimd state.
Therefore, there is no need to map and share the host's view with it.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I5e5164a7694881ffa641b5b6a8691a542fd55a14
Expand the comment clarifying why the host SVE vector length value
restored to ZCR_EL1 on guest exit isn't the same as it was on guest
entry.
Signed-off-by: Fuad Tabba <tabba@google.com>
Bug: 267291591
Change-Id: I5889407b4391a80dfcf77b31375c3a17705b68da
This reverts commit ad8dbd4420.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: If9e034122448b199c0c98b689c6e6d0e52d388fd
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit eb1f5e4656.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I2f1ddf6bb64c0b7bc4a6b653d60cdd256820080d
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 8c8619f60e.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: Ic7a4708cc761cc7343ceee47ffb63a7cde23516a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 1a291b98a3.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I53ec3c627f1a697a1d2054f85e9d794e6e18df37
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit b5a444808a.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I595ea6488d44620f3576e4c9ecf5dc7e4d269909
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit e0f8567110.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I4f91f78a34c8ab5761d85ebb4061f9620eec592a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 77bcc673f6.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I0737d8ce7f08323128fa61389169e5e692b56351
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 1e993e7647.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I7b6c62480765cfa9751874641092b2a52229ea84
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit f12f3bc9c7.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: Idfe61bf8ce3a83c66c31769b845572454f0f196b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 8ec4245b45.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I82776674a83f38800e3144d025631e4256cc53f4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 2b47e2bee0.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: I7efd4dd7abde9f5baa37cb3731aa40b7ff94d2bb
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit e7db10fe57.
It was preserving the ABI, but that is not needed anymore at this point
in time.
Change-Id: Ifc79e504dbf17466a88ac76162ed77dcb5c13d19
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Recall that the per-node memcg LRU has two generations and they alternate
when the last memcg (of a given node) is moved from one to the other.
Each generation is also sharded into multiple bins to improve scalability.
A reclaimer starts with a random bin (in the old generation) and, if it
fails, it retries, i.e., it tries the rest of the bins.
If a reclaimer fails with the last memcg, it should move this memcg to the
young generation first, which causes the generations to alternate, and
then retry. Otherwise, the retries will be futile because all other bins
are empty.
Link: https://lkml.kernel.org/r/20230213075322.1416966-1-yuzhao@google.com
Fixes: e4dde56cd2 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 9f550d78b4)
Change-Id: Ie92535676b005ec9e7987632b742fdde8d54436f
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Among the flags in scan_control:
1. sc->may_swap, which indicates swap constraint due to memsw.max, is
supported as usual.
2. sc->proactive, which indicates reclaim by memory.reclaim, may not
opportunistically skip the aging path, since it is considered less
latency sensitive.
3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint, lowers
swappiness to prioritize file LRU, since clean file folios are more
likely to exist.
4. sc->may_writepage and sc->may_unmap, which indicate opportunistic
reclaim, are rejected, since unmapped clean folios are already
prioritized. Scanning for more of them is likely futile and can
cause high reclaim latency when there is a large number of memcgs.
The rest are handled by the existing code.
Link: https://lkml.kernel.org/r/20221222041905.2431096-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e9d4e1ee78)
[TJ: Resolved conflict with older function signature for min_cgroup_below_min, and over
cdded86118 ("ANDROID: MGLRU: Don't skip anon reclaim if swap low")]
Change-Id: Ic2e779eaf4e91a3921831b4e2fa10c740dc59d50
Signed-off-by: T.J. Mercier <tjmercier@google.com>
For each node, memcgs are divided into two generations: the old and
the young. For each generation, memcgs are randomly sharded into
multiple bins to improve scalability. For each bin, an RCU hlist_nulls
is virtually divided into three segments: the head, the tail and the
default.
An onlining memcg is added to the tail of a random bin in the old
generation. The eviction starts at the head of a random bin in the old
generation. The per-node memcg generation counter, whose remainder (mod
2) indexes the old generation, is incremented when all its bins become
empty.
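The counter arithmetic can be sketched as follows. This is a simplified model, not the kernel's data structure: it tracks only how many memcgs sit in each bin, and struct ex_memcg_lru, EX_NR_BINS and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_NR_GENS 2 /* old and young */
#define EX_NR_BINS 8 /* hypothetical bin count */

/* Simplified per-node state: a counter plus memcg counts per bin. */
struct ex_memcg_lru {
	unsigned long seq;                     /* generation counter */
	int nr_memcgs[EX_NR_GENS][EX_NR_BINS]; /* memcgs per generation and bin */
};

/* The remainder (mod 2) of the counter indexes the old generation. */
static int ex_old_gen(const struct ex_memcg_lru *lru)
{
	return (int)(lru->seq % EX_NR_GENS);
}

/* The other slot holds the young generation. */
static int ex_young_gen(const struct ex_memcg_lru *lru)
{
	return (int)((lru->seq + 1) % EX_NR_GENS);
}

/* True if every bin of the given generation is empty. */
static bool ex_gen_empty(const struct ex_memcg_lru *lru, int gen)
{
	for (int bin = 0; bin < EX_NR_BINS; bin++)
		if (lru->nr_memcgs[gen][bin] > 0)
			return false;
	return true;
}

/*
 * Move one memcg from a bin of the old generation to a bin of the young
 * generation. Returns true if the old generation drained, in which case
 * the counter is incremented and the generations alternate.
 */
static bool ex_move_to_young(struct ex_memcg_lru *lru, int from_bin, int to_bin)
{
	int old = ex_old_gen(lru);
	int young = ex_young_gen(lru);

	assert(from_bin >= 0 && from_bin < EX_NR_BINS);
	assert(to_bin >= 0 && to_bin < EX_NR_BINS);
	assert(lru->nr_memcgs[old][from_bin] > 0);

	lru->nr_memcgs[old][from_bin]--;
	lru->nr_memcgs[young][to_bin]++;

	if (!ex_gen_empty(lru, old))
		return false;

	lru->seq++; /* the former young generation is now the old one */
	return true;
}
```

Because only the counter changes, no memcg has to be relinked when the generations alternate; the slot that used to be young is simply read as old from then on.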
There are four operations:
1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in
its current generation (old or young) and updates its "seg" to
"head";
2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in
its current generation (old or young) and updates its "seg" to
"tail";
3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in
the old generation, updates its "gen" to "old" and resets its "seg"
to "default";
4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin
in the young generation, updates its "gen" to "young" and resets
its "seg" to "default".
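The effect of the four operations on a single memcg can be sketched as below. This models only the bookkeeping described above (generation, "seg" and whether the memcg lands at the head or tail of its bin); the random bin choice and the list manipulation are left out, and all EX_/ex_ names are hypothetical.

```c
#include <assert.h>

enum ex_gen { EX_GEN_OLD, EX_GEN_YOUNG };
enum ex_seg { EX_SEG_DEFAULT, EX_SEG_HEAD, EX_SEG_TAIL };
enum ex_pos { EX_POS_HEAD, EX_POS_TAIL };
enum ex_op  { EX_OP_HEAD, EX_OP_TAIL, EX_OP_OLD, EX_OP_YOUNG };

/* Per-memcg bookkeeping in this simplified model. */
struct ex_memcg {
	enum ex_gen gen; /* which generation the memcg is in */
	enum ex_seg seg; /* its segment marker */
	enum ex_pos pos; /* where in the bin it was placed last */
};

/* Apply one of the four operations to a memcg. */
static void ex_apply_op(struct ex_memcg *m, enum ex_op op)
{
	assert(m != 0);

	switch (op) {
	case EX_OP_HEAD:  /* head of a bin, same generation */
		m->pos = EX_POS_HEAD;
		m->seg = EX_SEG_HEAD;
		break;
	case EX_OP_TAIL:  /* tail of a bin, same generation */
		m->pos = EX_POS_TAIL;
		m->seg = EX_SEG_TAIL;
		break;
	case EX_OP_OLD:   /* head of a bin in the old generation */
		m->pos = EX_POS_HEAD;
		m->gen = EX_GEN_OLD;
		m->seg = EX_SEG_DEFAULT;
		break;
	case EX_OP_YOUNG: /* tail of a bin in the young generation */
		m->pos = EX_POS_TAIL;
		m->gen = EX_GEN_YOUNG;
		m->seg = EX_SEG_DEFAULT;
		break;
	}
}
```

Note the asymmetry: the HEAD and TAIL operations leave the generation alone, while OLD and YOUNG change it and reset "seg".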
The events that trigger the above operations are:
1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
2. The first attempt to reclaim an memcg below low, which triggers
MEMCG_LRU_TAIL;
3. The first attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_TAIL;
4. The second attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_YOUNG;
5. Attempting to reclaim an memcg below min, which triggers
MEMCG_LRU_YOUNG;
6. Finishing the aging on the eviction path, which triggers
MEMCG_LRU_YOUNG;
7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
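The event list above amounts to a lookup table from event to operation. A minimal sketch of that table follows; the enum and function names are hypothetical, and the real code makes these decisions inline on the reclaim paths rather than through a single dispatch function.

```c
#include <assert.h>

enum ex_op { EX_OP_HEAD, EX_OP_TAIL, EX_OP_OLD, EX_OP_YOUNG };

/* One enumerator per event in the list, in the same order. */
enum ex_event {
	EX_EV_SOFT_LIMIT_EXCEEDED,
	EX_EV_BELOW_LOW_FIRST_ATTEMPT,
	EX_EV_BELOW_SIZE_THRESHOLD_FIRST_ATTEMPT,
	EX_EV_BELOW_SIZE_THRESHOLD_SECOND_ATTEMPT,
	EX_EV_BELOW_MIN,
	EX_EV_AGING_FINISHED,
	EX_EV_OFFLINING,
};

/* Return the operation that the given event triggers. */
static enum ex_op ex_op_for_event(enum ex_event ev)
{
	switch (ev) {
	case EX_EV_SOFT_LIMIT_EXCEEDED:
		return EX_OP_HEAD;
	case EX_EV_BELOW_LOW_FIRST_ATTEMPT:
	case EX_EV_BELOW_SIZE_THRESHOLD_FIRST_ATTEMPT:
		return EX_OP_TAIL;
	case EX_EV_BELOW_SIZE_THRESHOLD_SECOND_ATTEMPT:
	case EX_EV_BELOW_MIN:
	case EX_EV_AGING_FINISHED:
		return EX_OP_YOUNG;
	case EX_EV_OFFLINING:
		return EX_OP_OLD;
	}
	assert(!"unknown event");
	return EX_OP_YOUNG; /* not reached for valid events */
}
```

Read this way, a memcg that should be reclaimed soon moves toward the head of the old generation, while one that should be left alone for now moves toward the tail or into the young generation.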
Note that memcg LRU only applies to global reclaim, and the
round-robin incrementing of their max_seq counters ensures the
eventual fairness to all eligible memcgs. For memcg reclaim, it still
relies on mem_cgroup_iter().
Link: https://lkml.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e4dde56cd2)
[TJ: Resolved conflicts with older function signatures for
min_cgroup_below_min / min_cgroup_below_low and includes]
Change-Id: Idc8a0f635e035d72dd911f807d1224cb47cbd655
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.
The current aging fairness safeguard for kswapd uses two passes to
ensure the fairness to multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, on
which it ages all those memcgs at the same time.
With memcg LRU, the aging, while ensuring eventual fairness, will run
when necessary. Therefore the current aging fairness safeguard for
kswapd will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.
Link: https://lkml.kernel.org/r/20221222041905.2431096-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 7348cc9182)
[TJ: Resolved conflicts with older function signatures for
mem_cgroup_below_min / mem_cgroup_below_low]
Change-Id: I6e36ecfbaaefbc0a56d9a9d5d7cbe404ed7f57a5
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Recall that the eviction consumes the oldest generation: first it
bucket-sorts folios whose gen counters were updated by the aging and
reclaims the rest; then it increments lrugen->min_seq.
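The eviction steps recalled above can be sketched in the same spirit.
This is a simplified toy model under stated assumptions, not the kernel
code: the Folio and LruGen classes, the per-generation lists dictionary
and the evict() helper are hypothetical, and kernel details such as the
minimum number of generations are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Folio:
    name: str
    gen: int  # gen counter; may be newer than the list the folio sits on


@dataclass
class LruGen:
    min_seq: int
    max_seq: int
    # One list of folios per generation, keyed by sequence number.
    lists: dict = field(default_factory=dict)


def evict(lrugen):
    """Toy eviction pass over the oldest generation.

    Returns the folios that were reclaimed.
    """
    oldest = lrugen.lists.pop(lrugen.min_seq, [])
    reclaimed = []
    for folio in oldest:
        if folio.gen != lrugen.min_seq:
            # The aging updated this gen counter: bucket-sort the folio
            # into the list of the generation it now belongs to.
            lrugen.lists.setdefault(folio.gen, []).append(folio)
        else:
            # Untouched since it entered the oldest generation: reclaim.
            reclaimed.append(folio)
    # The oldest generation has been consumed.
    lrugen.min_seq += 1
    return reclaimed
```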
The current eviction fairness safeguard for global reclaim has a
dilemma: when there are multiple eligible memcgs, should it continue
or stop upon meeting the reclaim goal? If it continues, it overshoots
and increases direct reclaim latency; if it stops, it loses fairness
between memcgs it has taken memory away from and those it has yet to.
With memcg LRU, the eviction, while ensuring eventual fairness, will
stop upon meeting its goal. Therefore the current eviction fairness
safeguard for global reclaim will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the eviction will continue, even if it is overshooting. This becomes
unconditional due to code simplification.
Link: https://lkml.kernel.org/r/20221222041905.2431096-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit a579086c99)
Change-Id: I08ac1b3c90e29cafd0566785aaa4bcdb5db7d22c
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Patch series "mm: multi-gen LRU: memcg LRU", v3.
Overview
========
An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).
Its goal is to improve the scalability of global reclaim, which is
critical to system-wide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.
Its memory overhead is one pointer per lruvec and a negligible amount
per pglist_data. In terms of traversing memcgs during global reclaim, it
improves the best-case complexity from O(n) to O(1) and does not affect
the worst-case complexity O(n). Therefore, on average, it has a sublinear
complexity in contrast to the current linear complexity.
The basic structure of an memcg LRU can be understood by an analogy to
the active/inactive LRU (of folios):
1. It has the young and the old (generations), i.e., the counterparts
to the active and the inactive;
2. The increment of max_seq triggers promotion, i.e., the counterpart
to activation;
3. Other events trigger similar operations, e.g., offlining an memcg
triggers demotion, i.e., the counterpart to deactivation.
In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out at will
and reduces latency without affecting fairness over some time.
The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com/
The following is a simple test to quickly verify its effectiveness.
Test design:
1. Create multiple memcgs.
2. Each memcg contains a job (fio).
3. All jobs access the same amount of memory randomly.
4. The system does not experience global memory pressure.
5. Periodically write to the root memory.reclaim.
Desired outcome:
1. All memcgs have similar pgsteal counts, i.e., stddev(pgsteal)
over mean(pgsteal) is close to 0%.
2. The total pgsteal is close to the total requested through
memory.reclaim, i.e., sum(pgsteal) over sum(requested) is close
to 100%.
Actual outcome [1]:
                                   MGLRU off  MGLRU on
  stddev(pgsteal) / mean(pgsteal)  75%        20%
  sum(pgsteal) / sum(requested)    425%       95%
####################################################################
MEMCGS=128
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    mkdir /sys/fs/cgroup/memcg$memcg
done
start() {
    echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs
    fio -name=memcg$memcg --numjobs=1 --ioengine=mmap \
        --filename=/dev/zero --size=1920M --rw=randrw \
        --rate=64m,64m --random_distribution=random \
        --fadvise_hint=0 --time_based --runtime=10h \
        --group_reporting --minimal
}
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    start &
done
sleep 600
for ((i = 0; i < 600; i++)); do
    echo 256m >/sys/fs/cgroup/memory.reclaim
    sleep 6
done
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
    grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
done
####################################################################
[1]: This was obtained from running the above script (touches less
than 256GB memory) on an EPYC 7B13 with 512GB DRAM for over an
hour.
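The two ratios in the desired outcome can be computed from the script's
grep output with a short helper such as the sketch below. It is not part
of the original test. The function names are hypothetical, the
population standard deviation is an assumption (the text does not say
which estimator was used), and so is the 4 KiB page size used to convert
the requested bytes into pages.

```python
import statistics


def parse_pgsteal(lines):
    """Extract counts from 'pgsteal <count>' lines as printed by grep."""
    counts = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "pgsteal":
            counts.append(int(fields[1]))
    return counts


def fairness_ratio(pgsteal):
    """stddev(pgsteal) / mean(pgsteal); closer to 0 is fairer.

    Assumption: population standard deviation.
    """
    return statistics.pstdev(pgsteal) / statistics.mean(pgsteal)


def requested_pages(writes, bytes_per_write, page_size=4096):
    """Total pages requested through memory.reclaim.

    Assumption: 4 KiB pages unless page_size says otherwise.
    """
    return writes * bytes_per_write // page_size


def accuracy_ratio(pgsteal, requested):
    """sum(pgsteal) / sum(requested); closer to 1 means less overshoot."""
    return sum(pgsteal) / requested
```

For the script above, the request total would be
requested_pages(600, 256 * 2**20), i.e., 600 writes of 256m each.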
This patch (of 8):
The new name lru_gen_folio will be more distinct from the coming
lru_gen_memcg.
Link: https://lkml.kernel.org/r/20221222041905.2431096-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/20221222041905.2431096-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 391655fe08)
Change-Id: I7df67e0e2435ba28f10eaa57d28d98b61a9210a6
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Users of dmabuf_page_pool should not need to refer to its fields, so
hide them from the KMI. Add dmabuf_page_pool_get_size to fulfill the
needs of users.
Bug: 264474028
Change-Id: I848ff52e73a13568f561deeb6aea48f40dc0960b
Signed-off-by: T.J. Mercier <tjmercier@google.com>
kmap_atomic was deprecated in 5.11, and checkpatch now warns about use
of it. Replace with kmap_local_page, and do not manually disable
preemption or page faults.
Bug: 264474028
Fixes: 818b4f6bb8b8 ("ANDROID: dma-buf: system_heap: Add pagepool support to system heap")
Change-Id: Idd6413ff56aadf4fd925acb6f567366d0e03166f
Signed-off-by: T.J. Mercier <tjmercier@google.com>
We should use a spinlock to protect the page pool's critical section
because:
1. The critical section is short, so a spinlock is more efficient.
2. A spinlock helps avoid priority inversion. E.g., a low-priority
thread (dmabuf-deferred) holds the pool lock but gets scheduled
out under heavy load. Other high-priority threads then have to
wait for dmabuf-deferred to release the lock, which causes long
allocation latency and possible UI jank.
Also, we can move the NR_KERNEL_MISC_RECLAIMABLE stat update out of
the critical section to make it shorter, since mod_node_page_state
can handle concurrent access.
Bug: 245454030
Change-Id: I15f349f9e893621f71ca79f1de037de184c33edf
Signed-off-by: Martin Liu <liumartin@google.com>
dmabuf_page_pool_init_shrinker needs to be static to prevent a warning
when compiling with -Wmissing-prototypes. Change it to be static.
Fixes: e7dac4c323 ("ANDROID: dma-buf: heaps: Add a shrinker controlled page pool")
Bug: 168742043
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I64184cf4062e33c14a60b9c3d505db922f2b9c0b
Since vendors might depend on it for their system heap
implementations, make the page-pool library built-in to
freeze its KMI.
Bug: 183902174
Bug: 212210831
Change-Id: If633619ec1f78d0fbd73c43c48b19d98db7807af
Signed-off-by: Hridya Valsaraju <hridya@google.com>