Don't use the uclamp of the current task as the default uclamp for
binders, because the current task's uclamp influences binders'
placement when they are not in a transaction.
Instead, use the default values 0 and SCHED_CAPACITY_SCALE for binders'
default uclamp min and max, respectively. Also replace set_inherited_uclamp
with set_binder_prio_uclamp.
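A minimal userspace sketch of the change in defaults. It assumes a hypothetical `struct binder_uclamp` (not the real binder data structure) and uses the kernel's value of 1024 for SCHED_CAPACITY_SCALE. The point is that the new defaults no longer depend on whichever task happens to be current.

```c
/* SCHED_CAPACITY_SCALE is 1024 (1 << SCHED_CAPACITY_SHIFT) in the kernel. */
#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical stand-in for a binder's uclamp state. */
struct binder_uclamp {
	unsigned int min;
	unsigned int max;
};

/* Old behaviour (sketch): defaults copied from the current task. */
static struct binder_uclamp default_uclamp_from_task(struct binder_uclamp current_task)
{
	return current_task;
}

/* New behaviour (sketch): fixed, task-independent defaults. */
static struct binder_uclamp default_uclamp(void)
{
	struct binder_uclamp uc = { .min = 0, .max = SCHED_CAPACITY_SCALE };
	return uc;
}
```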
Bug: 277389699
Change-Id: I07c4f40c2689dbc7eb23e7d3e2a2f435353dc25f
Signed-off-by: Chungkai Mei <chungkai@google.com>
Per-vcpu flags are updated using a non-atomic RMW operation,
which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is especially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fix it by disabling preemption during the RMW operation, which
ensures that the state stays consistent. Also upgrade vcpu_get_flag
path to use READ_ONCE() to make sure the field is always atomically
accessed.
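The lost update can be reproduced deterministically in userspace. This sketch uses hypothetical flag bits and function names (not the real vcpu_set_flag() machinery) and models preemption as a direct call to a stand-in for the put path. The "fixed" variant models preempt_disable()/preempt_enable() by letting the pending preemption run only after the write has completed.

```c
#include <stdint.h>

/* Hypothetical flag bits; not the real KVM arm64 vcpu flag layout. */
#define FLAG_FROM_PUT  (1u << 0)  /* set by the put path on preemption */
#define FLAG_REQUESTED (1u << 1)  /* set by the interrupted thread */

/* Stand-in for the put path that runs when the thread is preempted. */
static void vcpu_put_path(volatile uint32_t *flags)
{
	*flags |= FLAG_FROM_PUT;
}

/*
 * Non-atomic RMW with a preemption point between read and write:
 * the put path's update is overwritten by the stale value.
 */
static void set_flag_racy(volatile uint32_t *flags, uint32_t bit)
{
	uint32_t tmp = *flags;   /* read */
	vcpu_put_path(flags);    /* preempted here */
	*flags = tmp | bit;      /* write back stale value: FLAG_FROM_PUT lost */
}

/*
 * With preemption disabled, the put path cannot run inside the RMW
 * window; a pending preemption only happens once it is re-enabled.
 */
static void set_flag_preempt_off(volatile uint32_t *flags, uint32_t bit)
{
	/* preempt_disable() */
	uint32_t tmp = *flags;
	*flags = tmp | bit;
	/* preempt_enable(): the pending preemption runs the put path now */
	vcpu_put_path(flags);
}
```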
Fixes: e87abb73e5 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set")
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230418125737.2327972-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
(cherry picked from commit 35dcb3ac66)
[willdeacon@: also update __vcpu_copy_flag()]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 278750073
Change-Id: I63058ff1494e4092dab9d29cb66c295dd8fe9d86
The existing pKVM code attempts to advertise CSV2/3 using values
initialized to 0, but never set. To advertise CSV2/3 to protected
guests, pass the CSV2/3 values to hyp when initializing hyp's
view of guests' ID_AA64PFR0_EL1.
Similar to non-protected KVM, these are system-wide, rather than
per cpu, for simplicity.
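A sketch of how system-wide CSV2/CSV3 values could be folded into hyp's view of ID_AA64PFR0_EL1, using the architectural field positions (CSV2 in bits [59:56], CSV3 in bits [63:60]). The helper names are hypothetical; the kernel uses its own field-manipulation macros.

```c
#include <stdint.h>

/* Field positions in ID_AA64PFR0_EL1: CSV2 is [59:56], CSV3 is [63:60]. */
#define PFR0_CSV2_SHIFT 56
#define PFR0_CSV3_SHIFT 60
#define PFR0_FIELD_MASK 0xfull

/* Hypothetical helper: replace a 4-bit field in a register image. */
static uint64_t set_field(uint64_t reg, unsigned int shift, uint64_t val)
{
	reg &= ~(PFR0_FIELD_MASK << shift);
	reg |= (val & PFR0_FIELD_MASK) << shift;
	return reg;
}

/*
 * Sketch: start from the restricted value hyp exposes to protected
 * guests and advertise the CSV2/CSV3 values passed in at init time.
 */
static uint64_t pvm_pfr0(uint64_t restricted, uint8_t csv2, uint8_t csv3)
{
	uint64_t val = restricted;

	val = set_field(val, PFR0_CSV2_SHIFT, csv2);
	val = set_field(val, PFR0_CSV3_SHIFT, csv3);
	return val;
}
```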
Fixes: 6c30bfb18d ("KVM: arm64: Add handlers for protected VM System Registers")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20230404152321.413064-1-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
(cherry picked from commit e81625218b)
[willdeacon@: fixed_config.h has been moved into kvm_pkvm.h]
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 278750073
Change-Id: I27821a28bcde0dbce3d45bac6cf4de20dcf299f9
Add a hook to boost a thread when its process is killed.
Bug: 237749933
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I7cc6f248397021f3a8271433144a0e582ed27cfa
(cherry picked from commit 709679142d583b0b7338d931fdd43b27b1bbf9e0)
and sched_waking to let modules probe them
Get task info about sleep and waking
Bug: 190422437
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I828c93f531f84e6133c2c3a7f8faada51683afcf
(cherry picked from commit 13af062abf)
(cherry picked from commit 869954e72dac700580d0ea5734d07b574e41afe9)
Get task info about scheduling delay, iowait, and block time.
It is used to get thread scheduling info when a thread encounters an abnormal situation.
Bug: 189415303
Change-Id: Ib6b548f8a78de5b26d555e9a89e3cc79ea2d1024
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit a6bb1af39d)
(cherry picked from commit 6d8d2ab52facfd6d5de2715e2470872e6a70cf22)
Export get_wchan to get the block reason.
It is used to get the block reason (why the thread is blocked in uninterruptible sleep) when a thread stays in D state for a long time. We use this information to check whether the wait is reasonable.
Bug: 205684022
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I7b65bb502b805e7dac13e5f9d725da1ff70fe306
(cherry picked from commit 0db6925868)
(cherry picked from commit de72c813d12537ea6ced87b39ffcad446815609a)
Export the symbol of the function wq_worker_comm() in kernel/workqueue.c so that a DLKM can get the description of a kworker process. It is used to get the description when a kworker thread encounters an abnormal situation.
Bug: 208394207
Signed-off-by: zhengding chen <chenzhengding@oppo.com>
Change-Id: I2e7ddd52a15e22e99e6596f16be08243af1bb473
(cherry picked from commit 28de741861)
(cherry picked from commit 87e0e98c25ba8e121975708943335e3abad651d9)
Add a vendor hook for meminfo fixup.
Bug: 277746082
Change-Id: Ifa7850f75ccdf862d900c9a6c00f165b07e84595
Signed-off-by: Martin Liu <liumartin@google.com>
The upstream driver has added start/stop link inline functions that
result in quite a few changes to the designware code. The upstream patch
for this issue builds on top of these changes. Since we are based on
5.15, it is cleaner to apply a simpler fix on top of the existing code.
In dw_pcie_host_init(), regardless of whether the link has been started
or not, the code waits for the link to come up. Even in cases where
start_link() is not defined the code ends up spinning in a loop for 1
second. Since in some systems dw_pcie_host_init() gets called during
probe, this one second loop for each pcie interface instance ends up
extending the boot time.
Call trace when start_link() is not defined:
dw_pcie_wait_for_link << spins in a loop for 1 second
dw_pcie_host_init
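A sketch of the behaviour the fix aims for, assuming (the message does not show the diff) that the wait is simply skipped when no start_link() callback exists. All structure and function names below are hypothetical, trimmed-down stand-ins for the DesignWare driver.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the DesignWare PCIe structures. */
struct dw_pcie;
struct dw_pcie_ops {
	int (*start_link)(struct dw_pcie *pci);
};
struct dw_pcie {
	const struct dw_pcie_ops *ops;
	int wait_calls;  /* how often we would spin waiting for link-up */
};

/* Stand-in for dw_pcie_wait_for_link(): the real one polls for ~1 s. */
static int wait_for_link(struct dw_pcie *pci)
{
	pci->wait_calls++;
	return 0;
}

/* Sketch: only start the link, and wait for it, if start_link() exists. */
static int host_init_link(struct dw_pcie *pci)
{
	int ret;

	if (!pci->ops || !pci->ops->start_link)
		return 0;  /* nothing started the link: skip the 1 s wait */

	ret = pci->ops->start_link(pci);
	if (ret)
		return ret;
	return wait_for_link(pci);
}
```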
Bug: 270085637
Link: https://lore.kernel.org/all/20230412093425.3659088-1-ajayagarwal@google.com/
Change-Id: Ibc42801fa06674e43e921b4976ec83c9fb5483cf
Signed-off-by: Sajid Dalvi <sdalvi@google.com>
With coalescing we don't refcount default PTE entries. Fix an issue
which clears out non-refcounted PTE entries on the unmap path.
Bug: 279165129
Change-Id: Ie4fdabcc420d54c1338272d38abbe393fc5ce75c
Signed-off-by: Sebastian Ene <sebastianene@google.com>
When switching the USB configuration from rndis to another
configuration, the ->pullup callback may fail, either because the
hardware does not support it or because the hardware encounters a
low-probability fault. Either failure then causes a system panic
(use-after-free).
The gadget drivers sometimes need to be unloaded regardless of the
hardware's behavior.
Analysis as follows:
=======================================================================
(1) write /config/usb_gadget/g1/UDC "none"
gether_disconnect+0x2c/0x1f8
rndis_disable+0x4c/0x74
composite_disconnect+0x74/0xb0
configfs_composite_disconnect+0x60/0x7c
usb_gadget_disconnect+0x70/0x124
usb_gadget_unregister_driver+0xc8/0x1d8
gadget_dev_desc_UDC_store+0xec/0x1e4
(2) rm /config/usb_gadget/g1/configs/b.1/f1
rndis_deregister+0x28/0x54
rndis_free+0x44/0x7c
usb_put_function+0x14/0x1c
config_usb_cfg_unlink+0xc4/0xe0
configfs_unlink+0x124/0x1c8
vfs_unlink+0x114/0x1dc
(3) rmdir /config/usb_gadget/g1/functions/rndis.gs4
panic+0x1fc/0x3d0
do_page_fault+0xa8/0x46c
do_mem_abort+0x3c/0xac
el1_sync_handler+0x40/0x78
0xffffff801138f880
rndis_close+0x28/0x34
eth_stop+0x74/0x110
dev_close_many+0x48/0x194
rollback_registered_many+0x118/0x814
unregister_netdev+0x20/0x30
gether_cleanup+0x1c/0x38
rndis_attr_release+0xc/0x14
kref_put+0x74/0xb8
configfs_rmdir+0x314/0x374
If gadget->ops->pullup() returns an error, rndis_close() will still be
called, which causes a use-after-free.
=======================================================================
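A sketch of the idea stated above, that the gadget driver must be told about the disconnect regardless of the hardware's behaviour. The `struct gadget` model and both function names are hypothetical simplifications of usb_gadget_disconnect(), not the UDC core code.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down gadget model; not the real UDC core types. */
struct gadget {
	int (*pullup)(struct gadget *g, int is_on);
	void (*driver_disconnect)(struct gadget *g);  /* e.g. composite_disconnect() */
	bool connected;
	bool driver_notified;
};

/* Old flow (sketch): the driver only hears about the disconnect on success. */
static int gadget_disconnect_old(struct gadget *g)
{
	int ret = g->pullup(g, 0);

	if (!ret) {
		g->connected = false;
		g->driver_disconnect(g);
	}
	return ret;
}

/*
 * Fixed flow (sketch): always notify the driver, so functions such as
 * rndis tear down their state even if the hardware refused the pullup.
 */
static int gadget_disconnect_new(struct gadget *g)
{
	int ret = g->pullup(g, 0);

	if (!ret)
		g->connected = false;
	g->driver_disconnect(g);
	return ret;
}
```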
Fixes: 0a55187a1e ("USB: gadget core: Issue ->disconnect() callback from usb_gadget_disconnect()")
Signed-off-by: Jiantao Zhang <water.zhangjiantao@huawei.com>
Signed-off-by: TaoXue <xuetao09@huawei.com>
Link: https://lore.kernel.org/r/20221121130805.10735-1-water.zhangjiantao@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 273510696
Bug: 275027942
Change-Id: I702f324c5852d3b2448081b092fef464f8691989
(cherry picked from commit afdc12887f)
[ray: Resolved minor conflict in drivers/usb/gadget/udc/core.c]
Signed-off-by: Ray Chi <raychi@google.com>
(cherry picked from commit 2ce4ee5f2e02702ce61b07a170eeb9ffede0601a)
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with a kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
To see this, consider the following sequence of events, in which:
o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
then is preempted.
o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
after a later grace period. Except that "from_cset" is freed
right after the previous grace period ended, so that "from_cset"
is immediately freed. Task A resumes and references "from_cset"'s
member, after which nothing good happens.
In full detail:
CPU 0                                  CPU 1
----------------------                 ----------------------
count_memcg_event_mm()
|rcu_read_lock()  <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
                                       cgroup_attach_task()
                                       |cgroup_migrate()
                                       |cgroup_migrate_execute()
                                       |css_set_move_task(task, from_cset, to_cset, true)
                                       |cgroup_move_task(task, to_cset)
                                       |rcu_assign_pointer(.., to_cset)
                                       |...
                                       |cgroup_migrate_finish()
                                       |put_css_set_locked(from_cset)
                                       |from_cset->refcount return 0
                                       |kfree_rcu(cset, rcu_head) // free from_cset after new gp
                                       |add_ptr_to_bulk_krc_lock()
                                       |schedule_delayed_work(&krcp->monitor_work, ..)
                                       kfree_rcu_monitor()
                                       |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
                                       |queue_rcu_work(system_wq, &krwp->rcu_work)
                                       |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
                                       |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
                                       // There is a previous call_rcu(.., rcu_work_rcufn)
                                       // gp end, rcu_work_rcufn() is called.
                                       rcu_work_rcufn()
                                       |__queue_work(.., rwork->wq, &rwork->work);
                                       |kfree_rcu_work()
                                       |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
                                       |The "from_cset" is freed before new gp end.
// the task resumes some time later.
|css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
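A userspace model of the two checks. Because the three categories share one rcu_work per kfree_rcu_cpu_work (as in the trace above), a per-slot check can attach new category-2 memory while category-1 memory still waits on an earlier grace period. The struct and both function names are hypothetical; the real helper added by this patch may differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define FREE_N_CHANNELS 2  /* category 1 (kfree) and category 2 (kvfree) */

/*
 * Hypothetical model of a kfree_rcu_cpu_work: a non-NULL slot means that
 * category's memory was handed to RCU and is still waiting for a GP.
 */
struct krwp_model {
	void *bulk_head_free[FREE_N_CHANNELS];
	void *head_free;  /* category 3: OOM-episode linked list */
};

/* Buggy check (sketch): looks only at the slot for category 'i'. */
static bool can_offload_one(const struct krwp_model *w, int i)
{
	return w->bulk_head_free[i] == NULL;
}

/* Fixed check (sketch): offload only when all three categories are idle. */
static bool krwp_idle(const struct krwp_model *w)
{
	for (int i = 0; i < FREE_N_CHANNELS; i++)
		if (w->bulk_head_free[i])
			return false;
	return w->head_free == NULL;
}
```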
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Bug: 275289142
Link: https://lore.kernel.org/rcu/1680266529-28429-1-git-send-email-ziwei.dai@unisoc.com/T/#m9d1d6a4542548acddee133a2807511bccf2b01b6
(cherry picked from commit e222f9a512539c3f4093a55d16624d9da614800b
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2023.03.30a)
[ziwei.dai: Added missing need_offload_krc() function ]
Change-Id: I63b618ed8454cb2826f04e8789c762be8f1ba1e1
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
(cherry picked from commit 6ccb91c80a501c800cc48a3fcbb5473f1ec72c02)
Add the following symbols:
- __arm_smccc_sve_check
Bug: 232434016
Change-Id: I8eb7bdfbffcfc46ed5e5ab845aa8e9d892c8d30e
Signed-off-by: Lucas Wei <lucaswei@google.com>
The rtl8852be Wi-Fi driver needs an ABI symbol list for Amlogic SoCs.
1 function symbol(s) added
'void cfg80211_ch_switch_started_notify(struct net_device*, struct cfg80211_chan_def*, unsigned int, u8, bool)'
Bug: 278600971
Change-Id: I6782e9f7539a521dd300ed73b73289920d43722c
Signed-off-by: Qinglin Li <qinglin.li@amlogic.com>
More platforms are using the Renesas XHCI controller, so enable it in
the configuration so that we do not have to export a bunch of internal
xhci controller functions that should not be part of any stable api.
Bug: 278153046
Change-Id: I9d8aa6a1783f0bb3bf0d794c7101d1762dd96b3d
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This change fixes a bug where inbound packets to nested IPsec tunnels
fail to pass policy checks due to the inner tunnel's policy checks
not having a reference to the outer policy/template. This causes the
policy check to fail, since the first entries in the secpath correlate
to the outer tunnel, while the templates being verified are for the
inner tunnel.
In order to ensure that the appropriate policy and template context is
searchable, the policy checks must be done incrementally between each
decryption step. As such, this marks secpath entries as having been
successfully matched, skipping them (treating them as optional) on
subsequent policy checks.
By skipping the immediate error return in the case where the secpath
entry had previously been validated, this change allows secpath entries
that previously matched a policy/template to be skipped, while still
requiring that each searched template find a match in the secpath.
For security:
- All templates must have matching secpath entries
  - Unchanged by the current patch; templates that do not match any
    secpath entry still return -1. This patch simply allows skipping
    earlier blocks of verified secpath entries
- All entries (except trailing transport mode entries) must have a
  matching template
  - Unvalidated entries, including transport-mode entries, still return
    the errored index if they do not match the correct template.
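A simplified model of the skip rule described above. The entry and template types are hypothetical reductions (an ID and a mode) of struct xfrm_state and struct xfrm_tmpl, and the return convention is simplified to "index after the match" or -1.

```c
enum mode { MODE_TRANSPORT, MODE_TUNNEL };

/* Hypothetical, simplified secpath entry and template. */
struct sp_entry { int id; enum mode mode; };
struct tmpl { int id; };

/*
 * Search the secpath from 'start' for an entry matching the template.
 * Tunnel-mode entries below 'verified_cnt' matched in an earlier policy
 * check and are skipped instead of causing an immediate error.
 */
static int tmpl_match(const struct tmpl *t, const struct sp_entry *sp,
		      int len, int start, int verified_cnt)
{
	for (int idx = start; idx < len; idx++) {
		if (sp[idx].id == t->id)
			return idx + 1;
		if (sp[idx].mode != MODE_TRANSPORT) {
			if (idx < verified_cnt)
				continue;  /* previously verified: treat as optional */
			return -1;         /* unverified tunnel entry without a match */
		}
	}
	return -1;  /* every template must still find a matching entry */
}
```

With an outer entry already verified, the inner tunnel's template can now match the second secpath entry; without that mark, the outer entry still causes a failure.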
Bug: 236423446
Bug: 277711867
Test: Tested against Android Kernel Unit Tests
Test: Tested against Android CTS
Link: https://lore.kernel.org/netdev/20220824221252.4130836-2-benedictwong@google.com/
[benedictwong: fixed minor style issues]
Signed-off-by: Benedict Wong <benedictwong@google.com>
(cherry picked from commit 970e02667c)
Merged-In: Ic32831cb00151d0de2e465f18ec37d5f7b680e54
Change-Id: Ic32831cb00151d0de2e465f18ec37d5f7b680e54
This change ensures that all nested XFRM packets have their policy
checked before decryption of the next layer, so that policies are
verified at each intermediate step of the decryption process.
Notably, raw ESP/AH packets do not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
This is necessary especially for nested tunnels, as the IP addresses,
protocol and ports may all change, thus not matching the previous
policies. In order to ensure that packets match the relevant inbound
templates, the xfrm_policy_check should be done before handing off to
the inner XFRM protocol to decrypt and decapsulate.
In order to prevent double-checking packets both here and in the
encapsulation layers, this check is currently limited to nested
tunnel-mode transforms and checked prior to decapsulation of inner
tunnel layers (prior to hitting a nested tunnel's xfrm_input, there
is no great way to detect a nested tunnel). This is primarily a
performance consideration, as a general blanket check at the end of
xfrm_input would suffice, but may result in multiple policy checks.
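A toy model of where the extra check sits: after decrypting a tunnel-mode layer and before handing off to a nested tunnel-mode layer. The `struct layer` type and `xfrm_input_layers()` are hypothetical; the final check after the last layer is assumed to stay with the encapsulation layer, as the text describes.

```c
/* Hypothetical description of one decapsulation layer. */
struct layer {
	int tunnel_mode;  /* non-zero for tunnel-mode transforms */
	int policy_ok;    /* result the policy check would give at this point */
};

/*
 * Process layers outermost-first. Only when a tunnel-mode layer is followed
 * by another tunnel-mode layer is the policy checked before decapsulating
 * further. Returns the number of layers processed, or -1 on policy failure.
 */
static int xfrm_input_layers(const struct layer *l, int n, int *checks)
{
	*checks = 0;
	for (int i = 0; i < n; i++) {
		/* ... decrypt and decapsulate layer i ... */
		if (i + 1 < n && l[i].tunnel_mode && l[i + 1].tunnel_mode) {
			(*checks)++;
			if (!l[i].policy_ok)
				return -1;
		}
	}
	return n;
}
```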
Bug: 236423446
Bug: 277711867
Test: Tested against Android Kernel Unit Tests
Link: https://lore.kernel.org/netdev/20220824221252.4130836-3-benedictwong@google.com/
Signed-off-by: Benedict Wong <benedictwong@google.com>
(cherry picked from commit b5bf2997c3)
Merged-In: I20c5abf39512d7f6cf438c0921a78a84e281b4e9
Change-Id: I20c5abf39512d7f6cf438c0921a78a84e281b4e9
kthread_park and wait_woken have a similar race that kthread_stop and
wait_woken used to have before it was fixed in
cb6538e740. Extend that fix to also cover
kthread_park.
Bug: 274686101
Link: https://lore.kernel.org/lkml/20230406194053.876844-1-arve@android.com/
Change-Id: Iaec960d7e30862f4ccac5c98dd43d32bbcf9a72b
Signed-off-by: Arve Hjønnevåg <arve@android.com>
(cherry picked from commit 69eba53950444890063b1e0469a61b69f8301767)
By default, Android places modules in /lib/modules/ instead of the
default path /lib/modules/<uname>.
Bug: 254835242
Change-Id: I49ed4be25c29302fc9b99a9f2ef5f1c84df3adc9
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Fall back to the default module path (/lib/modules/<uname>) if module
loading failed for the selected path in CONFIG_PKVM_MODULE_PATH. This
intends to follow the same mechanism as Android init.
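A sketch of the lookup order only. The loader callback stands in for the actual modprobe invocation, and `load_pkvm_module()` and its parameters are hypothetical names, not the kernel's.

```c
#include <stdio.h>

/* Hypothetical loader callback: returns 0 if the module was found and loaded. */
typedef int (*load_fn)(const char *dir, const char *name);

/*
 * Try the configured directory (e.g. CONFIG_PKVM_MODULE_PATH) first,
 * then fall back to /lib/modules/<uname>. 'used' receives the directory
 * that worked.
 */
static int load_pkvm_module(const char *cfg_path, const char *uname_release,
			    const char *name, load_fn load,
			    char *used, size_t used_len)
{
	char def[256];

	if (cfg_path && cfg_path[0] && load(cfg_path, name) == 0) {
		snprintf(used, used_len, "%s", cfg_path);
		return 0;
	}

	snprintf(def, sizeof(def), "/lib/modules/%s", uname_release);
	if (load(def, name) == 0) {
		snprintf(used, used_len, "%s", def);
		return 0;
	}
	return -1;
}
```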
Bug: 254835242
Change-Id: Ia7764d57fe71521e4a1fe6d2c85ba057790069a8
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Currently, no module path is given to modprobe when loading a pKVM
module, so the module must be found in /lib/modules/<uname>. Add
CONFIG_PKVM_MODULE_PATH to allow setting a different path from the
kernel config.
Bug: 254835242
Change-Id: I4f355518628b44ac03de2cee3d7a90e1ad5bf1e2
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
This patch adds POSIX_FADV_NOREUSE to vma_has_recency() so that the LRU
algorithm can ignore access to mapped files marked by this flag.
The advantages of POSIX_FADV_NOREUSE are:
1. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not alter the
default readahead behavior.
2. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not split VMAs and
therefore does not take mmap_lock.
3. Unlike MADV_COLD, setting it has a negligible cost, regardless of
how many pages it affects.
Its limitations are:
1. Like POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL, it currently does
not support range. IOW, its scope is the entire file.
2. It currently does not ignore access through file descriptors.
Specifically, for the active/inactive LRU, given a file page shared
by two users and one of them having set POSIX_FADV_NOREUSE on the
file, this page will be activated upon the second user accessing
it. This corner case can be covered by checking POSIX_FADV_NOREUSE
before calling folio_mark_accessed() on the read path. But it is
considered not worth the effort.
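A userspace sketch of how an application would apply the hint to a mapped file, mirroring the posix_fadvise() call in the SVT-AV1 change quoted below: offset 0 and length 0 cover the whole file, which matches the whole-file scope described above. `stream_file_noreuse()` is a hypothetical helper. On kernels without this change the call should still succeed, but it is not expected to affect mapped access.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file, mark it POSIX_FADV_NOREUSE, and return a byte checksum (-1 on error). */
static long stream_file_noreuse(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct stat st;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	/* posix_fadvise() returns an error number instead of setting errno. */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) != 0) {
		close(fd);
		return -1;
	}

	unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);  /* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	long sum = 0;
	for (off_t i = 0; i < st.st_size; i++)
		sum += p[i];
	munmap(p, (size_t)st.st_size);
	return sum;
}
```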
There have been a few attempts to support POSIX_FADV_NOREUSE, e.g., [1].
This time the goal is to fill a niche: a few desktop applications, e.g.,
large file transferring and video encoding/decoding, want fast file
streaming with mmap() rather than direct IO. Among those applications, an
SVT-AV1 regression was reported when running with MGLRU [2]. The
following test can reproduce that regression.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fallocate -l 8G /mnt/swapfile
mkswap /mnt/swapfile
swapon /mnt/swapfile
wget http://ultravideo.cs.tut.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z e -o/mnt/ Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp --preset 12 -w 3840 -h 2160 \
-i /mnt/Bosphorus_3840x2160.y4m
For MGLRU, the following change showed a [9-11]% increase in FPS,
which makes it on par with the active/inactive LRU.
patch Source/App/EncApp/EbAppMain.c <<EOF
31a32
> #include <fcntl.h>
35d35
< #include <fcntl.h> /* _O_BINARY */
117a118
> posix_fadvise(config->mmap.fd, 0, 0, POSIX_FADV_NOREUSE);
EOF
[1] https://lore.kernel.org/r/1308923350-7932-1-git-send-email-andrea@betterlinux.com/
[2] https://openbenchmarking.org/result/2209259-PTS-MGLRU8GB57
Link: https://lkml.kernel.org/r/20221230215252.2628425-2-yuzhao@google.com
Change-Id: Iee2a7df5ccd86162089e007e32f9fa9b2b9f198b
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17e810229c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Add vma_has_recency() to indicate whether a VMA may exhibit temporal
locality that the LRU algorithm relies on.
This function returns false for VMAs marked by VM_SEQ_READ or
VM_RAND_READ. While the former flag indicates linear access, i.e., a
special case of spatial locality, both flags indicate a lack of temporal
locality, i.e., the reuse of an area within a relatively small duration.
"Recency" is chosen over "locality" to avoid confusion between temporal
and spatial localities.
Before this patch, the active/inactive LRU only ignored the accessed bit
from VMAs marked by VM_SEQ_READ. After this patch, the active/inactive
LRU and MGLRU share the same logic: they both ignore the accessed bit if
vma_has_recency() returns false.
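A userspace model of the rule described above. The flag values are illustrative (see include/linux/mm.h for the real definitions), `struct vma_model` is a hypothetical trimmed-down VMA, and `counts_as_referenced()` is a hypothetical name for how a reclaim path would consult the predicate.

```c
#include <stdbool.h>

/* Illustrative flag values; the real ones live in include/linux/mm.h. */
#define VM_SEQ_READ  0x00008000UL
#define VM_RAND_READ 0x00010000UL

/* Hypothetical, trimmed-down VMA. */
struct vma_model {
	unsigned long vm_flags;
};

/* Either access hint means the VMA lacks temporal locality. */
static bool vma_has_recency(const struct vma_model *vma)
{
	return !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ));
}

/* The accessed (young) bit only counts when the VMA may exhibit recency. */
static bool counts_as_referenced(const struct vma_model *vma, bool young)
{
	return young && vma_has_recency(vma);
}
```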
For the active/inactive LRU, the following fio test showed a [6, 8]%
increase in IOPS when randomly accessing mapped files under memory
pressure.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fio --name=test --directory=/mnt/ --ioengine=mmap --numjobs=8 \
--size=8G --rw=randrw --time_based --runtime=10m \
--group_reporting
The discussion that led to this patch is here [1]. Additional test
results are available in that thread.
[1] https://lore.kernel.org/r/Y31s%2FK8T85jh05wH@google.com/
Link: https://lkml.kernel.org/r/20221230215252.2628425-1-yuzhao@google.com
Change-Id: I3e0dfa1ca2a3c14fb0239b8f612c697e1d8d6a64
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8788f67814)
[TJ: #include <linux/mm_inline.h>, folio -> page renames]
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
When running as a Xen PV guest, commit eed9a328aa ("mm: x86: add
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
pmdp_test_and_clear_young():
BUG: unable to handle page fault for address: ffff8880083374d0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
RIP: e030:pmdp_test_and_clear_young+0x25/0x40
This happens because the Xen hypervisor can't emulate direct writes to
page table entries other than PTEs.
This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
similar to arch_has_hw_pte_young() and test that instead of
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.
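A sketch of the difference between the compile-time test and the runtime predicate. `ARCH_HAS_NONLEAF_PMD_YOUNG` and `running_as_xen_pv` are hypothetical stand-ins for the Kconfig symbol and the Xen PV CPU feature check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Kconfig option and the Xen PV detection. */
#define ARCH_HAS_NONLEAF_PMD_YOUNG 1
static bool running_as_xen_pv;

/* Before (sketch): a compile-time decision, true even under Xen PV. */
static bool use_nonleaf_pmd_young_old(void)
{
	return ARCH_HAS_NONLEAF_PMD_YOUNG;
}

/*
 * After (sketch): a runtime predicate, like arch_has_hw_pte_young().
 * Xen PV cannot emulate direct writes to non-PTE entries, so the
 * non-leaf PMD accessed bit must not be cleared there.
 */
static bool arch_has_hw_nonleaf_pmd_young(void)
{
	return ARCH_HAS_NONLEAF_PMD_YOUNG && !running_as_xen_pv;
}
```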
Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
Fixes: eed9a328aa ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
Change-Id: Ib88d54891c619feccc7eb40a14d5ba6e349912d6
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: David Hildenbrand <david@redhat.com> [core changes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4aaf269c76)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Set KMI_GENERATION=4 for 4/12 KMI update
function symbol 'struct block_device* I_BDEV(struct inode*)' changed
CRC changed from 0x6d0d2bcc to 0xf2df037e
function symbol 'void* PDE_DATA(const struct inode*)' changed
CRC changed from 0xbb8786db to 0x3c36f860
function symbol 'void __ClearPageMovable(struct page*)' changed
CRC changed from 0x6cd6821a to 0xafefd4e
... 2828 omitted; 2831 symbols have only CRC changes
type 'struct dentry_operations' changed
member changed from 'void(* d_canonical_path)(const struct path*, struct path*)' to 'int(* d_canonical_path)(const struct path*, struct path*)'
type changed from 'void(*)(const struct path*, struct path*)' to 'int(*)(const struct path*, struct path*)'
pointed-to type changed from 'void(const struct path*, struct path*)' to 'int(const struct path*, struct path*)'
return type changed from 'void' to 'int'
Bug: 277759776
Change-Id: I5f3ed46e6804dcf0db745d4e6dc7c3a317f64648
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Previously, errors from the daemon in FUSE_CANONICAL_PATH were simply
ignored. In order to block inotify watches, it is useful to be able to
return errors from this opcode.
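A sketch of why the d_canonical_path() return type changed from void to int: a caller setting up a watch can now propagate a daemon error instead of silently continuing. The types and `add_watch_model()` are hypothetical simplifications, not the real VFS or inotify code.

```c
#include <errno.h>
#include <stddef.h>

struct path_model { const char *name; };

/* Hypothetical stand-in for an int-returning d_canonical_path() hook. */
typedef int (*canonical_path_fn)(const struct path_model *in, struct path_model *out);

/* Sketch of a caller: an error from the hook now aborts the watch. */
static int add_watch_model(const struct path_model *p, canonical_path_fn canon,
			   struct path_model *watched)
{
	struct path_model canonical = *p;

	if (canon) {
		int err = canon(p, &canonical);
		if (err)
			return err;  /* not expressible when the hook returned void */
	}
	*watched = canonical;
	return 0;
}
```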
Bug: 238619640
Test: inotify no longer works on /storage/emulated/0/Android/media but
does on child folders
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: Icb15c090c6286c174338471a787712f8388de316
SoCs featuring peripherals that can issue non-coherent DMA traffic
beyond the point of coherency (PoC) present multiple challenges for the
DMA-API implementation in Linux. Many of these challenges can be
overcome by suitable configuration of the interconnect, however the
presence of a cacheable alias for non-cacheable buffers can still lead
to coherence issues arising when stale clean lines are back-snooped from
the cache hierarchy to satisfy a non-cacheable transaction at the PoC.
Removing all cacheable aliases on a case-by-case basis is both
error-prone and expensive. Instead, leverage the stage-2 identity
mapping installed by pKVM to enforce consistent cacheability for all
stage-1 aliases.
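A simplified model of why a stage-2 mapping can enforce cacheability: without FEAT_S2FWB, Arm combines stage-1 and stage-2 memory types by taking the more restrictive of the two. This model ignores shareability and inner/outer attributes, and it assumes (the message does not spell this out) that pKVM maps such buffers non-cacheable at stage 2.

```c
/* Simplified memory types, ordered from most to least restrictive. */
enum memtype { MT_DEVICE, MT_NORMAL_NC, MT_NORMAL_WT, MT_NORMAL_WB };

/* Without FEAT_S2FWB, the more restrictive attribute wins. */
static enum memtype combine_s1_s2(enum memtype s1, enum memtype s2)
{
	return s1 < s2 ? s1 : s2;
}
```

A cacheable stage-1 alias of a buffer mapped non-cacheable at stage 2 therefore ends up non-cacheable.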
Bug: 240786634
Change-Id: I78b0aa51fe3e23811bbd25481173086aa957c4bf
Signed-off-by: Will Deacon <willdeacon@google.com>