commit b8897dc54e upstream.
Remove the Qbv BaseTime restriction for I226 so that the BaseTime can be
scheduled to the future time. A new register bit of Tx Qav Control
(Bit-7: FutScdDis) was introduced to allow I226 scheduling future time as
Qbv BaseTime and not having the Tx hang timeout issue.
Besides, according to datasheet section 7.5.2.9.3.3, FutScdDis bit has to
be configured first before the cycle time and base time.
Indeed the FutScdDis bit is only active on re-configuration, thus we have
to set the BASET_L to zero and then only set it to the desired value.
Please also note that the Qbv configuration flow is moved around based on
the Qbv programming guideline that is documented in the latest datasheet.
Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efb78fa86e ("lib/test_meminit: allocate pages up to order
MAX_ORDER") works great in kernels 6.4 and newer thanks to commit
23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely"), but for older
kernels, the loop is off by one, which causes crashes when the test
runs.
Fix this up by changing "<= MAX_ORDER" "< MAX_ORDER" to allow the test
to work properly for older kernel branches.
Fixes: 421855d0d2 ("lib/test_meminit: allocate pages up to order MAX_ORDER")
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Xiaoke Wang <xkernel.wang@foxmail.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I3c6b6482a276b51273be5fa38ab9d5539d0878fd
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 38fd36728f)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Whitelist pm_suspend_target_state and pm_suspend_global_flags.
Update ABI representation accordingly.
Adding the following symbols:
- pm_suspend_target_state
Bug: 306080942
Change-Id: I78083bdd2ac5bb662f59b2592aa167cae4a7dc30
Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
If userspace tried to add a backing file in a fuse_dentry_revalidate
where there wasn't one originally, this would trigger a crash. Disallow
this operation for now.
Bug: 296013218
Fixes: 57f3ff9648 ("ANDROID: fuse-bpf v1.1")
Test: fuse_test passes, following script no longer crashes:
adb shell su root setenforce 0
adb shell su root chmod ug+w /data/media
adb shell su root rm /data/media/Android -rf
adb shell su root mkdir -p /storage/emulated/Android/data/test
adb shell su root ls -l /storage/emulated/Android/data/test
Change-Id: Id8a67c43d1edfa010403d5f17e31109b796998cf
Signed-off-by: liujinbao1 <liujinbao1@xiaomi.corp-partner.google.com>
the reason why we add a vendor hook adjusting alloc_flags:
1 the user only pass parameter size and gfp_flags once. if we mask the __GFP_RECLAIM, we can't distinguish high-order and low-order, they all will not rise reclaim behavior, it's wrong.
2 for __iommu_dma_alloc_pages, there is a loop to try to alloc pages from high-order to low-order fallback, and we add hook callsite to only change the high-order( > costly order) alloc behavior(which high probability will result more overhead than benifit).
which allow low order alloc to do reclaim behavior still, otherwise may end up with alloc fail.
3 in android ION(drivers/dma-buf/heaps/system_heap.c )
there is same logic, high-order alloc will not do reclaim behavior.
so this change and a vendor hook for adjusting alloc_flags, and add a callsite in __iommu_dma_alloc_pages to turn the reclaim behavior.
Bug: 300857012
Change-Id: I30bd634d8ede1cc29c83d52bdd9276c8cf72ac1e
Signed-off-by: lvwenhuan <lvwenhuan@oppo.com>
(cherry picked from commit d6c24c3a63567676de818011403abe5b9b3d38b0)
commit a282a2f105 upstream.
ceph_frame_desc::fd_lens is an int array. decode_preamble() thus
effectively casts u32 -> int but the checks for segment lengths are
written as if on unsigned values. While reading in HELLO or one of the
AUTH frames (before authentication is completed), arithmetic in
head_onwire_len() can get duped by negative ctrl_len and produce
head_len which is less than CEPH_PREAMBLE_LEN but still positive.
This would lead to a buffer overrun in prepare_read_control() as the
preamble gets copied to the newly allocated buffer of size head_len.
Bug: 303173400
Cc: stable@vger.kernel.org
Fixes: cd1a677cad ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Reported-by: Thelford Williams <thelford@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit db8ca8d9b4)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I49eacd72317664d920b13e3fec087d0e14802b93
Add a vendor hook for pagecache hit/miss and other
vendor specific functions.
Bug: 174088128
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: Ie9f14a69a86b8ed81de766e44e30f2eba1d9bd84
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit db158b4ae0)
Signed-off-by: Jack Lee <liangjlee@google.com>
[ Upstream commit 7433b6d2af ]
Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP
in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a
wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it.
The race is caused by using the wrong reference counter, i.e. the ref counter instead
of ref_netlink.
Bug: 303172721
Fixes: 24e227896b ("netfilter: ipset: Add schedule point in call_ad().")
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/ZPZqetxOmH+w%2Fmyc@westworld/#r
Tested-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ea5a61d588)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I33a6a6234830c600a4ebd62ed1fee3a48876b98d
[ Upstream commit 24e227896b ]
syzkaller found a repro that causes Hung Task [0] with ipset. The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:
IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
IPSET_ATTR_CIDR : 2
The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.
Previously, the same issue existed in set->variant->uadt() that could run
so long under ip_set_lock(set). Commit 5e29dc36bd ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.
While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.
Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.
[0]:
INFO: task syz-executor174:268 blocked for more than 143 seconds.
Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d
Call trace:
__switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
context_switch kernel/sched/core.c:5343 [inline]
__schedule+0xd84/0x1648 kernel/sched/core.c:6669
schedule+0xf0/0x214 kernel/sched/core.c:6745
schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
__mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Bug: 303172721
Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit fea199dbf6)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I7d2c299b1d7d298b5abd76306821320546338a77
In commit 74b95023619b ("UPSTREAM: HID: input: map battery system
charging"), which is in the 6.1.60 release, struct hid_device gets a new
field, which breaks the abi. Fix this up by using one of the reserved
slots for upstream changes and update the .stg file to preserve the
build.
type 'struct hid_device' changed
member 'union { struct { __s32 battery_charge_status; u32 padding; }; struct { u64 android_kabi_reserved1; }; union { }; }' was added
member 'u64 android_kabi_reserved1' was removed
Bug: 305125317
Bug: 161946584
Fixes: 74b95023619b ("UPSTREAM: HID: input: map battery system charging")
Cc: Lu Guohong <luguohong@xiaomi.corp-partner.google.com>
Change-Id: I9ac8ffc758563285f88feb6d592471a555f71c14
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
HID descriptors with Battery System (0x85) Charging (0x44) usage are
ignored and POWER_SUPPLY_STATUS_DISCHARGING is always reported to user
space, even when the device is charging.
Map this usage and when it is reported set the right charging status.
In addition, add KUnit tests to make sure that the charging status is
correctly set and reported. They can be run with the usual command:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=drivers/hid
Change-Id: Ib42e823c5fccfe0c281d8855e124c03b20978132
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit a608dc1c06)
Bug: 305125317
Cc: Lu Guohong <luguohong@xiaomi.corp-partner.google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Users complained about OOM errors during fork without triggering
compaction. This can be fixed by modifying the flags used in
mas_expected_entries() so that the compaction will be triggered in low
memory situations. Since mas_expected_entries() is only used during
fork, the extra argument does not need to be passed through.
Additionally, the two test_maple_tree test cases and one benchmark test
were altered to use the correct locking type so that allocations would
not trigger sleeping and thus fail. Testing was completed with lockdep
atomic sleep detection.
The additional locking change requires rwsem support additions to the
tools/ directory through the use of pthreads pthread_rwlock_t. With
this change test_maple_tree works in userspace, as a module, and
in-kernel.
Users may notice that the system gave up early on attempting to start
new processes instead of attempting to reclaim memory.
Link: https://lore.kernel.org/linux-mm/20231012155233.2272446-1-Liam.Howlett@oracle.com/
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Cc: <stable@vger.kernel.org>
Cc: jason.sim@samsung.com
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
(cherry picked from commit 099d7439ce
https: //git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable)
Bug: 305860603
Change-Id: I1ea250b37198fb3234136da4737b75c6ee3d9817
Signed-off-by: john.hsu <john.hsu@mediatek.com>
Reorder the operations for split and spanning stores so that new data is
placed in the tree prior to marking the old data as dead. This will limit
re-walks on dead data to just once instead of a retry loop.
The order of operations is as follows: Create the new data, put the new
data in place, mark the top node of the old data as dead.
Then repair parent links in the reused nodes through all levels of the
tree, following the new nodes downwards. Finally walk the top dead node
looking for nodes that are no longer used, or subtrees that should be
destroyed (marked dead throughout then freed), follow the partially used
nodes downwards to discover other dead nodes and subtrees.
Link: https://lkml.kernel.org/r/20230804165951.2661157-7-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 530f745c76)
Bug: 305159730
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Change-Id: I145450351adbf8d379dfb0bfa09f4d41e12f177e
Replacing nodes may cause a live lock-up if CPU resources are saturated by
write operations on the tree by continuously retrying on dead nodes. To
avoid the continuous retry scenario, ensure the new node is inserted into
the tree prior to marking the old data as dead. This will define a window
where old and new data is swapped.
When reusing lower level nodes, ensure the parent pointer is updated after
the parent is marked dead. This ensures that the child is still reachable
from the top of the tree, but walking up to a dead node will result in a
single retry that will start a fresh walk from the top down through the
new node.
Link: https://lkml.kernel.org/r/20230804165951.2661157-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 72bcf4aa86)
Bug: 305159730
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Change-Id: Ib052ccf4a73dfe68a0662185226891d59f98b51a
In order to make the nVHE stack size easily configurable,
introduce NVHE_STACK_SHIFT which must be >= PAGE_SHIFT.
Increase the stack size to 8KB if PAGE_SIZE is 4KB, since
some vendors require a larger stack in the hypervisor.
Bug: 305486112
Change-Id: Ic7612d5d5bf9d20db811ce67b177bbda192adf92
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Add a vendor hook for costly order page counting
and other vendor specific functions.
Bug: 174521902
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: I89206727a462548cc3500b695d85c83ff003eec7
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 369de37804)
Signed-off-by: liangjlee <liangjlee@google.com>
Protected VM's memory cannot be swapped out because the memory pages are
protected from host access.
Once host accesses to those protected pages, the hardware exception is
triggered and may crash the host. So, we have to make those protected
pages be ineligible for swapping or merging by the host kernel to avoid
host access. To do so, we pin the page when it is assigned (donated) to
VM and unpin when VM relinquish the pages or is destroyed. Besides, the
protected VM’s memory requires hypervisor to clear the content before
returning to host, but VMM may free those memory before clearing, it
will result in those memory pages are reclaimed and reused before
totally clearing. Using pin/unpin can also avoid the above problems.
The implementation is described as follows.
- Use rb_tree to store pinned memory pages.
- Pin the page when handling page fault.
Change-Id: I1c4338e409821cf1f8df46a4951803c4f4728f94
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-15-yi-de.wu@mediatek.com/
To balance memory usage and performance, GenieZone supports larger
granularity demand paging, called block-based demand paging.
Gzvm driver uses enable_cap to query the hypervisor if it supports
block-based demand paging and the given granularity or not. Meanwhile,
the gzvm driver allocates a shared buffer for storing the physical
pages later.
If the hypervisor supports, every time the gzvm driver handles guest
page faults, it allocates more memory in advance (default: 2MB) for
demand paging. And fills those physical pages into the allocated shared
memory, then calls the hypervisor to map to guest's memory.
The physical pages allocated for block-based demand paging is not
necessary to be contiguous because in many cases, 2MB block is not
followed. 1st, the memory is allocated because of VMM's page fault
(VMM loads kernel image to guest memory before running). In this case,
the page is allocated by the host kernel and using PAGE_SIZE. 2nd is
that guest may return memory to host via ballooning and that is still
4KB (or PAGE_SIZE) granularity. Therefore, we do not have to allocate
physically contiguous 2MB pages.
Change-Id: Id101da5a8ff73dbf5c9a3ab03526cdf5d2f6006e
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: kevenny hsieh <kevenny.hsieh@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-16-yi-de.wu@mediatek.com/
This page fault handler helps GenieZone hypervisor to do demand paging.
On a lower level translation fault, GenieZone hypervisor will first
check the fault GPA (guest physical address or IPA in ARM) is valid
e.g. within the registered memory region, then it will setup the
vcpu_run->exit_reason with necessary information for returning to
gzvm driver.
With the fault information, the gzvm driver looks up the physical
address and call the MT_HVC_GZVM_MAP_GUEST to request the hypervisor
maps the found PA to the fault GPA (IPA).
There is one exception, for protected vm, we will populate full VM's
memory region in advance in order to improve performance.
Change-Id: Ia295ef4c43201d731202d03e74889fa6403e7176
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-14-yi-de.wu@mediatek.com/
- Consolidate address translation functions to gzvm_mmu.c.
- To improve performance (especially for booting) we allocate memory
pages in advance for protected VM. When the virtual machine monitor
(e.g., crosvm) sets the virtual machine as protected via
enable_cap ioctl, we allocate full memory pages in advance for the VM.
Change-Id: Ifee061e55e8cfa0e46d2b6dd4b84388a875580cb
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-12-yi-de.wu@mediatek.com/
The used definitions related to GZVM_REG_* are below and move them to
uapi header file `gzvm.h`.
- Keep `GZVM_REG_SIZE_SHIFT` and `GZVM_REG_SIZE_MASK` so that we can
determine the reg size.
- The other definitions related to GZVM_REG_* are used by crosvm.
- Rename the definitions to make it easy to understand.
Also removed the unused definitions related to GZVM_REG_* in
gzvm_arch.h and fixed sparse warnings.
Change-Id: I7f709f2532b95d418d22beeb4a46257afa4d92f2
Signed-off-by: kevenny hsieh <kevenny.hsieh@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-8-yi-de.wu@mediatek.com/
[ Upstream commit 01f1ae2733 ]
The synchronize_irq(c->irq) will not return until the IRQ handler
mtk_uart_apdma_irq_handler() is completed. If the synchronize_irq()
holds a spin_lock and waits the IRQ handler to complete, but the
IRQ handler also needs the same spin_lock. The deadlock will happen.
The process is shown below:
cpu0 cpu1
mtk_uart_apdma_device_pause() | mtk_uart_apdma_irq_handler()
spin_lock_irqsave() |
| spin_lock_irqsave()
//hold the lock to wait |
synchronize_irq() |
This patch reorders the synchronize_irq(c->irq) outside the spin_lock
in order to mitigate the bug.
Fixes: 9135408c3a ("dmaengine: mediatek: Add MediaTek UART APDMA support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Eugen Hristev <eugen.hristev@collabora.com>
Link: https://lore.kernel.org/r/20230806032511.45263-1-duoming@zju.edu.cn
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0409dd3d1 ]
In idxd_cmd_exec(), wait_event_lock_irq() explicitly calls
spin_unlock_irq()/spin_lock_irq(). If the interrupt is on before entering
wait_event_lock_irq(), it will become off status after
wait_event_lock_irq() is called. Later, wait_for_completion() may go to
sleep but irq is disabled. The scenario is warned in might_sleep().
Fix it by using spin_lock_irqsave() instead of the primitive spin_lock()
to save the irq status before entering wait_event_lock_irq() and using
spin_unlock_irqrestore() instead of the primitive spin_unlock() to restore
the irq status before entering wait_for_completion().
Before the change:
idxd_cmd_exec() {
interrupt is on
spin_lock() // interrupt is on
wait_event_lock_irq()
spin_unlock_irq() // interrupt is enabled
...
spin_lock_irq() // interrupt is disabled
spin_unlock() // interrupt is still disabled
wait_for_completion() // report "BUG: sleeping function
// called from invalid context...
// in_atomic() irqs_disabled()"
}
After applying spin_lock_irqsave():
idxd_cmd_exec() {
interrupt is on
spin_lock_irqsave() // save the on state
// interrupt is disabled
wait_event_lock_irq()
spin_unlock_irq() // interrupt is enabled
...
spin_lock_irq() // interrupt is disabled
spin_unlock_irqrestore() // interrupt is restored to on
wait_for_completion() // No Call trace
}
Fixes: f9f4082dbc ("dmaengine: idxd: remove interrupt disable for cmd_lock")
Signed-off-by: Rex Zhang <rex.zhang@intel.com>
Signed-off-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230916060619.3744220-1-rex.zhang@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d35652a5fc upstream.
Fei has reported that KASAN triggers during apply_alternatives() on
a 5-level paging machine:
BUG: KASAN: out-of-bounds in rcu_is_watching()
Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
...
__asan_load4()
rcu_is_watching()
trace_hardirqs_on()
text_poke_early()
apply_alternatives()
...
On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57)
gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on
__VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled().
KASAN gets confused when apply_alternatives() patches the
KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START
static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue.
Fix it for real by disabling KASAN while the kernel is patching alternatives.
[ mingo: updated the changelog ]
Fixes: 6657fca06e ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y")
Reported-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 427694cfaa upstream.
When NCM is used with hosts like Windows PC, it is observed that there are
multiple NTB's contained in one usb request giveback. Since the driver
unwraps the obtained request data assuming only one NTB is present, we
loose the subsequent NTB's present resulting in data loss.
Fix this by checking the parsed block length with the obtained data
length in usb request and continue parsing after the last byte of current
NTB.
Cc: stable@vger.kernel.org
Fixes: 9f6ce4240a ("usb: gadget: f_ncm.c added")
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230927105858.12950-1-quic_kriskura@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>