[ Upstream commit 82e0572b23 ]
During tx rings configuration default XPS queue config is set and
__I40E_TX_XPS_INIT_DONE is locked. __I40E_TX_XPS_INIT_DONE state is
cleared and set again with default mapping only during queues build,
it means after first setup or reset with queues rebuild. (i.e.
ethtool -L <interface> combined <number>) After other resets (i.e.
ethtool -t <interface>) XPS_INIT_DONE is not cleared and those default
maps cannot be set again. It results in cleared xps_cpus mapping
until queues are not rebuild or mapping is not set by user.
Add clearing __I40E_TX_XPS_INIT_DONE state during reset to let
the driver set xps_cpus to defaults again after it was cleared.
Fixes: 6f853d4f8e ("i40e: allow XPS with QoS enabled")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8b4fc1390 ]
The pp->indir[0] value comes from the user. It is passed to:
if (cpu_online(pp->rxq_def))
inside the mvneta_percpu_elect() function. It needs bounds checkeding
to ensure that it is not beyond the end of the cpu bitmap.
Fixes: cad5d847a0 ("net: mvneta: Fix the CPU choice in mvneta_percpu_elect")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d50b7914fa ]
A NAPI is setup for each network sring to poll data to kernel
The sring with source host is destroyed before live migration and
new sring with target host is setup after live migration.
The NAPI for the old sring is not deleted until setup new sring
with target host after migration. With busy_poll/busy_read enabled,
the NAPI can be polled before got deleted when resume VM.
BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
IP: xennet_poll+0xae/0xd20
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
Call Trace:
finish_task_switch+0x71/0x230
timerqueue_del+0x1d/0x40
hrtimer_try_to_cancel+0xb5/0x110
xennet_alloc_rx_buffers+0x2a0/0x2a0
napi_busy_loop+0xdb/0x270
sock_poll+0x87/0x90
do_sys_poll+0x26f/0x580
tracing_map_insert+0x1d4/0x2f0
event_hist_trigger+0x14a/0x260
finish_task_switch+0x71/0x230
__schedule+0x256/0x890
recalc_sigpending+0x1b/0x50
xen_sched_clock+0x15/0x20
__rb_reserve_next+0x12d/0x140
ring_buffer_lock_reserve+0x123/0x3d0
event_triggers_call+0x87/0xb0
trace_event_buffer_commit+0x1c4/0x210
xen_clocksource_get_cycles+0x15/0x20
ktime_get_ts64+0x51/0xf0
SyS_ppoll+0x160/0x1a0
SyS_ppoll+0x160/0x1a0
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x41/0xa6
...
RIP: xennet_poll+0xae/0xd20 RSP: ffffb4f041933900
CR2: 0000000000000008
---[ end trace f8601785b354351c ]---
xen frontend should remove the NAPIs for the old srings before live
migration as the bond srings are destroyed
There is a tiny window between the srings are set to NULL and
the NAPIs are disabled, It is safe as the NAPI threads are still
frozen at that time
Signed-off-by: Lin Liu <lin.liu@citrix.com>
Fixes: 4ec2411980 ([NET]: Do not check netif_running() and carrier state in ->poll())
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25f427ac7b ]
A loop for reading MISTAT register continues while regmap_read() fails
and (mistat & BUSY), but if regmap_read() fails a value of mistat is
undefined.
The patch proposes to check for BUSY flag only when regmap_read()
succeed. Compile test only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f ("net: Microchip encx24j600 driver")
Signed-off-by: Valentina Goncharenko <goncharenko.vp@ispras.ru>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 167b3f2dcc ]
In functions regmap_encx24j600_phy_reg_read() and
regmap_encx24j600_phy_reg_write() in the conditions of the waiting
cycles for filling the variable 'ret' it is necessary to add parentheses
to prevent wrong assignment due to logical operations precedence.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d70e53262f ("net: Microchip encx24j600 driver")
Signed-off-by: Valentina Goncharenko <goncharenko.vp@ispras.ru>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d72d3135 ]
Kernel fault injection test reports null-ptr-deref as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:cfg802154_netdev_notifier_call+0x120/0x310 include/linux/list.h:114
Call Trace:
<TASK>
raw_notifier_call_chain+0x6d/0xa0 kernel/notifier.c:87
call_netdevice_notifiers_info+0x6e/0xc0 net/core/dev.c:1944
unregister_netdevice_many_notify+0x60d/0xcb0 net/core/dev.c:1982
unregister_netdevice_queue+0x154/0x1a0 net/core/dev.c:10879
register_netdevice+0x9a8/0xb90 net/core/dev.c:10083
ieee802154_if_add+0x6ed/0x7e0 net/mac802154/iface.c:659
ieee802154_register_hw+0x29c/0x330 net/mac802154/main.c:229
mcr20a_probe+0xaaa/0xcb1 drivers/net/ieee802154/mcr20a.c:1316
ieee802154_if_add() allocates wpan_dev as netdev's private data, but not
init the list in struct wpan_dev. cfg802154_netdev_notifier_call() manage
the list when device register/unregister, and may lead to null-ptr-deref.
Use INIT_LIST_HEAD() on it to initialize it correctly.
Fixes: fcf39e6e88 ("ieee802154: add wpan_dev_list")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221130091705.1831140-1-weiyongjun@huaweicloud.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f3957c7eb ]
bt_init() calls bt_leds_init() to register led, but if it fails later,
bt_leds_cleanup() is not called to unregister it.
This can cause panic if the argument "bluetooth-power" in text is freed
and then another led_trigger_register() tries to access it:
BUG: unable to handle page fault for address: ffffffffc06d3bc0
RIP: 0010:strcmp+0xc/0x30
Call Trace:
<TASK>
led_trigger_register+0x10d/0x4f0
led_trigger_register_simple+0x7d/0x100
bt_init+0x39/0xf7 [bluetooth]
do_one_initcall+0xd0/0x4e0
Fixes: e64c97b53b ("Bluetooth: Add combined LED trigger for controller power")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 747da1308b ]
hci_get_route() takes reference, we should use hci_dev_put() to release
it when not need anymore.
Fixes: 6b8d4a6a03 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28e96556ba ]
Without this change, the interrupt test fail with MSI-X environment:
$ sudo ethtool -t enp0s2 offline
[ 43.921783] igb 0000:00:02.0: offline testing starting
[ 44.855824] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Down
[ 44.961249] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 51.272202] igb 0000:00:02.0: testing shared interrupt
[ 56.996975] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
The test result is FAIL
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 4
Loopback test (offline) 0
Link test (on/offline) 0
Here, "4" means an expected interrupt was not delivered.
To fix this, route IRQs correctly to the first MSI-X vector by setting
IVAR_MISC. Also, set bit 0 of EIMS so that the vector will not be
masked. The interrupt test now runs properly with this change:
$ sudo ethtool -t enp0s2 offline
[ 42.762985] igb 0000:00:02.0: offline testing starting
[ 50.141967] igb 0000:00:02.0: testing shared interrupt
[ 56.163957] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
The test result is PASS
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 0
Loopback test (offline) 0
Link test (on/offline) 0
Fixes: 4eefa8f013 ("igb: add single vector msi-x testing to interrupt test")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eed913f691 ]
e1000_xmit_frame is expected to stop the queue and dispatch frames to
hardware if there is not sufficient space for the next frame in the
buffer, but sometimes it failed to do so because the estimated maximum
size of frame was wrong. As the consequence, the later invocation of
e1000_xmit_frame failed with NETDEV_TX_BUSY, and the frame in the buffer
remained forever, resulting in a watchdog failure.
This change fixes the estimated size by making it match with the
condition for NETDEV_TX_BUSY. Apparently, the old estimation failed to
account for the following lines which determines the space requirement
for not causing NETDEV_TX_BUSY:
```
/* reserve a descriptor for the offload context */
if ((mss) || (skb->ip_summed == CHECKSUM_PARTIAL))
count++;
count++;
count += DIV_ROUND_UP(len, adapter->tx_fifo_limit);
```
This issue was found when running http-stress02 test included in Linux
Test Project 20220930 on QEMU with the following commandline:
```
qemu-system-x86_64 -M q35,accel=kvm -m 8G -smp 8
-drive if=virtio,format=raw,file=root.img,file.locking=on
-device e1000e,netdev=netdev
-netdev tap,script=ifup,downscript=no,id=netdev
```
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45fecdb9f6 ]
for_each_pci_dev() is implemented by pci_get_device(). The comment of
pci_get_device() says that it will increase the reference count for the
returned pci_dev and also decrease the reference count for the input
pci_dev @from if it is not NULL.
If we break for_each_pci_dev() loop with pdev not NULL, we need to call
pci_dev_put() to decrease the reference count. Add the missing
pci_dev_put() after the 'out' label. Since pci_dev_put() can handle NULL
input parameter, there is no problem for the 'Device not found' branch.
For the normal path, add pci_dev_put() in amd_gpio_exit().
Fixes: f942a7de04 ("gpio: add a driver for GPIO pins found on AMD-8111 south bridge chips")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8c11586450 ]
According to the description in ti-sn65dsi86's datasheet:
CHA_HSYNC_POLARITY:
0 = Active High Pulse. Synchronization signal is high for the sync
pulse width. (default)
1 = Active Low Pulse. Synchronization signal is low for the sync
pulse width.
CHA_VSYNC_POLARITY:
0 = Active High Pulse. Synchronization signal is high for the sync
pulse width. (Default)
1 = Active Low Pulse. Synchronization signal is low for the sync
pulse width.
We should only set these bits when the polarity is negative.
Fixes: a095f15c00 ("drm/bridge: add support for sn65dsi86 bridge driver")
Signed-off-by: Qiqi Zhang <eddy.zhang@rock-chips.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125104558.84616-1-eddy.zhang@rock-chips.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e24c54da2 ]
The struct cas_control embeds multiple generic SPI structures and we
have to make sure these structures are initialized to default values.
This driver does not set all attributes. When using kmalloc before some
attributes were not initialized and contained random data which caused
random crashes at bootup.
Fixes: ded845a781 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Link: https://lore.kernel.org/r/20221121002201.1339636-1-hauke@hauke-m.de
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ec61b41918 upstream.
Syzbot reported shift-out-of-bounds in hid_report_raw_event.
microsoft 0003:045E:07DA.0001: hid_field_extract() called with n (128) >
32! (swapper/0)
======================================================================
UBSAN: shift-out-of-bounds in drivers/hid/hid-core.c:1323:20
shift exponent 127 is too large for 32-bit type 'int'
CPU: 0 PID: 0 Comm: swapper/0 Not tainted
6.1.0-rc4-syzkaller-00159-g4bbf3422df78 #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS
Google 10/26/2022
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_shift_out_of_bounds+0x3a6/0x420 lib/ubsan.c:322
snto32 drivers/hid/hid-core.c:1323 [inline]
hid_input_fetch_field drivers/hid/hid-core.c:1572 [inline]
hid_process_report drivers/hid/hid-core.c:1665 [inline]
hid_report_raw_event+0xd56/0x18b0 drivers/hid/hid-core.c:1998
hid_input_report+0x408/0x4f0 drivers/hid/hid-core.c:2066
hid_irq_in+0x459/0x690 drivers/hid/usbhid/hid-core.c:284
__usb_hcd_giveback_urb+0x369/0x530 drivers/usb/core/hcd.c:1671
dummy_timer+0x86b/0x3110 drivers/usb/gadget/udc/dummy_hcd.c:1988
call_timer_fn+0xf5/0x210 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x76a/0x980 kernel/time/timer.c:1790
run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803
__do_softirq+0x277/0x75b kernel/softirq.c:571
__irq_exit_rcu+0xec/0x170 kernel/softirq.c:650
irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1107
======================================================================
If the size of the integer (unsigned n) is bigger than 32 in snto32(),
shift exponent will be too large for 32-bit type 'int', resulting in a
shift-out-of-bounds bug.
Fix this by adding a check on the size of the integer (unsigned n) in
snto32(). To add support for n greater than 32 bits, set n to 32, if n
is greater than 32.
Reported-by: syzbot+8b1641d2f14732407e23@syzkaller.appspotmail.com
Fixes: dde5845a52 ("[PATCH] Generic HID layer - code split")
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d180b64961 upstream.
If an empty buf is received, lbuf is also empty. So lbuf is
accessed by index -1.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: f31a2de3fe ("HID: hid-lg4ff: Allow switching of Logitech gaming wheels between compatibility modes")
Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6d910a89a upstream.
Some additional USB mouse devices are needing ALWAYS_POLL quirk without
which they disconnect and reconnect every 60s.
Add below devices to the known quirk list.
CHERRY VID 0x046a, PID 0x000c
MICROSOFT VID 0x045e, PID 0x0783
PRIMAX VID 0x0461, PID 0x4e2a
Signed-off-by: Ankit Patel <anpatel@nvidia.com>
Signed-off-by: Haotien Hsu <haotienh@nvidia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a7ba45b1a upstream.
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a8747 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently dropped
the file type check with it allowing any file to slip through. With the
invarients broken, the d_name and parent accesses can now race against
renames and removals of arbitrary files and cause use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now that
cgroupfs is implemented through kernfs, checking the file operations needs
to go through a layer of indirection. Instead, let's check the superblock
and dentry type.
Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
Fixes: 347c4a8747 ("memcg: remove cgroup_event->cft")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5eef214177 upstream.
Sanity checks were added to verify the v4l2_bt_timings blanking fields
in order to avoid integer overflows when userspace passes weird values.
But that assumed that userspace would correctly fill in the front porch,
backporch and sync values, but sometimes all you know is the total
blanking, which is then assigned to just one of these fields.
And that can fail with these checks.
So instead set a maximum for the total horizontal and vertical
blanking and check that each field remains below that.
That is still sufficient to avoid integer overflows, but it also
allows for more flexibility in how userspace fills in these fields.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 4b6d66a45e ("media: v4l2-dv-timings: add sanity checks for blanking values")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 1fae6eb0fc.
Upstream commit was a fix for an overlook of setting "ent.is_valid"
twice after 5d65b64a3d ("net: dsa: b53: Add support for MDB").
Since MDB support was not backported to stable kernels (it's not a bug
fix) there is nothing to fix there. Backporting this commit resulted in
"env.is_valid" not being set at all.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 74e7e1efda ]
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So remove kfree_skb()
from the spin_lock_irqsave() section and use the already existing
"drop" label in xenvif_start_xmit() for dropping the SKB. At the
same time replace the dev_kfree_skb() call there with a call of
dev_kfree_skb_any(), as xenvif_start_xmit() can be called with
disabled interrupts.
This is XSA-424 / CVE-2022-42328 / CVE-2022-42329.
Fixes: be81992f90 ("xen/netback: don't queue unlimited number of packages")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad7f402ae4 ]
In some cases, the frontend may send a packet where the protocol headers
are spread across multiple slots. This would result in netback creating
an skb where the protocol headers spill over into the non-linear area.
Some drivers and NICs don't handle this properly resulting in an
interface reset or worse.
This issue was introduced by the removal of an unconditional skb pull in
the tx path to improve performance. Fix this without reintroducing the
pull by setting up grant copy ops for as many slots as needed to reach
the XEN_NETBACK_TX_COPY_LEN size. Adjust the rest of the code to handle
multiple copy operations per skb.
This is XSA-423 / CVE-2022-3643.
Fixes: 7e5d775395 ("xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8d3c106e19 upstream.
pagetable walks on address ranges mapped by VMAs can be done under the
mmap lock, the lock of an anon_vma attached to the VMA, or the lock of the
VMA's address_space. Only one of these needs to be held, and it does not
need to be held in exclusive mode.
Under those circumstances, the rules for concurrent access to page table
entries are:
- Terminal page table entries (entries that don't point to another page
table) can be arbitrarily changed under the page table lock, with the
exception that they always need to be consistent for
hardware page table walks and lockless_pages_from_mm().
This includes that they can be changed into non-terminal entries.
- Non-terminal page table entries (which point to another page table)
can not be modified; readers are allowed to READ_ONCE() an entry, verify
that it is non-terminal, and then assume that its value will stay as-is.
Retracting a page table involves modifying a non-terminal entry, so
page-table-level locks are insufficient to protect against concurrent page
table traversal; it requires taking all the higher-level locks under which
it is possible to start a page walk in the relevant range in exclusive
mode.
The collapse_huge_page() path for anonymous THP already follows this rule,
but the shmem/file THP path was getting it wrong, making it possible for
concurrent rmap-based operations to cause corruption.
Link: https://lkml.kernel.org/r/20221129154730.2274278-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-1-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-1-jannh@google.com
Fixes: 27e1f82731 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[manual backport: this code was refactored from two copies into a common
helper between 5.15 and 6.0]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a487069e11 ]
Add RmNet support for LARA-L6.
LARA-L6 module can be configured (by AT interface) in three different
USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1341) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1342) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1343) with 4 serial
interface and 1 CDC-ECM virtual network interface
In RmNet mode LARA-L6 exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: RMNET interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a11452a370 ]
When trying to see if we can clone a file range, there are cases where we
end up sending two write operations in case the inode from the source root
has an i_size that is not sector size aligned and the length from the
current offset to its i_size is less than the remaining length we are
trying to clone.
Issuing two write operations when we could instead issue a single write
operation is not incorrect. However it is not optimal, specially if the
extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed
to the send ioctl. In that case we can end up sending an encoded write
with an offset that is not sector size aligned, which makes the receiver
fallback to decompressing the data and writing it using regular buffered
IO (so re-compressing the data in case the fs is mounted with compression
enabled), because encoded writes fail with -EINVAL when an offset is not
sector size aligned.
The following example, which triggered a bug in the receiver code for the
fallback logic of decompressing + regular buffer IO and is fixed by the
patchset referred in a Link at the bottom of this changelog, is an example
where we have the non-optimal behaviour due to an unaligned encoded write:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Before this patch, the send stream included one regular write and one
encoded write for file 'bar', with the later being not sector size aligned
and causing the receiver to fallback to decompression + buffered writes.
The output of the btrfs receive command in verbose mode (-vvv):
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
utimes
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument")
(...)
This patch avoids the regular write followed by an unaligned encoded write
so that we end up sending a single encoded write that is aligned. So after
this patch the stream content is (output of btrfs receive -vvv):
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
utimes
clone bar - source=foo source offset=0 offset=0 length=32768
encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0
(...)
So we get more optimal behaviour and avoid the silent data loss bug in
versions of btrfs-progs affected by the bug referred by the Link tag
below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05530ef7cf ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed.
seq_copy_in_user() and seq_copy_in_kernel() did not have prototypes
matching snd_seq_dump_func_t. Adjust this and remove the casts. There
are not resulting binary output differences.
This was found as a result of Clang's new -Wcast-function-type-strict
flag, which is more sensitive than the simpler -Wcast-function-type,
which only checks for type width mismatches.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202211041527.HD8TLSE1-lkp@intel.com
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: alsa-devel@alsa-project.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221118232346.never.380-kees@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da74858a47 ]
The clock source and the sched_clock provided by the arm_global_timer
on Rockchip rk3066a/rk3188 are quite unstable because their rates
depend on the CPU frequency.
Recent changes to the arm_global_timer driver makes it impossible to use.
On the other side, the arm_global_timer has a higher rating than the
ROCKCHIP_TIMER, it will be selected by default by the time framework
while we want to use the stable Rockchip clock source.
Keep the arm_global_timer disabled in order to have the
DW_APB_TIMER (rk3066a) or ROCKCHIP_TIMER (rk3188) selected by default.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/f275ca8d-fd0a-26e5-b978-b7f3df815e0a@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 340a982825 ]
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to
```
virt_to_page(0)
```
that in order expands to:
```
pfn_to_page(virt_to_pfn(0))
```
and then virt_to_pfn(0) to:
```
((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) +
PHYS_PFN_OFFSET)
```
where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and
PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of
DRAM(0x80000000).
When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page
gets an address that is out of DRAM bounds.
So instead of using fake virtual page 0 let's allocate a dedicated
zero_page during paging_init() and assign it to a global 'struct page *
empty_zero_page' the same way mmu.c does and it's the same approach used
in m68k with commit dc068f4621 as discussed here[0]. Then let's move
ZERO_PAGE() definition to the top of pgtable.h to be in common between
mmu.c and nommu.c.
[0]: https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@google.com/T/#m1266ceb63
ad140743174d6b3070364d3c9a5179b
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>