[ Upstream commit fb509f76ac ]
Guests like grsecurity that make heavy use of CR0.WP to implement kernel
level W^X will suffer from the implied VMEXITs.
With EPT there is no need to intercept a guest change of CR0.WP, so
simply make it a guest owned bit if we can do so.
This implies that a read of a guest's CR0.WP bit might need a VMREAD.
However, the only potentially affected user seems to be kvm_init_mmu()
which is a heavy operation to begin with. But also most callers already
cache the full value of CR0 anyway, so no additional VMREAD is needed.
The only exception is nested_vmx_load_cr3().
This change is VMX-specific, as SVM has no such fine grained control
register intercept control.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20230322013731.102955-7-minipli@grsecurity.net
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net> # backport to v6.1.x
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01b31714bd ]
There is no need to unload the MMU roots with TDP enabled when only
CR0.WP has changed -- the paging structures are still valid, only the
permission bitmap needs to be updated.
One heavy user of toggling CR0.WP is grsecurity's KERNEXEC feature to
implement kernel W^X.
The optimization brings a huge performance gain for this case as the
following micro-benchmark running 'ssdd 10 50000' from rt-tests[1] on a
grsecurity L1 VM shows (runtime in seconds, lower is better):
legacy TDP shadow
kvm-x86/next@d8708b 8.43s 9.45s 70.3s
+patch 5.39s 5.63s 70.2s
For legacy MMU this is ~36% faster, for TDP MMU even ~40% faster. Also
TDP and legacy MMU now both have a similar runtime which vanishes the
need to disable TDP MMU for grsecurity.
Shadow MMU sees no measurable difference and is still slow, as expected.
[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20230322013731.102955-3-minipli@grsecurity.net
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 254e69f284 ]
Syzbot reported a null-ptr-deref bug:
ntfs3: loop0: Different NTFS' sector size (1024) and media sector size
(512)
ntfs3: loop0: Mark volume as dirty due to NTFS errors
general protection fault, probably for non-canonical address
0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
RIP: 0010:d_flags_for_inode fs/dcache.c:1980 [inline]
RIP: 0010:__d_add+0x5ce/0x800 fs/dcache.c:2796
Call Trace:
<TASK>
d_splice_alias+0x122/0x3b0 fs/dcache.c:3191
lookup_open fs/namei.c:3391 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x10e6/0x2df0 fs/namei.c:3688
do_filp_open+0x264/0x4f0 fs/namei.c:3718
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_open fs/open.c:1334 [inline]
__se_sys_open fs/open.c:1330 [inline]
__x64_sys_open+0x221/0x270 fs/open.c:1330
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
If the MFT record of ntfs inode is not a base record, inode->i_op can be
NULL. And a null-ptr-deref may happen:
ntfs_lookup()
dir_search_u() # inode->i_op is set to NULL
d_splice_alias()
__d_add()
d_flags_for_inode() # inode->i_op->get_link null-ptr-deref
Fix this by adding a Check on inode->i_op before calling the
d_splice_alias() function.
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Reported-by: syzbot+a8f26a403c169b7593fe@syzkaller.appspotmail.com
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 043f85ce81 ]
Using flexible array is more straight forward. It
- saves 1 pointer in the 'zynqmp_ipi_pdata' structure
- saves an indirection when using this array
- saves some LoC and avoids some always spurious pointer arithmetic
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Stable-dep-of: f72f805e72 ("mailbox: zynqmp: Fix counts of child nodes")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fd0945fe6 ]
Infineon(Cypress) SEMPER NOR flash family has on-die ECC and its program
granularity is 16-byte ECC data unit size. JFFS2 supports write buffer
mode for ECC'd NOR flash. Provide a way to clear the MTD_BIT_WRITEABLE
flag in order to enable JFFS2 write buffer mode support.
A new SNOR_F_ECC flag is introduced to determine if the part has on-die
ECC and if it has, MTD_BIT_WRITEABLE is unset.
In vendor specific driver, a common cypress_nor_ecc_init() helper is
added. This helper takes care for ECC related initialization for SEMPER
flash family by setting up params->writesize and SNOR_F_ECC.
Fixes: c3266af101 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Suggested-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/d586723f6f12aaff44fbcd7b51e674b47ed554ed.1680760742.git.Takahiro.Kuwano@infineon.com
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4eddee7014 ]
Introduce a new (no SFDP) flag for the feature that we are about to
support: Read While Write. This means, if the chip has several banks and
supports RWW, once a page of data to write has been transferred into the
chip's internal SRAM, another read operation happening on a different
bank can be performed during the tPROG delay.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230328154105.448540-7-miquel.raynal@bootlin.com
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Stable-dep-of: 9fd0945fe6 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s28hx SEMPER flash")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1799cd8540 ]
SFDP table of some flash chips do not advertise support of Quad Input
Page Program even though it has support. Use flags and add hardware
cap for these chips.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@sifive.com>
[tudor.ambarus@microchip.com: move pp setting in spi_nor_init_default_params]
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20220920184808.44876-2-sudip.mukherjee@sifive.com
Stable-dep-of: 9fd0945fe6 ("mtd: spi-nor: spansion: Enable JFFS2 write buffer for Infineon s28hx SEMPER flash")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bec46859fb ]
Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing
the supported capabilities on the fly every time. Using kvm_caps will
also allow for future cleanups as the kvm_caps values can be used
directly in common x86 code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Like Xu <likexu@tencent.com>
Message-Id: <20221006000314.73240-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stable-dep-of: 098f4c061e ("KVM: x86/pmu: Disallow legacy LBRs if architectural LBRs are available")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b9ca98b72 ]
Drop the return value from x86_perf_get_lbr() and have the stub zero out
the @lbr structure instead of returning -1 to indicate "no LBR support".
KVM doesn't actually check the return value, and instead subtly relies on
zeroing the number of LBRs in intel_pmu_init().
Formalize "nr=0 means unsupported" so that KVM doesn't need to add a
pointless check on the return value to fix KVM's benign bug.
Note, the stub is necessary even though KVM x86 selects PERF_EVENTS and
the caller exists only when CONFIG_KVM_INTEL=y. Despite the name,
KVM_INTEL doesn't strictly require CPU_SUP_INTEL, it can be built with
any of INTEL || CENTAUR || ZHAOXIN CPUs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221006000314.73240-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stable-dep-of: 098f4c061e ("KVM: x86/pmu: Disallow legacy LBRs if architectural LBRs are available")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45121ad4a1 ]
The PSP IRQ is edge-triggered (MSI or MSI-X) in all cases supported by
the psp module so clear the interrupt status register early in the
handler to prevent missed interrupts. sev_irq_handler() calls wake_up()
on a wait queue, which can result in a new command being submitted from
a different CPU. This then races with the clearing of isr and can result
in missed interrupts. A missed interrupt results in a command waiting
until it times out, which results in the psp being declared dead.
This is unlikely on bare metal, but has been observed when running
virtualized. In the cases where this is observed, sev->cmdresp_reg has
PSP_CMDRESP_RESP set which indicates that the command was processed
correctly but no interrupt was asserted.
The full sequence of events looks like this:
CPU 1: submits SEV cmd #1
CPU 1: calls wait_event_timeout()
CPU 0: enters psp_irq_handler()
CPU 0: calls sev_handler()->wake_up()
CPU 1: wakes up; finishes processing cmd #1
CPU 1: submits SEV cmd #2
CPU 1: calls wait_event_timeout()
PSP: finishes processing cmd #2; interrupt status is still set; no interrupt
CPU 0: clears intsts
CPU 0: exits psp_irq_handler()
CPU 1: wait_event_timeout() times out; psp_dead=true
Fixes: 200664d523 ("crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support")
Cc: stable@vger.kernel.org
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a37a512db3 ]
Legacy Display Unit (LDU) fb dirty support used a custom fb dirty callback. Latter
handled only the DIRTYFB IOCTL presentation path but not the ADDFB2/PAGE_FLIP/RMFB
IOCTL path, common for Wayland compositors.
Get rid of the custom callback in favor of drm_atomic_helper_dirtyfb and unify the
handling of the presentation paths inside of vmw_ldu_primary_plane_atomic_update.
This also homogenizes the fb dirty callbacks across all DUs: LDU, SOU and STDU.
Signed-off-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Fixes: 2f5544ff03 ("drm/vmwgfx: Use atomic helper function for dirty fb IOCTL")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230321020949.335012-3-zack@kde.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e10cdc6e8 ]
The explicit vblank handling was never finished. The driver never had
the full implementation of vblank and what was there is emulated
by DRM when the driver doesn't pretend to be implementing it itself.
Let DRM handle the vblank emulation and stop pretending the driver is
doing anything special with vblank. In the future it would make sense
to implement helpers for full vblank handling because vkms and
amdgpu_vkms already have that code. Exporting it to common helpers and
having all three drivers share it would make sense (that would be largely
just to allow more of igt to run).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com>
Reviewed-by: Martin Krastev <krastevm@vmware.com>
Reviewed-by: Michael Banack <banackm@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221022040236.616490-15-zack@kde.org
Stable-dep-of: a37a512db3 ("drm/vmwgfx: Fix Legacy Display Unit atomic drm support")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c1592a8994 upstream.
Toggle deleted anonymous sets as inactive in the next generation, so
users cannot perform any update on it. Clear the generation bitmask
in case the transaction is aborted.
The following KASAN splat shows a set element deletion for a bound
anonymous set that has been already removed in the same transaction.
[ 64.921510] ==================================================================
[ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.924745] Write of size 8 at addr dead000000000122 by task test/890
[ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253
[ 64.931120] Call Trace:
[ 64.932699] <TASK>
[ 64.934292] dump_stack_lvl+0x33/0x50
[ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.937551] kasan_report+0xda/0x120
[ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.942452] ? __kasan_slab_alloc+0x2d/0x60
[ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables]
[ 64.945710] ? kasan_set_track+0x21/0x30
[ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink]
[ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0af462f19e upstream.
The recent fix to ensure atomicity of lookup and allocation inadvertently
broke the pool refill mechanism.
Prior to that change debug_objects_activate() and debug_objecs_assert_init()
invoked debug_objecs_init() to set up the tracking object for statically
initialized objects. That's not longer the case and debug_objecs_init() is
now the only place which does pool refills.
Depending on the number of statically initialized objects this can be
enough to actually deplete the pool, which was observed by Ido via a
debugobjects OOM warning.
Restore the old behaviour by adding explicit refill opportunities to
debug_objects_activate() and debug_objecs_assert_init().
Fixes: 63a759694e ("debugobject: Prevent init race with static objects")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/871qk05a9d.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545094d993 upstream.
rn_vbios_smu_set_dcn_low_power_state() produces a valid warning with
gcc-13:
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:237:6: error: conflicting types for 'rn_vbios_smu_set_dcn_low_power_state' due to enum/integer mismatch; have 'void(struct clk_mgr_internal *, enum dcn_pwr_state)'
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.h:36:6: note: previous declaration of 'rn_vbios_smu_set_dcn_low_power_state' with type 'void(struct clk_mgr_internal *, int)'
I.e. the type of the 2nd parameter of
rn_vbios_smu_set_dcn_low_power_state() in the declaration is int, while
the definition spells enum dcn_pwr_state. Synchronize them to the
latter (and add a forward enum declaration).
Cc: Martin Liska <mliska@suse.cz>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5180ff1364 upstream.
To prepare for removal of i40e_status, change the variables
from i40e_status to int. This eases the transition when values
are changed to return standard int error codes over enum i40e_status.
As such changes often also change variable orders, a cleanup
is also applied here to make variables conform to RCT and
some lines are also reformatted where applicable.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d968af27a upstream.
Remove the i40e_stat_str() function which prints the string
representation of the i40e_status error code. With upcoming changes
moving away from i40e_status, there will be no need for this function
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3319dbb3e7 upstream.
ef100_enqueue_skb() generates a valid warning with gcc-13:
drivers/net/ethernet/sfc/ef100_tx.c:370:5: error: conflicting types for 'ef100_enqueue_skb' due to enum/integer mismatch; have 'int(struct efx_tx_queue *, struct sk_buff *)'
drivers/net/ethernet/sfc/ef100_tx.h:25:13: note: previous declaration of 'ef100_enqueue_skb' with type 'netdev_tx_t(struct efx_tx_queue *, struct sk_buff *)'
I.e. the type of the ef100_enqueue_skb()'s return value in the declaration is
int, while the definition spells enum netdev_tx_t. Synchronize them to the
latter.
Cc: Martin Liska <mliska@suse.cz>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114440.10461-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff1cc97b1f upstream.
Since gcc13, each member of an enum has the same type as the enum [1]. And
that is inherited from its members. Provided:
VTIME_PER_SEC_SHIFT = 37,
VTIME_PER_SEC = 1LLU << VTIME_PER_SEC_SHIFT,
...
AUTOP_CYCLE_NSEC = 10LLU * NSEC_PER_SEC,
the named type is unsigned long.
This generates warnings with gcc-13:
block/blk-iocost.c: In function 'ioc_weight_prfill':
block/blk-iocost.c:3037:37: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int'
block/blk-iocost.c: In function 'ioc_weight_show':
block/blk-iocost.c:3047:34: error: format '%u' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int'
So split the anonymous enum with large values to a separate enum, so
that they don't affect other members.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113
Cc: Martin Liska <mliska@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: cgroups@vger.kernel.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221213120826.17446-1-jirislaby@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 777fa87c76 upstream.
Both bond_alb_xmit() and bond_tlb_xmit() produce a valid warning with
gcc-13:
drivers/net/bonding/bond_alb.c:1409:13: error: conflicting types for 'bond_tlb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
include/net/bond_alb.h:160:5: note: previous declaration of 'bond_tlb_xmit' with type 'int(struct sk_buff *, struct net_device *)'
drivers/net/bonding/bond_alb.c:1523:13: error: conflicting types for 'bond_alb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
include/net/bond_alb.h:159:5: note: previous declaration of 'bond_alb_xmit' with type 'int(struct sk_buff *, struct net_device *)'
I.e. the return type of the declaration is int, while the definitions
spell netdev_tx_t. Synchronize both of them to the latter.
Cc: Martin Liska <mliska@suse.cz>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114409.10417-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d311376197 upstream.
tb_port_is_clx_enabled() generates a valid warning with gcc-13:
drivers/thunderbolt/switch.c:1286:6: error: conflicting types for 'tb_port_is_clx_enabled' due to enum/integer mismatch; have 'bool(struct tb_port *, unsigned int)' ...
drivers/thunderbolt/tb.h:1050:6: note: previous declaration of 'tb_port_is_clx_enabled' with type 'bool(struct tb_port *, enum tb_clx)' ...
I.e. the type of the 2nd parameter of tb_port_is_clx_enabled() in the
declaration is unsigned int, while the definition spells enum tb_clx.
Synchronize them to the former as the parameter is in fact a mask of the
enum values.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90c49fce1c upstream.
TCP_Server_Info::hostname may be updated once or many times during
reconnect, so protect its access outside reconnect path as well and
then prevent any potential use-after-free bugs.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f2779dfa7 upstream.
The behavior of 'enum' types has changed in gcc-13, so now the
UNBUSY_THR_PCT constant is interpreted as a 64-bit number because
it is defined as part of the same enum definition as some other
constants that do not fit within a 32-bit integer. This in turn
leads to some inefficient code on 32-bit architectures as well
as a link error:
arm-linux-gnueabi/bin/arm-linux-gnueabi-ld: block/blk-iocost.o: in function `ioc_timer_fn':
blk-iocost.c:(.text+0x68e8): undefined reference to `__aeabi_uldivmod'
arm-linux-gnueabi-ld: blk-iocost.c:(.text+0x6908): undefined reference to `__aeabi_uldivmod'
Split the enum definition to keep the 64-bit timing constants in
a separate enum type from those constants that can clearly fit
within a smaller type.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230118080706.3303186-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38d11da522 upstream.
Commit fa247089de ("dm: requeue IO if mapping table not yet available")
added a detection of whether the mapping table is available in the IO
submission process. If the mapping table is unavailable, it returns
BLK_STS_RESOURCE and requeues the IO.
This can lead to the following deadlock problem:
dm create mount
ioctl(DM_DEV_CREATE_CMD)
ioctl(DM_TABLE_LOAD_CMD)
do_mount
vfs_get_tree
ext4_get_tree
get_tree_bdev
sget_fc
alloc_super
// got &s->s_umount
down_write_nested(&s->s_umount, ...);
ext4_fill_super
ext4_load_super
ext4_read_bh
submit_bio
// submit and wait io end
ioctl(DM_DEV_SUSPEND_CMD)
dev_suspend
do_resume
dm_suspend
__dm_suspend
lock_fs
freeze_bdev
get_active_super
grab_super
// wait for &s->s_umount
down_write(&s->s_umount);
dm_swap_table
__bind
// set md->map(can't get here)
IO will be continuously requeued while holding the lock since mapping
table is NULL. At the same time, mapping table won't be set since the
lock is not available.
Like request-based DM, bio-based DM also has the same problem.
It's not proper to just abort IO if the mapping table not available.
So clear DM_SKIP_LOCKFS_FLAG when the mapping table is NULL, this
allows the DM table to be loaded and the IO submitted upon resume.
Fixes: fa247089de ("dm: requeue IO if mapping table not yet available")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d32aaa7e6 upstream.
syzkaller found the following problematic rwsem locking (with write
lock already held):
down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
__dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537
In table_clear, it first acquires a write lock
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);
Then before the lock is released at L1539, there is a path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);
It tries to acquire the same read lock again, resulting in the deadlock
problem.
Fix this by moving table_clear()'s __dev_status() call to after its
up_write(&_hash_lock);
Cc: stable@vger.kernel.org
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8c5d45f82 upstream.
In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return
directly. But if FEC configured, it is desired to correct the data page
through verity_verify_io. And the return value will be converted to
blk_status and passed to verity_finish_io().
BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips
verification regardless of I/O error for the corresponding bio. In this
case, the I/O error could not be returned properly, and as a result,
there is a problem that abnormal data could be read for the
corresponding block.
To fix this problem, when an I/O error occurs, do not skip verification
even if the bit related is set in v->validated_blocks.
Fixes: 843f38d382 ("dm verity: add 'check_at_most_once' option to only validate hashes once")
Cc: stable@vger.kernel.org
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c82729e066 upstream.
While using the vdpa device with vIOMMU enabled
in the guest VM, when the vdpa device bind to vfio-pci and run testpmd
then system will fail to unmap.
The test process is
Load guest VM --> attach to virtio driver--> bind to vfio-pci driver
So the mapping process is
1)batched mode map to normal MR
2)batched mode unmapped the normal MR
3)unmapped all the memory
4)mapped to iommu MR
This error happened in step 3). The iotlb was freed in step 2)
and the function vhost_vdpa_process_iotlb_msg will return fail
Which causes failure.
To fix this, we will not remove the AS while the iotlb->nmaps is 0.
This will free in the vhost_vdpa_clean
Cc: stable@vger.kernel.org
Fixes: aaca8373c4 ("vhost-vdpa: support ASID based IOTLB API")
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20230420151734.860168-1-lulu@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00ca0f2e86 upstream.
The refactoring in commit f4e9e0e694 ("mm/mempolicy: fix use-after-free
of VMA iterator") introduces a subtle bug which arises when attempting to
apply a new NUMA policy across a range of VMAs in mbind_range().
The refactoring passes a **prev pointer to keep track of the previous VMA
in order to reduce duplication, and in all but one case it keeps this
correctly updated.
The bug arises when a VMA within the specified range has an equivalent
policy as determined by mpol_equal() - which unlike other cases, does not
update prev.
This can result in a situation where, later in the iteration, a VMA is
found whose policy does need to change. At this point, vma_merge() is
invoked with prev pointing to a VMA which is before the previous VMA.
Since vma_merge() discovers the curr VMA by looking for the one
immediately after prev, it will now be in a situation where this VMA is
incorrect and the merge will not proceed correctly.
This is checked in the VM_WARN_ON() invariant case with end >
curr->vm_end, which, if a merge is possible, results in a warning (if
CONFIG_DEBUG_VM is specified).
I note that vma_merge() performs these invariant checks only after
merge_prev/merge_next are checked, which is debatable as it hides this
issue if no merge is possible even though a buggy situation has arisen.
The solution is simply to update the prev pointer even when policies are
equal.
This caused a bug to arise in the 6.2.y stable tree, and this patch
resolves this bug.
Link: https://lkml.kernel.org/r/83f1d612acb519d777bebf7f3359317c4e7f4265.1682866629.git.lstoakes@gmail.com
Fixes: f4e9e0e694 ("mm/mempolicy: fix use-after-free of VMA iterator")
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304292203.44ddeff6-oliver.sang@intel.com
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>