Structures passed to subdev pad operations are all zero-initialized, but
not always with the same kind of code constructs. While most drivers
used designated initializers, which zero all the fields that are not
specified, when declaring variables, some use memset(). Those two
methods lead to the same end result, and, depending on compiler
optimizations, may even be completely equivalent, but they're not
consistent.
Improve coding style consistency by using designated initializers
instead of calling memset(). Where applicable, also move the variables
to inner scopes of for loops to ensure correct initialization in all
iterations.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Lad Prabhakar <prabhakar.csengg@gmail.com> # For am437x
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Several drivers call subdev pad operations, passing structures that are
not fully zeroed. While the drivers initialize the fields they care
about explicitly, this results in reserved fields having uninitialized
values. Future kernel API changes that make use of those fields thus
risk breaking proper driver operation in ways that could be hard to
detect.
To avoid this, make the code more robust by zero-initializing all the
structures passed to subdev pad operation. Maintain a consistent coding
style by preferring designated initializers (which zero-initialize all
the fields that are not specified) over memset() where possible, and
make variable declarations local to inner scopes where applicable. One
notable exception to this rule is in the ipu3 driver, where a memset()
is needed as the structure is not a local variable but a function
parameter provided by the caller.
Not all fields of those structures can be initialized when declaring the
variables, as the values for those fields are computed later in the
code. Initialize the 'which' field in all cases, and other fields when
the variable declaration is so close to the v4l2_subdev_call() call that
it keeps all the context easily visible when reading the code, to avoid
hindering readability.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org> # For vimc
Reviewed-by: Lad Prabhakar <prabhakar.csengg@gmail.com> # For am437x
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> # For drivers/staging/media/imx/
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Commit 0d346d2a6f ("media: v4l2-subdev: add subdev-wide state struct")
applied a large media tree-wide change produced by coccinelle. It was so
large that a set of identical indentation issues went unnoticed during
review. Fix them.
While at it, and because it's easy to review both changes together, add
a trailing comma for the last (and only) struct member initialization of
the related structures, to avoid future changes should new fields need
to be initialized.
Fixes: 0d346d2a6f ("media: v4l2-subdev: add subdev-wide state struct")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Jonathan writes:
1st set of IIO new device support, features and cleanups for the 6.4 cycle.
New device support
* bosch,bmp280
- Add support for BMP580 - includes significant refactoring and general
driver cleanup + support for non-volatile memory for trimming and config
parameters.
* rohm BU27034
- New driver for this 3 channel ambient light sensor.
- New support library for devices where both integration time and
amplifier gain are configurable. In these cases a scale change
may require changing bother underlying values. This library module
provides code to help with this.
* st,accel
- Add support for IIS328DQ (ID only as compatible wtih LIS331DL)
* st,lsm6dsx
- Add support for ASM330LHB automotive MEMS sensor.
* ti,ads1100, ads1000
- New driver for these 16 bit ADCs.
* ti,tmp117
- Add support for older tmp116 device. Includes some general driver cleanup.
Staging driver drops
* adi,ade7854
- Driver was a very long way from compliant with IIO infrastructure and ABI.
If anyone wants a non staging version of this driver they are better off
starting from scratch. Hence drop it and the associated meter.h header.
Features
* adi,ad7441r
- Add DT binding to set sink current for digital input.
* semtech,sx9324,9360
- Support older register mapping from firmware designed for windows.
Core improvements.
* Move iio_trigger_poll() docs to next to the implementation and add a note
on expected caller context.
* Rename iio_trigger_poll_chained() to iio_trigger_poll_nested() so
as to use more standard / common terminology.
* Improve main ABI docs references to offset and scale for raw values by
making them consistent and clear.
Cleanups and minor fixes:
* adi,ad5592r
- Add GPIO names - useful for debug.
* adi,ad7441r
- Fix current input, loop powered mode configuration setup.
* adi,adis16475
- Fix wrong commented value for minimum advised lower rate.
* adi,admv1013
- Use devm_clk_get_enabled() to reduce boilerplate.
* adi,ads1210
- Fix wrong bits for writing config register (late fix and has
been broken a long time so not rushed upstream)
* amlogic,meson-saradc
- Improve cleanup in error handling if BL30 handshake fails.
* apex-embedded,stx104
- Migrate to regmap and use regmap_read_poll_timeout() to neatly handle
retries.
- Add local mutex to close various races.
- Use define U16_MAX rather than value for limit.
- Improve code readability with minor reorganization.
* atmel,ad91-sama5d2
- Drop trivial dead code.
* kionix,kx022a
- Drop unused structure element.
* linear,ltc2983
- Reorganize bindings doc to enable unevaluatedProperties to be set
in one place for all child nodes.
- Make binding for adi,custom-thermocouple accept signed values.
* maxim,max44000
- Add OF Device matching (of_match_table was not correctly set).
* maxim,max5522
- Missing static
* measurement-computing,cio-dac
- Fix wrong part name in comments.
- Migrate to regmap.
- Improve includes by replacing bitops.h with more direct bits.h
* qcom,pm8xxx-xoadc
- Remove a check that can never fail.
* renesas,rcar-gyroadc
- DT binding documentation improvements.
- Tidy up an unused warning with __maybe_unused.
* semtech,sx_common
- Drop docs for a structure element that doesn't exist.
* semtech,sx9500
- Drop ACPI_PTR() and of_match_ptr() protections that just complicate
the code / block some firmware registration types that would otherwise
work.
* sensiron,sps30
- Comment formatting tidy up.
* st,sensors
- Drop duplicate text in DT binding.
* st,stm32-adc
- Add some missing static markings.
* ti,ads1100
- Use correct return code in dev_err_probe() call.
* x-powers,axp20x_adc - precursor series to simplify addition of AXP192.
- General code cleanup / minor refactoring for better readabilty of code.
- Switch from boolean value to mask for adc_en2 field to avoid hard coding
a mask that will be different in AXP192
* tag 'iio-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (63 commits)
MAINTAINERS: Add ROHM BU27034
iio: light: ROHM BU27034 Ambient Light Sensor
dt-bindings: iio: light: Support ROHM BU27034
MAINTAINERS: Add IIO gain-time-scale helpers
iio: light: Add gain-time-scale helpers
doc: Make sysfs-bus-iio doc more exact
iio: dac: set variable max5522_channels storage-class-specifier to static
dt-bindings: iio: temperature: ltc2983: Make 'adi,custom-thermocouple' signed
dt-bindings: iio: temperature: ltc2983: Fix child node unevaluated properties
iio: addac: stx104: Use regmap_read_poll_timeout() for conversion poll
iio: addac: stx104: Migrate to the regmap API
iio: addac: stx104: Improve indentation in stx104_write_raw()
iio: addac: stx104: Use define rather than hardcoded limit for write val
iio: addac: stx104: Fix race condition when converting analog-to-digital
iio: addac: stx104: Fix race condition for stx104_write_raw()
dt-bindings: iio: st-sensors: Fix repeated text
staging: iio: resolver: ads1210: fix config mode
iio: adc: ti-ads1100: fix error code in probe()
iio: accel: add support for IIS328DQ variant
dt-bindings: iio: st-sensors: Add IIS328DQ accelerometer
...
Vladimir Oltean says:
====================
DSA trace events
This series introduces the "dsa" trace event class, with the following
events:
$ trace-cmd list | grep dsa
dsa
dsa:dsa_fdb_add_hw
dsa:dsa_mdb_add_hw
dsa:dsa_fdb_del_hw
dsa:dsa_mdb_del_hw
dsa:dsa_fdb_add_bump
dsa:dsa_mdb_add_bump
dsa:dsa_fdb_del_drop
dsa:dsa_mdb_del_drop
dsa:dsa_fdb_del_not_found
dsa:dsa_mdb_del_not_found
dsa:dsa_lag_fdb_add_hw
dsa:dsa_lag_fdb_add_bump
dsa:dsa_lag_fdb_del_hw
dsa:dsa_lag_fdb_del_drop
dsa:dsa_lag_fdb_del_not_found
dsa:dsa_vlan_add_hw
dsa:dsa_vlan_del_hw
dsa:dsa_vlan_add_bump
dsa:dsa_vlan_del_drop
dsa:dsa_vlan_del_not_found
These are useful to debug refcounting issues on CPU and DSA ports, where
entries may remain lingering, or may be removed too soon, depending on
bugs in higher layers of the network stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These are not as critical as the FDB/MDB trace points (I'm not aware of
outstanding VLAN related bugs), but maybe they are useful to somebody,
either debugging something or simply trying to learn more.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA performs non-trivial housekeeping of unicast and multicast addresses
on shared (CPU and DSA) ports, and puts a bit of pressure on higher
layers, requiring them to behave correctly (remove these addresses
exactly as many times as they were added). Otherwise, either addresses
linger around forever, or DSA returns -ENOENT complaining that entries
that were already deleted must be deleted again.
To aid debugging, introduce some trace points specifically for FDB and
MDB - that's where some of the bugs still are right now.
Some bugs I have seen were also due to race conditions, see:
- 630fd4822a ("net: dsa: flush switchdev workqueue on bridge join error path")
- a2614140dc ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN")
so it would be good to not disturb the timing too much, hence the choice
to use trace points vs regular dev_dbg().
I've had these for some time on my computer in a less polished form, and
they've proven useful. What I found most useful was to enable
CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline,
and run "cat /sys/kernel/debug/tracing/trace". This is to debug more
complex environments with network managers started by the init system,
things like that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9593126dae ("media: venus: Add a handling of QC08C compressed
format") and commit cef92b14e6 ("media: venus: Add a handling of QC10C
compressed format") added support for the QC08C and QC10C compressed
formats respectively.
But these also caused a regression, because the new formats where added
at the beginning of the vdec_formats[] array and the vdec_inst_init()
function sets the default format output and capture using fixed indexes
of that array:
static void vdec_inst_init(struct venus_inst *inst)
{
...
inst->fmt_out = &vdec_formats[8];
inst->fmt_cap = &vdec_formats[0];
...
}
Since now V4L2_PIX_FMT_NV12 is not the first entry in the array anymore,
the default capture format is not set to that as it was done before.
Both commits changed the first index to keep inst->fmt_out default format
set to V4L2_PIX_FMT_H264, but did not update the latter to keep .fmt_out
default format set to V4L2_PIX_FMT_NV12.
Rather than updating the index to the current V4L2_PIX_FMT_NV12 position,
let's reorder the entries so that this format is the first entry again.
This would also make VIDIOC_ENUM_FMT report the V4L2_PIX_FMT_NV12 format
with an index 0 as it did before the QC08C and QC10C formats were added.
Fixes: 9593126dae ("media: venus: Add a handling of QC08C compressed format")
Fixes: cef92b14e6 ("media: venus: Add a handling of QC10C compressed format")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
The decoder driver should clear the last_buffer_dequeued flag of the
capture queue upon receiving V4L2_DEC_CMD_START.
The last_buffer_dequeued flag is set upon receiving EOS (which always
happens upon receiving V4L2_DEC_CMD_STOP).
Without this patch, after issuing the V4L2_DEC_CMD_STOP and
V4L2_DEC_CMD_START, the vb2_dqbuf() function will always fail, even if
the buffers are completed by the hardware.
Fixes: beac82904a ("media: venus: make decoder compliant with stateful codec API")
Signed-off-by: Michał Krawczyk <mk@semihalf.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Static code analyzer complains to unchecked return value.
The result of pci_reset_function() is unchecked.
Despite, the issue is on the FLR supported code path and in that
case reset can be done with pcie_flr(), the patch uses less invasive
approach by adding the result check of pci_reset_function().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver was intended from the start to be a wake-up source for the
system, however due to the absence of a suitable call to
device_set_wakeup_capable(), the device_may_wakeup() call used to decide
whether to enable the GPIO interrupt as a wake-up source would never
happen. Lookup the DT standard "wakeup-source" property and call
device_init_wakeup() to ensure the device is flagged as being wakeup
capable.
Reported-by: Matthew Lear <matthew.lear@broadcom.com>
Fixes: fd0f6851eb ("[media] rc: Add support for GPIO based IR Receiver driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
During the system suspend, vote for minimal interconnect bandwidth (1KiB)
to keep the interconnect path active for config access and also turn OFF
the resources like clock and PHY if there are no active devices connected
to the controller. For the controllers with active devices, the resources
are kept ON as removing the resources will trigger access violation during
the late end of suspend cycle as kernel tries to access the config space of
PCIe devices to mask the MSIs.
Also, it is not desirable to put the link into L2/L3 state as that
implies VDD supply will be removed and the devices may go into powerdown
state. This will affect the lifetime of storage devices like NVMe.
And finally, during resume, turn ON the resources if the controller was
truly suspended (resources OFF) and update the interconnect bandwidth
based on PCIe Gen speed.
Suggested-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Link: https://lore.kernel.org/r/20230403154922.20704-2-manivannan.sadhasivam@linaro.org
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Dhruva Gole <d-gole@ti.com>
syzbot detected a crash during log recovery:
XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
XFS (loop0): Starting recovery (logdev: internal)
==================================================================
BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074
CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:306
print_report+0x107/0x1f0 mm/kasan/report.c:417
kasan_report+0xcd/0x100 mm/kasan/report.c:517
xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
get_tree_bdev+0x400/0x620 fs/super.c:1282
vfs_get_tree+0x88/0x270 fs/super.c:1489
do_new_mount+0x289/0xad0 fs/namespace.c:3145
do_mount fs/namespace.c:3488 [inline]
__do_sys_mount fs/namespace.c:3697 [inline]
__se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89fa3f4aca
Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
</TASK>
The fuzzed image contains an AGF with an obviously garbage
agf_refcount_level value of 32, and a dirty log with a buffer log item
for that AGF. The ondisk AGF has a higher LSN than the recovered log
item. xlog_recover_buf_commit_pass2 reads the buffer, compares the
LSNs, and decides to skip replay because the ondisk buffer appears to be
newer.
Unfortunately, the ondisk buffer is corrupt, but recovery just read the
buffer with no buffer ops specified:
error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
buf_f->blf_len, buf_flags, &bp, NULL);
Skipping the buffer leaves its contents in memory unverified. This sets
us up for a kernel crash because xfs_refcount_recover_cow_leftovers
reads the buffer (which is still around in XBF_DONE state, so no read
verification) and creates a refcountbt cursor of height 32. This is
impossible so we run off the end of the cursor object and crash.
Fix this by invoking the verifier on all skipped buffers and aborting
log recovery if the ondisk buffer is corrupt. It might be smarter to
force replay the log item atop the buffer and then see if it'll pass the
write verifier (like ext4 does) but for now let's go with the
conservative option where we stop immediately.
Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
While fuzzing the data fork extent count on a btree-format directory
with xfs/375, I observed the following (excerpted) splat:
XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
------------[ cut here ]------------
WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
Call Trace:
<TASK>
xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
__x64_sys_ioctl+0x82/0xa0
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The cause of this is a race condition in xfs_ilock_data_map_shared,
which performs an unlocked access to the data fork to guess which lock
mode it needs:
Thread 0 Thread 1
xfs_need_iread_extents
<observe no iext tree>
xfs_ilock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
<load bmbt extents into iext>
<notice iext size doesn't
match nextents>
xfs_need_iread_extents
<observe iext tree>
xfs_ilock(..., ILOCK_SHARED)
<tear down iext tree>
xfs_iunlock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
*BOOM*
Fix this race by adding a flag to the xfs_ifork structure to indicate
that we have not yet read in the extent records and changing the
predicate to look at the flag state, not if_height. The memory barrier
ensures that the flag will not be set until the very end of the
function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
In commit fe08cc5044 we reworked the valid superblock version
checks. If it is a V5 filesystem, it is always valid, then we
checked if the version was less than V4 (reject) and then checked
feature fields in the V4 flags to determine if it was valid.
What we missed was that if the version is not V4 at this point,
we shoudl reject the fs. i.e. the check current treats V6+
filesystems as if it was a v4 filesystem. Fix this.
cc: stable@vger.kernel.org
Fixes: fe08cc5044 ("xfs: open code sb verifier feature checks")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tony Nguyen says:
====================
iavf: fix racing in VLANs
Ahmed Zaki says:
This patchset mainly fixes a racing issue in the iavf where the number of
VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
we get rid of the cvlans and svlans bitmaps and keep all the required info
in the list.
The second patch adds two new states that are needed so that we keep the
VLAN info while the interface goes DOWN:
-- DISABLE (notify PF, but keep the filter in the list)
-- INACTIVE (dev is DOWN, filter is removed from PF)
Finally, the current code keeps each state in a separate bit field, which
is error prone. The first patch refactors that by replacing all bits with
a single enum. The changes are minimal where each bit change is replaced
with the new state value.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: remove active_cvlans and active_svlans bitmaps
iavf: refactor VLAN filter states
====================
Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix not setting Dath Path for broadcast sink
- Fix not cleaning up on LE Connection failure
- SCO: Fix possible circular locking dependency
- L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
- Fix race condition in hidp_session_thread
- btbcm: Fix logic error in forming the board name
- btbcm: Fix use after free in btsdio_remove
* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Bluetooth: Set ISO Data Path on broadcast sink
Bluetooth: hci_conn: Fix possible UAF
Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
bluetooth: btbcm: Fix logic error in forming the board name.
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
Bluetooth: Fix race condition in hidp_session_thread
Bluetooth: Fix printing errors if LE Connection times out
Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================
Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a new ICM area for that memory, so we need to handle it as we
did for the others ICM types.
The patch added that specific pool with its requirements and management.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The send engine should be ready to handle more opcodes
in addition to RDMA_WRITE/RDMA_READ.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add new WQE type: FLOW_TBL_ACCESS, which will be used for
writing modify header arguments.
This type has specific control segment and special data segment.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add enum value for modify-header argument object and mlx5_bits
for the related capabilities.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In STEv1 counter action can be set either by filling counter ID on STE, in
which case it is executed before other actions on this STE, or as a single
action, in which case it is executed in accordance with the actions order.
FW steering on STEv1 devices implements counter as counter ID on STE, and
this counter is set on the last STE.
Fix SMFS to be consistent with this behaviour - move TX counter to the
last STE, this way the counter will include all actions of the previous STEs
that might have changed packet headers length, e.g. encap, vlan push, etc.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.
Use a new profile to provide for future differences between SFs and PFs.
The mr_cache not used for non-PF functions, so it is excluded from the
new profile.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pass target struct net_device to mdb attach/detach handler in order to
expose the port name to the new tracepoints. Implemented following
tracepoints:
- Attach mdb to port.
- Detach mdb from port.
Usage example:
># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5_esw_bridge_port_mdb_attach >> set_event
># cat trace
...
kworker/0:0-19071 [000] ..... 259004.253848: mlx5_esw_bridge_port_mdb_attach: net_device=enp8s0f0_0 addr=33:33:ff:00:00:01 vid=0 num_ports=1 offloaded=1
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Implement support for add/del SWITCHDEV_OBJ_ID_PORT_MDB events. For mdb
destination addresses configure egress table rules to replicate to per-port
multicast tables of all ports that are member of the multicast group as
illustrated by 'MDB1' rule in the following diagram:
+--------+--+
+---------------------------------------> Port 1 | |
| +-^------+--+
| |
| |
+-----------------------------------------+ | +---------------------------+ |
| EGRESS table | | +--> PORT 1 multicast table | |
+----------------------------------+ +-----------------------------------------+ | | +---------------------------+ |
| INGRESS table | | | | | | | |
+----------------------------------+ | dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ | | FG0: | |
| | | dst_mac=P1,vlan=Y -> pop vlan, goto P1 | | | src_port=dst_port -> drop | |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ | | FG1: | |
| ... | | dst_mac=P2,vlan=Y -> goto P2 | | | | VLAN X -> pop, goto port | |
| | | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ | ... | |
+----------------------------------+ | | | | | VLAN Y -> pop, goto port +-------+
+-----------------------------------------+ | | | FG3: |
| | | matchall -> goto port |
| | | |
| | +---------------------------+
| |
| |
| | +--------+--+
+---------------------------------------> Port 2 | |
| +-^------+--+
| |
| |
| +---------------------------+ |
+--> PORT 2 multicast table | |
+---------------------------+ |
| | |
| FG0: | |
| src_port=dst_port -> drop | |
| FG1: | |
| VLAN X -> pop, goto port | |
| ... | |
| | |
| FG3: | |
| matchall -> goto port +-------+
| |
+---------------------------+
MDB is managed by extending mlx5 bridge to store an entry in
mlx5_esw_bridge->mdb_list linked list (used to iterate over all offloaded
MDBs) and mlx5_esw_bridge->mdb_ht hash table (used to lookup existing MDB
by MAC+VLAN). Every MDB entry can be attached to arbitrary amount of bridge
ports that are stored in mlx5_esw_bridge_mdb_entry->ports xarray in order
to allow both efficient lookup of the port and also iteration over all
ports that the entry is attached to. Every time MDB is attached/detached
to/from a port, the hardware rule is recreated with list of destinations
corresponding to all attached ports. When the entry is detached from the
last port it is removed from mdb and destroyed which means that the ports
xarray also acts as implicit reference counting mechanism.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When VLAN with 'untagged' flag is created on port also provision the
per-port multicast table rule to pop the VLAN during packet replication.
This functionality must be in per-port table because some subset of ports
that are member of multicast group can require just a match on VLAN (trunk
mode) while other subset can be configured to remove the VLAN tag from
packets received on the ports (access mode).
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Multicast replication requires adding one more level of FDB_BR_OFFLOAD
priority flow tables. The new level is used for per-port multicast-specific
tables that have following flow groups structure (flow highest to lowest
priority):
- Flow group of size one that matches on source port metadata. This will
have a static single rule that prevent packets from being replicated to
their source port.
- Flow group of size one that matches all packets and forwards them to the
port that owns the table.
Initialize the table dynamically on all bridge ports when adding a port to
the bridge that has multicast enabled and on all existing bridge ports when
receiving multicast enable notification.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Handle SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED attribute notification to
dynamically toggle bridge multicast offload. Set new
MLX5_ESW_BRIDGE_MCAST_FLAG bridge flag when multicast offload is enabled.
Put multicast-specific code into new bridge_mcast.c file.
When initializing bridge multicast pipeline create a static rule for
snooping on IGMP traffic and three rules for snooping on MLD traffic (for
query, report and done message types). Note that matching MLD traffic
requires having flexparser MLX5_FLEX_PROTO_ICMPV6 capability enabled.
By default Linux bridge is created with multicast enabled which can be
modified by 'mcast_snooping' argument:
$ ip link set name my_bridge type bridge mcast_snooping 0
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The pattern when function looks up a port by vport_num+vhca_id tuple in
order to just obtain its parent bridge is repeated multiple times in
bridge.c file. Further commits in this series use the pattern even more.
Extract the pattern to standalone mlx5_esw_bridge_from_port_lookup()
function to improve code readability.
This commits doesn't change functionality.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Following patches in series will require accessing flow tables and groups
sizes, table levels and struct mlx5_esw_bridge from new the new source file
dedicated to multicast code. Expose these data in bridge_priv.h to reduce
clutter in following patches that will implement the actual functionality.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Bridge ingress and egress tables got more flow groups recently for QinQ
support and will get more in following patches of this series. Increase the
sizes of the tables to allow offloading more flows in each mode.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Commit 56124d6c87 ("fsverity: support enabling with tree block size <
PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read
the file's data, instead of direct pagecache accesses.
An unintended consequence of this is that the
'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became
reachable by fuzz tests. This happens if FS_IOC_ENABLE_VERITY is called
on a fd opened with access mode 3, which means "ioctl access only".
Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds. But
ioctl-only fds are a weird Linux extension that is rarely used and that
few people even know about. (The documentation for FS_IOC_ENABLE_VERITY
even specifically says it requires O_RDONLY.) It's probably not
worthwhile to make the ioctl internally open a new fd just to handle
this case. Thus, just reject the ioctl on such fds for now.
Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Reported-by: syzbot+51177e4144d764827c45@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b28
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406215106.235829-1-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
The new Merkle tree construction algorithm is a bit fragile in that it
may overflow the 'root_hash' array if the tree actually generated does
not match the calculated tree parameters.
This should never happen unless there is a filesystem bug that allows
the file size to change despite deny_write_access(), or a bug in the
Merkle tree logic itself. Regardless, it's fairly easy to check for
buffer overflow here, so let's do so.
This is a robustness improvement only; this case is not currently known
to be reachable. I've added a Fixes tag anyway, since I recommend that
this be included in kernels that have the mentioned commit.
Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230328041505.110162-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
This feature is a mess -- the hash function has been broken for the
entire 15 years of its existence if you create names with extended ascii
bytes; metadump name obfuscation has silently failed for just as long;
and the feature clashes horribly with the UTF8 encodings that most
systems use today. There is exactly one fstest for this feature.
In other words, this feature is crap. Let's deprecate it now so we can
remove it from the codebase in 2030.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Now that we've made kernel and userspace use the same tolower code for
computing directory index hashes, add that to the selftest code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Back in the old days, the "ascii-ci" feature was created to implement
case-insensitive directory entry lookups for latin1-encoded names and
remove the large overhead of Samba's case-insensitive lookup code. UTF8
names were not allowed, but nobody explicitly wrote in the documentation
that this was only expected to work if the system used latin1 names.
The kernel tolower function was selected to prepare names for hashed
lookups.
There's a major discrepancy in the function that computes directory entry
hashes for filesystems that have ASCII case-insensitive lookups enabled.
The root of this is that the kernel and glibc's tolower implementations
have differing behavior for extended ASCII accented characters. I wrote
a program to spit out characters for which the tolower() return value is
different from the input:
glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z
kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z 192:À 193:Á
194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë 204:Ì 205:Í
206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:× 216:Ø 217:Ù
218:Ú 219:Û 220:Ü 221:Ý 222:Þ
Which means that the kernel and userspace do not agree on the hash value
for a directory filename that contains those higher values. The hash
values are written into the leaf index block of directories that are
larger than two blocks in size, which means that xfs_repair will flag
these directories as having corrupted hash indexes and rewrite the index
with hash values that the kernel now will not recognize.
Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's toupper predicate and tolower functions into
libxfs. Give the new functions less provocative names to make it really
obvious that this is a pre-hash name preparation function, and nothing
else. This change makes userspace's behavior consistent with the
kernel.
Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Strengthen the rmap btree record checker a little more by comparing
OWN_REFCBT reverse mappings against the refcount btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>