Commit Graph

1185157 Commits

Author SHA1 Message Date
Laurent Pinchart
e3a69496a1 media: Prefer designated initializers over memset for subdev pad ops
Structures passed to subdev pad operations are all zero-initialized, but
not always with the same kind of code constructs. While most drivers
used designated initializers, which zero all the fields that are not
specified, when declaring variables, some use memset(). Those two
methods lead to the same end result, and, depending on compiler
optimizations, may even be completely equivalent, but they're not
consistent.

Improve coding style consistency by using designated initializers
instead of calling memset(). Where applicable, also move the variables
to inner scopes of for loops to ensure correct initialization in all
iterations.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Lad Prabhakar <prabhakar.csengg@gmail.com> # For am437x
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:46:06 +02:00
Laurent Pinchart
ecefa105cc media: Zero-initialize all structures passed to subdev pad operations
Several drivers call subdev pad operations, passing structures that are
not fully zeroed. While the drivers initialize the fields they care
about explicitly, this results in reserved fields having uninitialized
values. Future kernel API changes that make use of those fields thus
risk breaking proper driver operation in ways that could be hard to
detect.

To avoid this, make the code more robust by zero-initializing all the
structures passed to subdev pad operation. Maintain a consistent coding
style by preferring designated initializers (which zero-initialize all
the fields that are not specified) over memset() where possible, and
make variable declarations local to inner scopes where applicable. One
notable exception to this rule is in the ipu3 driver, where a memset()
is needed as the structure is not a local variable but a function
parameter provided by the caller.

Not all fields of those structures can be initialized when declaring the
variables, as the values for those fields are computed later in the
code. Initialize the 'which' field in all cases, and other fields when
the variable declaration is so close to the v4l2_subdev_call() call that
it keeps all the context easily visible when reading the code, to avoid
hindering readability.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org> # For vimc
Reviewed-by: Lad Prabhakar <prabhakar.csengg@gmail.com> # For am437x
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> # For drivers/staging/media/imx/
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:46:06 +02:00
Laurent Pinchart
a8ca0cf1ff media: imx-jpeg: Fix incorrect indentation
Variable initialization code in notify_src_chg() is incorrectly
indented. Fix it.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:46:06 +02:00
Laurent Pinchart
a1e259872f media: Fix indentation issues introduced by subdev-wide state struct
Commit 0d346d2a6f ("media: v4l2-subdev: add subdev-wide state struct")
applied a large media tree-wide change produced by coccinelle. It was so
large that a set of identical indentation issues went unnoticed during
review. Fix them.

While at it, and because it's easy to review both changes together, add
a trailing comma for the last (and only) struct member initialization of
the related structures, to avoid future changes should new fields need
to be initialized.

Fixes: 0d346d2a6f ("media: v4l2-subdev: add subdev-wide state struct")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:46:06 +02:00
Greg Kroah-Hartman
fba51482b6 Merge tag 'iio-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:

1st set of IIO new device support, features and cleanups for the 6.4 cycle.

New device support
* bosch,bmp280
  - Add support for BMP580 - includes significant refactoring and general
    driver cleanup + support for non-volatile memory for trimming and config
    parameters.
* rohm BU27034
  - New driver for this 3 channel ambient light sensor.
  - New support library for devices where both integration time and
    amplifier gain are configurable.  In these cases a scale change
    may require changing bother underlying values. This library module
    provides code to help with this.
* st,accel
  - Add support for IIS328DQ (ID only as compatible wtih LIS331DL)
* st,lsm6dsx
  - Add support for ASM330LHB automotive MEMS sensor.
* ti,ads1100, ads1000
  - New driver for these 16 bit ADCs.
* ti,tmp117
  - Add support for older tmp116 device. Includes some general driver cleanup.

Staging driver drops
* adi,ade7854
  - Driver was a very long way from compliant with IIO infrastructure and ABI.
    If anyone wants a non staging version of this driver they are better off
    starting from scratch. Hence drop it and the associated meter.h header.

Features
* adi,ad7441r
  - Add DT binding to set sink current for digital input.
* semtech,sx9324,9360
  - Support older register mapping from firmware designed for windows.

Core improvements.
* Move iio_trigger_poll() docs to next to the implementation and add a note
  on expected caller context.
* Rename iio_trigger_poll_chained() to iio_trigger_poll_nested() so
  as to use more standard / common terminology.
* Improve main ABI docs references to offset and scale for raw values by
  making them consistent and clear.

Cleanups and minor fixes:
* adi,ad5592r
  - Add GPIO names - useful for debug.
* adi,ad7441r
  - Fix current input, loop powered mode configuration setup.
* adi,adis16475
  - Fix wrong commented value for minimum advised lower rate.
* adi,admv1013
  - Use devm_clk_get_enabled() to reduce boilerplate.
* adi,ads1210
  - Fix wrong bits for writing config register (late fix and has
    been broken a long time so not rushed upstream)
* amlogic,meson-saradc
  - Improve cleanup in error handling if BL30 handshake fails.
* apex-embedded,stx104
  - Migrate to regmap and use regmap_read_poll_timeout() to neatly handle
    retries.
  - Add local mutex to close various races.
  - Use define U16_MAX rather than value for limit.
  - Improve code readability with minor reorganization.
* atmel,ad91-sama5d2
  - Drop trivial dead code.
* kionix,kx022a
  - Drop unused structure element.
* linear,ltc2983
  - Reorganize bindings doc to enable unevaluatedProperties to be set
    in one place for all child nodes.
  - Make binding for adi,custom-thermocouple accept signed values.
* maxim,max44000
  - Add OF Device matching (of_match_table was not correctly set).
* maxim,max5522
  - Missing static
* measurement-computing,cio-dac
  - Fix wrong part name in comments.
  - Migrate to regmap.
  - Improve includes by replacing bitops.h with more direct bits.h
* qcom,pm8xxx-xoadc
  - Remove a check that can never fail.
* renesas,rcar-gyroadc
  - DT binding documentation improvements.
  - Tidy up an unused warning with __maybe_unused.
* semtech,sx_common
  - Drop docs for a structure element that doesn't exist.
* semtech,sx9500
  - Drop ACPI_PTR() and of_match_ptr() protections that just complicate
    the code / block some firmware registration types that would otherwise
    work.
* sensiron,sps30
  - Comment formatting tidy up.
* st,sensors
  - Drop duplicate text in DT binding.
* st,stm32-adc
  - Add some missing static markings.
* ti,ads1100
  - Use correct return code in dev_err_probe() call.
* x-powers,axp20x_adc - precursor series to simplify addition of AXP192.
  - General code cleanup / minor refactoring for better readabilty of code.
  - Switch from boolean value to mask for adc_en2 field to avoid hard coding
    a mask that will be different in AXP192

* tag 'iio-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (63 commits)
  MAINTAINERS: Add ROHM BU27034
  iio: light: ROHM BU27034 Ambient Light Sensor
  dt-bindings: iio: light: Support ROHM BU27034
  MAINTAINERS: Add IIO gain-time-scale helpers
  iio: light: Add gain-time-scale helpers
  doc: Make sysfs-bus-iio doc more exact
  iio: dac: set variable max5522_channels storage-class-specifier to static
  dt-bindings: iio: temperature: ltc2983: Make 'adi,custom-thermocouple' signed
  dt-bindings: iio: temperature: ltc2983: Fix child node unevaluated properties
  iio: addac: stx104: Use regmap_read_poll_timeout() for conversion poll
  iio: addac: stx104: Migrate to the regmap API
  iio: addac: stx104: Improve indentation in stx104_write_raw()
  iio: addac: stx104: Use define rather than hardcoded limit for write val
  iio: addac: stx104: Fix race condition when converting analog-to-digital
  iio: addac: stx104: Fix race condition for stx104_write_raw()
  dt-bindings: iio: st-sensors: Fix repeated text
  staging: iio: resolver: ads1210: fix config mode
  iio: adc: ti-ads1100: fix error code in probe()
  iio: accel: add support for IIS328DQ variant
  dt-bindings: iio: st-sensors: Add IIS328DQ accelerometer
  ...
2023-04-12 09:45:34 +02:00
David S. Miller
bbda0f0d15 Merge branch 'dsa-trace-events'
Vladimir Oltean says:

====================
DSA trace events

This series introduces the "dsa" trace event class, with the following
events:

$ trace-cmd list | grep dsa
dsa
dsa:dsa_fdb_add_hw
dsa:dsa_mdb_add_hw
dsa:dsa_fdb_del_hw
dsa:dsa_mdb_del_hw
dsa:dsa_fdb_add_bump
dsa:dsa_mdb_add_bump
dsa:dsa_fdb_del_drop
dsa:dsa_mdb_del_drop
dsa:dsa_fdb_del_not_found
dsa:dsa_mdb_del_not_found
dsa:dsa_lag_fdb_add_hw
dsa:dsa_lag_fdb_add_bump
dsa:dsa_lag_fdb_del_hw
dsa:dsa_lag_fdb_del_drop
dsa:dsa_lag_fdb_del_not_found
dsa:dsa_vlan_add_hw
dsa:dsa_vlan_del_hw
dsa:dsa_vlan_add_bump
dsa:dsa_vlan_del_drop
dsa:dsa_vlan_del_not_found

These are useful to debug refcounting issues on CPU and DSA ports, where
entries may remain lingering, or may be removed too soon, depending on
bugs in higher layers of the network stack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 08:36:07 +01:00
Vladimir Oltean
02020bd70f net: dsa: add trace points for VLAN operations
These are not as critical as the FDB/MDB trace points (I'm not aware of
outstanding VLAN related bugs), but maybe they are useful to somebody,
either debugging something or simply trying to learn more.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 08:36:07 +01:00
Vladimir Oltean
9538ebce88 net: dsa: add trace points for FDB/MDB operations
DSA performs non-trivial housekeeping of unicast and multicast addresses
on shared (CPU and DSA) ports, and puts a bit of pressure on higher
layers, requiring them to behave correctly (remove these addresses
exactly as many times as they were added). Otherwise, either addresses
linger around forever, or DSA returns -ENOENT complaining that entries
that were already deleted must be deleted again.

To aid debugging, introduce some trace points specifically for FDB and
MDB - that's where some of the bugs still are right now.

Some bugs I have seen were also due to race conditions, see:
- 630fd4822a ("net: dsa: flush switchdev workqueue on bridge join error path")
- a2614140dc ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN")

so it would be good to not disturb the timing too much, hence the choice
to use trace points vs regular dev_dbg().

I've had these for some time on my computer in a less polished form, and
they've proven useful. What I found most useful was to enable
CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline,
and run "cat /sys/kernel/debug/tracing/trace". This is to debug more
complex environments with network managers started by the init system,
things like that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 08:36:07 +01:00
Fritz Koenig
a47a3ae5fc media: venus: Correct P010 buffer alignment
According to msm_media_info.h the correct alignment
for the stride of P010 buffers is 128.

Signed-off-by: Fritz Koenig <frkoenig@chromium.org>
Reviewed-by: Vikash Garodia <quic_vgarodia@quicinc.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Dikshita Agarwal
7493db46e4 venus: venc: add handling for VIDIOC_ENCODER_CMD
Add handling for below commands in encoder:
1. V4L2_ENC_CMD_STOP
2. V4L2_ENC_CMD_START

Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Viswanath Boma
90655e2e79 venus: Add support for min/max qp range.
Currently QP range set from client is not communicated to video firmware.
Add support for the QP range HFI to set the same to firmware.

Signed-off-by: Viswanath Boma <quic_vboma@quicinc.com>
Signed-off-by: Vikash Garodia <quic_vgarodia@quicinc.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Javier Martinez Canillas
a9d45ec74c media: venus: dec: Fix capture formats enumeration order
Commit 9593126dae ("media: venus: Add a handling of QC08C compressed
format") and commit cef92b14e6 ("media: venus: Add a handling of QC10C
compressed format") added support for the QC08C and QC10C compressed
formats respectively.

But these also caused a regression, because the new formats where added
at the beginning of the vdec_formats[] array and the vdec_inst_init()
function sets the default format output and capture using fixed indexes
of that array:

static void vdec_inst_init(struct venus_inst *inst)
{
...
	inst->fmt_out = &vdec_formats[8];
	inst->fmt_cap = &vdec_formats[0];
...
}

Since now V4L2_PIX_FMT_NV12 is not the first entry in the array anymore,
the default capture format is not set to that as it was done before.

Both commits changed the first index to keep inst->fmt_out default format
set to V4L2_PIX_FMT_H264, but did not update the latter to keep .fmt_out
default format set to V4L2_PIX_FMT_NV12.

Rather than updating the index to the current V4L2_PIX_FMT_NV12 position,
let's reorder the entries so that this format is the first entry again.

This would also make VIDIOC_ENUM_FMT report the V4L2_PIX_FMT_NV12 format
with an index 0 as it did before the QC08C and QC10C formats were added.

Fixes: 9593126dae ("media: venus: Add a handling of QC08C compressed format")
Fixes: cef92b14e6 ("media: venus: Add a handling of QC10C compressed format")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Viswanath Boma
bfe1326573 venus: Fix for H265 decoding failure.
Aligned the mismatch of persist1 and scratch1 buffer calculation,
as per the firmware requirements.

Signed-off-by: Vikash Garodia <vgarodia@qti.qualcomm.com>
Signed-off-by: Viswanath Boma <quic_vboma@quicinc.com>
Tested-by: Nathan Hebert <nhebert@chromium.org>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Michał Krawczyk
50248ad9f1 media: venus: dec: Fix handling of the start cmd
The decoder driver should clear the last_buffer_dequeued flag of the
capture queue upon receiving V4L2_DEC_CMD_START.

The last_buffer_dequeued flag is set upon receiving EOS (which always
happens upon receiving V4L2_DEC_CMD_STOP).

Without this patch, after issuing the V4L2_DEC_CMD_STOP and
V4L2_DEC_CMD_START, the vb2_dqbuf() function will always fail, even if
the buffers are completed by the hardware.

Fixes: beac82904a ("media: venus: make decoder compliant with stateful codec API")

Signed-off-by: Michał Krawczyk <mk@semihalf.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:35:44 +02:00
Denis Plotnikov
7573099e10 qlcnic: check pci_reset_function result
Static code analyzer complains to unchecked return value.
The result of pci_reset_function() is unchecked.
Despite, the issue is on the FLR supported code path and in that
case reset can be done with pcie_flr(), the patch uses less invasive
approach by adding the result check of pci_reset_function().

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 08:32:38 +01:00
Florian Fainelli
9c592f8ab1 media: rc: gpio-ir-recv: Fix support for wake-up
The driver was intended from the start to be a wake-up source for the
system, however due to the absence of a suitable call to
device_set_wakeup_capable(), the device_may_wakeup() call used to decide
whether to enable the GPIO interrupt as a wake-up source would never
happen. Lookup the DT standard "wakeup-source" property and call
device_init_wakeup() to ensure the device is flagged as being wakeup
capable.

Reported-by: Matthew Lear <matthew.lear@broadcom.com>
Fixes: fd0f6851eb ("[media] rc: Add support for GPIO based IR Receiver driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:19:05 +02:00
Florian Fainelli
9fffbf9ca0 dt-bindings: media: gpio-ir-receiver: Document wakeup-souce property
The GPIO IR receiver can be used as a wake-up source for the system,
document that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-04-12 09:19:05 +02:00
Manivannan Sadhasivam
a4c716706f dt-bindings: PCI: qcom: Update maintainers entry
Stan is no longer working with MMSOL and expressed his interest to not
continue maintaining Qcom PCIe driver. Since I took over the driver
maintainership, I'm stepping in to maintain the binding also.

Link: https://lore.kernel.org/r/20230308082424.140224-2-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2023-04-12 08:49:04 +02:00
Manivannan Sadhasivam
c0e1eb441b PCI: qcom: Enable async probe by default
Qcom PCIe RC driver waits for the PHY link to be up during the probe;
this consumes several milliseconds during boot.

Enable async probe by default so that other drivers can load in parallel
while this driver waits for the link to be up.

Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230320064644.5217-1-manivannan.sadhasivam@linaro.org
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
2023-04-12 08:49:04 +02:00
Manivannan Sadhasivam
ad9b9b6e36 PCI: qcom: Add support for system suspend and resume
During the system suspend, vote for minimal interconnect bandwidth (1KiB)
to keep the interconnect path active for config access and also turn OFF
the resources like clock and PHY if there are no active devices connected
to the controller. For the controllers with active devices, the resources
are kept ON as removing the resources will trigger access violation during
the late end of suspend cycle as kernel tries to access the config space of
PCIe devices to mask the MSIs.

Also, it is not desirable to put the link into L2/L3 state as that
implies VDD supply will be removed and the devices may go into powerdown
state. This will affect the lifetime of storage devices like NVMe.

And finally, during resume, turn ON the resources if the controller was
truly suspended (resources OFF) and update the interconnect bandwidth
based on PCIe Gen speed.

Suggested-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Link: https://lore.kernel.org/r/20230403154922.20704-2-manivannan.sadhasivam@linaro.org
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Dhruva Gole <d-gole@ti.com>
2023-04-12 08:48:53 +02:00
Ye Bin
8ee81ed581 xfs: fix BUG_ON in xfs_getbmap()
There's issue as follows:
XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
RIP: 0010:assfail+0x96/0xa0
RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
FS:  00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 xfs_getbmap+0x1a5b/0x1e40
 xfs_ioc_getbmap+0x1fd/0x5b0
 xfs_file_ioctl+0x2cb/0x1d50
 __x64_sys_ioctl+0x197/0x210
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue may happen as follows:
         ThreadA                       ThreadB
do_shared_fault
 __do_fault
  xfs_filemap_fault
   __xfs_filemap_fault
    filemap_fault
                             xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
			      xfs_getbmap
			       xfs_ilock(ip, XFS_IOLOCK_SHARED);
			       filemap_write_and_wait
 do_page_mkwrite
  xfs_filemap_page_mkwrite
   __xfs_filemap_fault
    xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
    iomap_page_mkwrite
     ...
     xfs_buffered_write_iomap_begin
      xfs_bmapi_reserve_delalloc -> Allocate delay extent
                              xfs_ilock_data_map_shared(ip)
	                      xfs_getbmap_report_one
			       ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
	                        -> trigger BUG_ON

As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's
small window mkwrite can produce delay extent after file write in xfs_getbmap().
To solve above issue, just skip delalloc extents.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-12 15:49:44 +10:00
Darrick J. Wong
22ed903eee xfs: verify buffer contents when we skip log replay
syzbot detected a crash during log recovery:

XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
XFS (loop0): Starting recovery (logdev: internal)
==================================================================
BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074

CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
 xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
 xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
 xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
 xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
 xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
 xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
 xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
 xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
 xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89fa3f4aca
Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
 </TASK>

The fuzzed image contains an AGF with an obviously garbage
agf_refcount_level value of 32, and a dirty log with a buffer log item
for that AGF.  The ondisk AGF has a higher LSN than the recovered log
item.  xlog_recover_buf_commit_pass2 reads the buffer, compares the
LSNs, and decides to skip replay because the ondisk buffer appears to be
newer.

Unfortunately, the ondisk buffer is corrupt, but recovery just read the
buffer with no buffer ops specified:

	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
			buf_f->blf_len, buf_flags, &bp, NULL);

Skipping the buffer leaves its contents in memory unverified.  This sets
us up for a kernel crash because xfs_refcount_recover_cow_leftovers
reads the buffer (which is still around in XBF_DONE state, so no read
verification) and creates a refcountbt cursor of height 32.  This is
impossible so we run off the end of the cursor object and crash.

Fix this by invoking the verifier on all skipped buffers and aborting
log recovery if the ondisk buffer is corrupt.  It might be smarter to
force replay the log item atop the buffer and then see if it'll pass the
write verifier (like ext4 does) but for now let's go with the
conservative option where we stop immediately.

Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-12 15:49:23 +10:00
Darrick J. Wong
c95356ca88 xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
While fuzzing the data fork extent count on a btree-format directory
with xfs/375, I observed the following (excerpted) splat:

XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
------------[ cut here ]------------
WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
Call Trace:
 <TASK>
 xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

The cause of this is a race condition in xfs_ilock_data_map_shared,
which performs an unlocked access to the data fork to guess which lock
mode it needs:

Thread 0                          Thread 1

xfs_need_iread_extents
<observe no iext tree>
xfs_ilock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
<load bmbt extents into iext>
<notice iext size doesn't
 match nextents>
                                  xfs_need_iread_extents
                                  <observe iext tree>
                                  xfs_ilock(..., ILOCK_SHARED)
<tear down iext tree>
xfs_iunlock(..., ILOCK_EXCL)
                                  xfs_iread_extents
                                  <observe no iext tree>
                                  <check ILOCK_EXCL>
                                  *BOOM*

Fix this race by adding a flag to the xfs_ifork structure to indicate
that we have not yet read in the extent records and changing the
predicate to look at the flag state, not if_height.  The memory barrier
ensures that the flag will not be set until the very end of the
function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-12 15:49:10 +10:00
Dave Chinner
4b827b3f30 xfs: remove WARN when dquot cache insertion fails
It just creates unnecessary bot noise these days.

Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-12 15:48:59 +10:00
Dave Chinner
aa88019851 xfs: don't consider future format versions valid
In commit fe08cc5044 we reworked the valid superblock version
checks. If it is a V5 filesystem, it is always valid, then we
checked if the version was less than V4 (reject) and then checked
feature fields in the V4 flags to determine if it was valid.

What we missed was that if the version is not V4 at this point,
we shoudl reject the fs. i.e. the check current treats V6+
filesystems as if it was a v4 filesystem. Fix this.

cc: stable@vger.kernel.org
Fixes: fe08cc5044 ("xfs: open code sb verifier feature checks")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-12 15:48:50 +10:00
Jakub Kicinski
adacf21f1c Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
iavf: fix racing in VLANs

Ahmed Zaki says:

This patchset mainly fixes a racing issue in the iavf where the number of
VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
we get rid of the cvlans and svlans bitmaps and keep all the required info
in the list.

The second patch adds two new states that are needed so that we keep the
VLAN info while the interface goes DOWN:
    -- DISABLE    (notify PF, but keep the filter in the list)
    -- INACTIVE   (dev is DOWN, filter is removed from PF)

Finally, the current code keeps each state in a separate bit field, which
is error prone. The first patch refactors that by replacing all bits with
a single enum. The changes are minimal where each bit change is replaced
with the new state value.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: remove active_cvlans and active_svlans bitmaps
  iavf: refactor VLAN filter states
====================

Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11 21:37:53 -07:00
Jakub Kicinski
160c13175e Merge tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix not setting Dath Path for broadcast sink
 - Fix not cleaning up on LE Connection failure
 - SCO: Fix possible circular locking dependency
 - L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
 - Fix race condition in hidp_session_thread
 - btbcm: Fix logic error in forming the board name
 - btbcm: Fix use after free in btsdio_remove

* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
  Bluetooth: Set ISO Data Path on broadcast sink
  Bluetooth: hci_conn: Fix possible UAF
  Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
  Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
  bluetooth: btbcm: Fix logic error in forming the board name.
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
  Bluetooth: Fix race condition in hidp_session_thread
  Bluetooth: Fix printing errors if LE Connection times out
  Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================

Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11 21:18:23 -07:00
Andrew Lunn
18bb56ab44 net: dsa: mv88e6xxx: Correct cmode to PHY_INTERFACE_
The switch can either take the MAC or the PHY role in an MII or RMII
link. There are distinct PHY_INTERFACE_ macros for these two roles.
Correct the mapping so that the `REV` version is used for the PHY
role.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230411023541.2372609-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11 21:17:51 -07:00
Yevgeny Kliteynik
108ff8215b net/mlx5: DR, Add modify-header-pattern ICM pool
There is a new ICM area for that memory, so we need to handle it as we
did for the others ICM types.
The patch added that specific pool with its requirements and management.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:38 -07:00
Yevgeny Kliteynik
1e5cc7369b net/mlx5: DR, Prepare sending new WQE type
The send engine should be ready to handle more opcodes
in addition to RDMA_WRITE/RDMA_READ.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:38 -07:00
Yevgeny Kliteynik
977c4a3e7c net/mlx5: Add new WQE for updating flow table
Add new WQE type: FLOW_TBL_ACCESS, which will be used for
writing modify header arguments.
This type has specific control segment and special data segment.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Yevgeny Kliteynik
9fa7f1de3d net/mlx5: Add mlx5_ifc bits for modify header argument
Add enum value for modify-header argument object and mlx5_bits
for the related capabilities.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Yevgeny Kliteynik
cee6484edd net/mlx5: DR, Set counter ID on the last STE for STEv1 TX
In STEv1 counter action can be set either by filling counter ID on STE, in
which case it is executed before other actions on this STE, or as a single
action, in which case it is executed in accordance with the actions order.
FW steering on STEv1 devices implements counter as counter ID on STE, and
this counter is set on the last STE.
Fix SMFS to be consistent with this behaviour - move TX counter to the
last STE, this way the counter will include all actions of the previous STEs
that might have changed packet headers length, e.g. encap, vlan push, etc.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Parav Pandit
9df839a711 net/mlx5: Create a new profile for SFs
Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.

Use a new profile to provide for future differences between SFs and PFs.

The mr_cache not used for non-PF functions, so it is excluded from the
new profile.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Vlad Buslov
55f3e740f7 net/mlx5: Bridge, add tracepoints for multicast
Pass target struct net_device to mdb attach/detach handler in order to
expose the port name to the new tracepoints. Implemented following
tracepoints:

- Attach mdb to port.
- Detach mdb from port.

Usage example:

># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5_esw_bridge_port_mdb_attach >> set_event
># cat trace
...
     kworker/0:0-19071   [000] ..... 259004.253848: mlx5_esw_bridge_port_mdb_attach: net_device=enp8s0f0_0 addr=33:33:ff:00:00:01 vid=0 num_ports=1 offloaded=1

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Vlad Buslov
70f0302b3f net/mlx5: Bridge, implement mdb offload
Implement support for add/del SWITCHDEV_OBJ_ID_PORT_MDB events. For mdb
destination addresses configure egress table rules to replicate to per-port
multicast tables of all ports that are member of the multicast group as
illustrated by 'MDB1' rule in the following diagram:

                                                                                                                            +--------+--+
                                                                                    +---------------------------------------> Port 1 |  |
                                                                                    |                                       +-^------+--+
                                                                                    |                                         |
                                                                                    |                                         |
                                       +-----------------------------------------+  |     +---------------------------+       |
                                       | EGRESS table                            |  |  +--> PORT 1 multicast table    |       |
+----------------------------------+   +-----------------------------------------+  |  |  +---------------------------+       |
| INGRESS table                    |   |                                         |  |  |  |                           |       |
+----------------------------------+   | dst_mac=P1,vlan=X -> pop vlan, goto P1  +--+  |  | FG0:                      |       |
|                                  |   | dst_mac=P1,vlan=Y -> pop vlan, goto P1  |     |  | src_port=dst_port -> drop |       |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2  +--+  |  | FG1:                      |       |
| ...                              |   | dst_mac=P2,vlan=Y -> goto P2            |  |  |  | VLAN X -> pop, goto port  |       |
|                                  |   | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+  | ...                       |       |
+----------------------------------+   |                                         |  |  |  | VLAN Y -> pop, goto port  +-------+
                                       +-----------------------------------------+  |  |  | FG3:                      |
                                                                                    |  |  | matchall -> goto port     |
                                                                                    |  |  |                           |
                                                                                    |  |  +---------------------------+
                                                                                    |  |
                                                                                    |  |
                                                                                    |  |                                    +--------+--+
                                                                                    +---------------------------------------> Port 2 |  |
                                                                                       |                                    +-^------+--+
                                                                                       |                                      |
                                                                                       |                                      |
                                                                                       |  +---------------------------+       |
                                                                                       +--> PORT 2 multicast table    |       |
                                                                                          +---------------------------+       |
                                                                                          |                           |       |
                                                                                          | FG0:                      |       |
                                                                                          | src_port=dst_port -> drop |       |
                                                                                          | FG1:                      |       |
                                                                                          | VLAN X -> pop, goto port  |       |
                                                                                          | ...                       |       |
                                                                                          |                           |       |
                                                                                          | FG3:                      |       |
                                                                                          | matchall -> goto port     +-------+
                                                                                          |                           |
                                                                                          +---------------------------+

MDB is managed by extending mlx5 bridge to store an entry in
mlx5_esw_bridge->mdb_list linked list (used to iterate over all offloaded
MDBs) and mlx5_esw_bridge->mdb_ht hash table (used to lookup existing MDB
by MAC+VLAN). Every MDB entry can be attached to arbitrary amount of bridge
ports that are stored in mlx5_esw_bridge_mdb_entry->ports xarray in order
to allow both efficient lookup of the port and also iteration over all
ports that the entry is attached to. Every time MDB is attached/detached
to/from a port, the hardware rule is recreated with list of destinations
corresponding to all attached ports. When the entry is detached from the
last port it is removed from mdb and destroyed which means that the ports
xarray also acts as implicit reference counting mechanism.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Vlad Buslov
b5e80625d1 net/mlx5: Bridge, support multicast VLAN pop
When VLAN with 'untagged' flag is created on port also provision the
per-port multicast table rule to pop the VLAN during packet replication.
This functionality must be in per-port table because some subset of ports
that are member of multicast group can require just a match on VLAN (trunk
mode) while other subset can be configured to remove the VLAN tag from
packets received on the ports (access mode).

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
272ecfc92f net/mlx5: Bridge, add per-port multicast replication tables
Multicast replication requires adding one more level of FDB_BR_OFFLOAD
priority flow tables. The new level is used for per-port multicast-specific
tables that have following flow groups structure (flow highest to lowest
priority):

- Flow group of size one that matches on source port metadata. This will
have a static single rule that prevent packets from being replicated to
their source port.

- Flow group of size one that matches all packets and forwards them to the
port that owns the table.

Initialize the table dynamically on all bridge ports when adding a port to
the bridge that has multicast enabled and on all existing bridge ports when
receiving multicast enable notification.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
18c2916cee net/mlx5: Bridge, snoop igmp/mld packets
Handle SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED attribute notification to
dynamically toggle bridge multicast offload. Set new
MLX5_ESW_BRIDGE_MCAST_FLAG bridge flag when multicast offload is enabled.
Put multicast-specific code into new bridge_mcast.c file.

When initializing bridge multicast pipeline create a static rule for
snooping on IGMP traffic and three rules for snooping on MLD traffic (for
query, report and done message types). Note that matching MLD traffic
requires having flexparser MLX5_FLEX_PROTO_ICMPV6 capability enabled.

By default Linux bridge is created with multicast enabled which can be
modified by 'mcast_snooping' argument:

$ ip link set name my_bridge type bridge mcast_snooping 0

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
b99c4ef29e net/mlx5: Bridge, extract code to lookup parent bridge of port
The pattern when function looks up a port by vport_num+vhca_id tuple in
order to just obtain its parent bridge is repeated multiple times in
bridge.c file. Further commits in this series use the pattern even more.
Extract the pattern to standalone mlx5_esw_bridge_from_port_lookup()
function to improve code readability.

This commits doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
6767c97d7a net/mlx5: Bridge, move additional data structures to priv header
Following patches in series will require accessing flow tables and groups
sizes, table levels and struct mlx5_esw_bridge from new the new source file
dedicated to multicast code. Expose these data in bridge_priv.h to reduce
clutter in following patches that will implement the actual functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
9071b423c3 net/mlx5: Bridge, increase bridge tables sizes
Bridge ingress and egress tables got more flow groups recently for QinQ
support and will get more in following patches of this series. Increase the
sizes of the tables to allow offloading more flows in each mode.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:36 -07:00
Vlad Buslov
e5688f6fb9 net/mlx5: Add mlx5_ifc definitions for bridge multicast support
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin,
fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:35 -07:00
Eric Biggers
0483913921 fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds
Commit 56124d6c87 ("fsverity: support enabling with tree block size <
PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read
the file's data, instead of direct pagecache accesses.

An unintended consequence of this is that the
'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became
reachable by fuzz tests.  This happens if FS_IOC_ENABLE_VERITY is called
on a fd opened with access mode 3, which means "ioctl access only".

Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds.  But
ioctl-only fds are a weird Linux extension that is rarely used and that
few people even know about.  (The documentation for FS_IOC_ENABLE_VERITY
even specifically says it requires O_RDONLY.)  It's probably not
worthwhile to make the ioctl internally open a new fd just to handle
this case.  Thus, just reject the ioctl on such fds for now.

Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Reported-by: syzbot+51177e4144d764827c45@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b28
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406215106.235829-1-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11 19:23:23 -07:00
Eric Biggers
39049b69ec fsverity: explicitly check for buffer overflow in build_merkle_tree()
The new Merkle tree construction algorithm is a bit fragile in that it
may overflow the 'root_hash' array if the tree actually generated does
not match the calculated tree parameters.

This should never happen unless there is a filesystem bug that allows
the file size to change despite deny_write_access(), or a bug in the
Merkle tree logic itself.  Regardless, it's fairly easy to check for
buffer overflow here, so let's do so.

This is a robustness improvement only; this case is not currently known
to be reachable.  I've added a Fixes tag anyway, since I recommend that
this be included in kernels that have the mentioned commit.

Fixes: 56124d6c87 ("fsverity: support enabling with tree block size < PAGE_SIZE")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230328041505.110162-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11 19:23:23 -07:00
Eric Biggers
8eb8af4b3d fsverity: use WARN_ON_ONCE instead of WARN_ON
As per Linus's suggestion
(https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com),
use WARN_ON_ONCE instead of WARN_ON.  This barely adds any extra
overhead, and it makes it so that if any of these ever becomes reachable
(they shouldn't, but that's the point), the logs can't be flooded.

Link: https://lore.kernel.org/r/20230406181542.38894-1-ebiggers@kernel.org
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11 19:23:15 -07:00
Darrick J. Wong
7ba83850ca xfs: deprecate the ascii-ci feature
This feature is a mess -- the hash function has been broken for the
entire 15 years of its existence if you create names with extended ascii
bytes; metadump name obfuscation has silently failed for just as long;
and the feature clashes horribly with the UTF8 encodings that most
systems use today.  There is exactly one fstest for this feature.

In other words, this feature is crap.  Let's deprecate it now so we can
remove it from the codebase in 2030.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-04-11 19:05:19 -07:00
Darrick J. Wong
6db09a8d03 xfs: test the ascii case-insensitive hash
Now that we've made kernel and userspace use the same tolower code for
computing directory index hashes, add that to the selftest code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-04-11 19:05:05 -07:00
Darrick J. Wong
a9248538fa xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
Back in the old days, the "ascii-ci" feature was created to implement
case-insensitive directory entry lookups for latin1-encoded names and
remove the large overhead of Samba's case-insensitive lookup code.  UTF8
names were not allowed, but nobody explicitly wrote in the documentation
that this was only expected to work if the system used latin1 names.
The kernel tolower function was selected to prepare names for hashed
lookups.

There's a major discrepancy in the function that computes directory entry
hashes for filesystems that have ASCII case-insensitive lookups enabled.
The root of this is that the kernel and glibc's tolower implementations
have differing behavior for extended ASCII accented characters.  I wrote
a program to spit out characters for which the tolower() return value is
different from the input:

glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z

kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z 192:À 193:Á
194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë 204:Ì 205:Í
206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:× 216:Ø 217:Ù
218:Ú 219:Û 220:Ü 221:Ý 222:Þ

Which means that the kernel and userspace do not agree on the hash value
for a directory filename that contains those higher values.  The hash
values are written into the leaf index block of directories that are
larger than two blocks in size, which means that xfs_repair will flag
these directories as having corrupted hash indexes and rewrite the index
with hash values that the kernel now will not recognize.

Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's toupper predicate and tolower functions into
libxfs.  Give the new functions less provocative names to make it really
obvious that this is a pre-hash name preparation function, and nothing
else.  This change makes userspace's behavior consistent with the
kernel.

Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-04-11 19:05:04 -07:00
Darrick J. Wong
4f5e304248 xfs: cross-reference rmap records with refcount btrees
Strengthen the rmap btree record checker a little more by comparing
OWN_REFCBT reverse mappings against the refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 19:00:39 -07:00