Commit Graph

1136096 Commits

Author SHA1 Message Date
James Morse
1d81d15db3 x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. When reading a bandwidth
counter, mbm_overflow_count() must be used to correct for any possible
overflow.

mbm_overflow_count() is architecture specific, so its behaviour should
be part of resctrl_arch_rmid_read().

Move the mbm_overflow_count() calls into resctrl_arch_rmid_read().
This allows the resctrl filesystem's prev_msr to be removed in
favour of the architecture private version.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-18-james.morse@arm.com
2022-09-23 14:22:20 +02:00
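As an aside to 1d81d15db3: the overflow correction it refers to can be sketched in plain C. This is a minimal user-space illustration, not the kernel's code. The function name overflow_count() and the idea of passing the counter width as a parameter are assumptions made for the sketch.

```c
#include <stdint.h>

/*
 * Sketch only: difference between two readings of a hardware counter that
 * is `width` bits wide (1 <= width <= 64). Shifting both readings to the
 * top of a 64-bit word makes the unsigned subtraction wrap at the
 * counter's width, so a single overflow between reads is corrected.
 */
static uint64_t overflow_count(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;
	uint64_t chunks = (cur << shift) - (prev << shift);

	return chunks >> shift;
}
```

With a 24-bit counter, a previous reading of 0xFFFFF0 and a current reading of 0x10 give a delta of 0x20, even though the raw value went down.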
James Morse
8286618aca x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a hardware register. Currently the function
returns the MBM values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the registers.

Pass the resource and domain to resctrl_arch_rmid_read(). This makes
rmid_dirty() too big. Instead, merge it with its only caller and keep the
name as a local variable.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-17-james.morse@arm.com
2022-09-23 14:21:25 +02:00
Randy Dunlap
1e6989a335 ARM: sunplus: fix serial console kconfig and build problems
Fix kconfig dependency warnings and subsequent build errors:

WARNING: unmet direct dependencies detected for SERIAL_SUNPLUS
  Depends on [n]: TTY [=n] && HAS_IOMEM [=y] && (ARCH_SUNPLUS [=y] || COMPILE_TEST [=n])
  Selected by [y]:
  - SOC_SP7021 [=y] && ARCH_SUNPLUS [=y]

WARNING: unmet direct dependencies detected for SERIAL_SUNPLUS_CONSOLE
  Depends on [n]: TTY [=n] && HAS_IOMEM [=y] && SERIAL_SUNPLUS [=y]
  Selected by [y]:
  - SOC_SP7021 [=y] && ARCH_SUNPLUS [=y]

(samples, not all:)
drivers/tty/serial/sunplus-uart.c:342: undefined reference to `uart_get_baud_rate'
arm-linux-gnueabi-ld: drivers/tty/serial/sunplus-uart.c:379: undefined reference to `uart_update_timeout'
drivers/tty/serial/sunplus-uart.c:526: undefined reference to `uart_console_write'
arm-linux-gnueabi-ld: drivers/tty/serial/sunplus-uart.c:274: undefined reference to `tty_flip_buffer_push'
arm-linux-gnueabi-ld: drivers/tty/serial/sunplus-uart.o:(.data+0xa8): undefined reference to `uart_console_device'
drivers/tty/serial/sunplus-uart.c:720: undefined reference to `uart_register_driver'
arm-linux-gnueabi-ld: drivers/tty/serial/sunplus-uart.c:726: undefined reference to `uart_unregister_driver'
drivers/tty/serial/sunplus-uart.c:551: undefined reference to `uart_parse_options'
arm-linux-gnueabi-ld: drivers/tty/serial/sunplus-uart.c:553: undefined reference to `uart_set_options'

This is the same technique that is used 2 times in
arch/arm/mach-versatile/Kconfig.

Fixes: 0aa94eea8d ("ARM: sunplus: Add initial support for Sunplus SP7021 SoC")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Qin Jian <qinjian@cqplus1.com>
Cc: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-23 14:20:00 +02:00
James Morse
4d044c521a x86/resctrl: Abstract __rmid_read()
__rmid_read() selects the specified eventid and returns the counter
value from the MSR. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be handled by architecture specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.

A read from a counter that hardware supports but resctrl does not
now returns -EINVAL instead of -EIO from the default case in
__mon_event_count(). It isn't possible for user-space to see this
change as resctrl doesn't expose counters it doesn't support.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-16-james.morse@arm.com
2022-09-23 14:17:20 +02:00
Shang XiaoJing
6eed756408 can: ctucanfd: Remove redundant dev_err call
devm_ioremap_resource() prints error message in itself. Remove the
dev_err call to avoid redundant error message.

Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/all/20220923095835.14647-1-shangxiaojing@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:01 +02:00
Vasanth Sadhasivan
62f102c0d1 can: gs_usb: remove dma allocations
DMA allocated buffers are a precious resource. If there is no need for
DMA allocations, it might be worth using non-DMA allocated
buffers.

After testing the gs_usb driver with and without DMA allocation, there
does not seem to be a significant change in latency or CPU utilization
either way. Therefore, DMA allocation is not necessary and is removed.

Internal buffers used within urbs were managed and freed manually.
The driver no longer needs to manage these buffers. The
URB_FREE_BUFFER flag allows the buffers in question to be
automatically freed.

Co-developed-by: Rhett Aultman <rhett.aultman@samsara.com>
Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com>
Signed-off-by: Vasanth Sadhasivan <vasanth.sadhasivan@samsara.com>
Link: https://lore.kernel.org/all/20220920154724.861093-2-rhett.aultman@samsara.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
906e0e6886 can: gs_usb: add switchable termination support
The candleLight community is working on switchable termination support
for the candleLight firmware. As the Linux CAN framework supports
switchable termination, add this feature to the gs_usb driver.

Devices supporting the feature should set the
GS_CAN_FEATURE_TERMINATION and implement the
GS_USB_BREQ_SET_TERMINATION and GS_USB_BREQ_GET_TERMINATION control
messages.

For now, the driver assumes the standard termination value of 120Ω
when termination is activated.

Link: https://lore.kernel.org/all/20220923074114.662045-1-mkl@pengutronix.de
Link: https://github.com/candle-usb/candleLight_fw/issues/92
Link: https://github.com/candle-usb/candleLight_fw/pull/109
Link: https://github.com/candle-usb/candleLight_fw/pull/108
Cc: Daniel Trevitz <daniel.trevitz@wika.com>
Cc: Ryan Edwards <ryan.edwards@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
68822f4e74 can: gs_usb: gs_make_candev(): clean up error handling
Introduce a label to free the allocated candev in case of an error and
make use of it. Fix a memory leak if the extended bit timing cannot be
read. Extend the error messages to print the number of the failing
channel and the symbolic error name.

Link: https://lore.kernel.org/all/20220921193902.575416-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
3814ed2754 can: gs_usb: convert from usb_control_msg() to usb_control_msg_{send,recv}()
Convert the driver to use usb_control_msg_{send,recv}() instead of
usb_control_msg(). These functions allow the data to be placed on the
stack. This makes the driver a lot easier as we don't have to deal
with dynamically allocated memory.

Link: https://lore.kernel.org/all/20220921193902.575416-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
0024675186 can: gs_usb: gs_cmd_reset(): rename variable holding struct gs_can pointer to dev
Most of the driver uses the variable "dev" to point to the struct
gs_can. Use the same name in gs_cmd_reset(), too. Rename gsdev to dev.

Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/all/20220921193902.575416-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
103108cb96 can: gs_usb: gs_can_open(): initialize time counter before starting device
On busy networks the CAN controller might receive CAN frames directly
after starting it but before the timecounter is set up. This will lead
to a NULL pointer deref while converting the CAN frame's
timestamp with the timecounter.

Close the race window by setting up the timecounter before starting
the CAN controller.

Fixes: 45dfa45f52 ("can: gs_usb: add RX and TX hardware timestamp support")
Link: https://lore.kernel.org/all/20220921081329.385509-1-mkl@pengutronix.de
Cc: John Whittington <git@jbrengineering.co.uk>
Tested-by: John Whittington <git@jbrengineering.co.uk>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
29a8c9ec90 can: gs_usb: add missing lock to protect struct timecounter::cycle_last
The struct timecounter::cycle_last is a 64 bit variable, read by
timecounter_cyc2time(), and written by timecounter_read(). On 32 bit
architectures this is not atomic.

Add a spinlock to protect access to struct timecounter::cycle_last. In
the gs_usb_timestamp_read() callback the lock is dropped to execute a
sleeping synchronous USB transfer. This is safe, as the variable we
want to protect is not accessed during this call.

Fixes: 45dfa45f52 ("can: gs_usb: add RX and TX hardware timestamp support")
Link: https://lore.kernel.org/all/20220920100416.959226-3-mkl@pengutronix.de
Cc: John Whittington <git@jbrengineering.co.uk>
Tested-by: John Whittington <git@jbrengineering.co.uk>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
Marc Kleine-Budde
593b5e2f5a can: gs_usb: gs_usb_get_timestamp(): fix endpoint parameter for usb_control_msg_recv()
The 2nd argument of usb_control_msg_recv() is the "endpoint",
usb_control_msg_recv() will internally convert the endpoint into a
pipe with usb_rcvctrlpipe().

In gs_usb_get_timestamp(), the pipe is passed instead of the endpoint
"0". This worked by accident as endpoint is a __u8 and the lowest 8
bits of the pipe are 0. Fix this copy/paste error by using the correct
endpoint of "0".

Fixes: 45dfa45f52 ("can: gs_usb: add RX and TX hardware timestamp support")
Link: https://lore.kernel.org/all/20220920100416.959226-2-mkl@pengutronix.de
Cc: John Whittington <git@jbrengineering.co.uk>
Tested-by: John Whittington <git@jbrengineering.co.uk>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:55:00 +02:00
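As an aside to 593b5e2f5a: the "worked by accident" behaviour can be reproduced with a small sketch. The bit layout below is an assumption for illustration, loosely modelled on a control pipe with no bits set in its low byte (as the commit message describes); the names sketch_ctrl_pipe() and sketch_endpoint_seen() are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Assumed pipe type value for this sketch, placed in bits 31:30. */
#define SKETCH_PIPE_CONTROL 2u

/*
 * Sketch only: build a 32-bit "pipe" with the device number starting at
 * bit 8 and the endpoint starting at bit 15. Nothing lands in bits 7:0.
 */
static uint32_t sketch_ctrl_pipe(unsigned int devnum, unsigned int endpoint)
{
	return (SKETCH_PIPE_CONTROL << 30) | (devnum << 8) | (endpoint << 15);
}

/* Stand-in for a callee whose endpoint parameter is only 8 bits wide. */
static uint8_t sketch_endpoint_seen(uint8_t endpoint)
{
	return endpoint;
}
```

Passing the pipe where an 8-bit endpoint is expected truncates it to its low byte, which is 0 under this layout, so the callee still sees endpoint 0.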
Marc Kleine-Budde
86c223ffc8 Merge patch series "can: bcm: can: bcm: random optimizations"
Ziyang Xuan <william.xuanziyang@huawei.com> says:

Do some small optimization for can_bcm.

v1: https://lore.kernel.org/all/cover.1662606045.git.william.xuanziyang@huawei.com

Link: https://lore.kernel.org/all/cover.1663206163.git.william.xuanziyang@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:53:58 +02:00
Ziyang Xuan
3fd7bfd28c can: bcm: check the result of can_send() in bcm_can_tx()
If can_send() fails, bcm_can_tx() should not update the frames_abs
counter. Add a check of the can_send() result in bcm_can_tx().

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:53:10 +02:00
Ziyang Xuan
edd1a7e42f can: bcm: registration process optimization in bcm_module_init()
Currently, register_netdevice_notifier() and register_pernet_subsys() are both
called after can_proto_register(). A CAN_BCM socket can be created and used
as soon as can_proto_register() succeeds, so a notifier event or proc node
creation can be missed because the notifier is not yet registered or the bcm
proc directory is not yet created. Although this is a low probability
scenario, it is not impossible.

Move register_pernet_subsys() and register_netdevice_notifier() in front of
can_proto_register(). In addition, register_pernet_subsys() and
register_netdevice_notifier() may fail, so checking their results is necessary.

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/823cff0ebec33fa9389eeaf8b8ded3217c32cb38.1663206163.git.william.xuanziyang@huawei.com
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:53:02 +02:00
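As an aside to edd1a7e42f: the ordering it describes (register the supporting pieces first, expose the protocol last, and unwind on failure) follows a common goto-based pattern. The sketch below uses hypothetical stand-ins (sketch_register(), sketch_unregister(), the sketch_fail_* flags) rather than the real registration functions, so it can run in user space.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three registration steps. */
enum { SKETCH_PERNET = 1, SKETCH_NOTIFIER = 2, SKETCH_PROTO = 4 };

static bool sketch_fail_pernet, sketch_fail_notifier, sketch_fail_proto;
static int sketch_registered;	/* bitmask of currently registered steps */

static int sketch_register(int what, bool fail)
{
	if (fail)
		return -1;
	sketch_registered |= what;
	return 0;
}

static void sketch_unregister(int what)
{
	sketch_registered &= ~what;
}

/* Sketch only: the protocol is registered last, once its helpers exist. */
static int sketch_module_init(void)
{
	int err;

	err = sketch_register(SKETCH_PERNET, sketch_fail_pernet);
	if (err)
		return err;

	err = sketch_register(SKETCH_NOTIFIER, sketch_fail_notifier);
	if (err)
		goto out_pernet;

	err = sketch_register(SKETCH_PROTO, sketch_fail_proto);
	if (err)
		goto out_notifier;

	return 0;

out_notifier:
	sketch_unregister(SKETCH_NOTIFIER);
out_pernet:
	sketch_unregister(SKETCH_PERNET);
	return err;
}
```

If the last step fails, the earlier registrations are undone in reverse order, so nothing is left half-registered.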
Kees Cook
712f210a45 x86/microcode/AMD: Track patch allocation size explicitly
In preparation for reducing the use of ksize(), record the actual
allocation size for later memcpy(). This avoids copying extra
(uninitialized!) bytes into the patch buffer when the requested
allocation size isn't exactly the size of a kmalloc bucket.
Additionally, fix potential future issues where runtime bounds checking
will notice that the buffer was allocated to a smaller value than
returned by ksize().

Fixes: 757885e94a ("x86, microcode, amd: Early microcode patch loading support for AMD")
Suggested-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/lkml/CA+DvKQ+bp7Y7gmaVhacjv9uF6Ar-o4tet872h4Q8RPYPJjcJQA@mail.gmail.com/
2022-09-23 13:46:26 +02:00
Radhey Shyam Pandey
f22bd29ba1 net: macb: Fix ZynqMP SGMII non-wakeup source resume failure
When GEM is in SGMII mode and disabled as a wakeup source, the power
management controller can power down the entire full power domain (FPD)
if none of the FPD devices are in use.

When the FPD is powered off, the Ethernet link fails to come up after a
non-wakeup suspend/resume, as shown below. To fix this, add phy_exit() in
the suspend path and phy_init() in the resume path, which reinitializes
the PS GTR SGMII lanes.

$ echo +20 > /sys/class/rtc/rtc0/wakealarm
$ echo mem > /sys/power/state

After resume:

$ ifconfig eth0 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
SIOCSIFFLAGS: Connection timed out
phy phy-fd400000.phy.0: phy poweron failed --> -110

Fixes: 8b73fa3ae0 ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:32:49 +01:00
David S. Miller
3aba35bb20 Merge branch 'lan966x-mqprio-taprio'
Horatiu Vultur says:

====================
net: lan966x: Add mqprio and taprio support

Add support for offloading QoS features with tc command to lan966x. The
offloaded QoS features are mqprio and taprio.

v1->v2:
- fix compilation warning
- rename lan966x_taprio_enable/disable to lan966x_taprio_add/del
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:31:27 +01:00
Horatiu Vultur
e462b27173 net: lan966x: Add offload support for taprio
Lan966x switch supports time-based egress shaping in hardware
according to IEEE 802.1Qbv. Add support for TAS configuration on
egress port of lan966x switch.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:31:27 +01:00
Horatiu Vultur
2a252a0bd2 net: lan966x: Add registers used by taprio
Add registers that are used by taprio to configure the HW.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:31:27 +01:00
Horatiu Vultur
3c83431f07 net: lan966x: Add offload support for mqprio
Implement mqprio qdisc support using tc command.
The HW supports 8 priority queues from highest (7) to lowest (0).

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:31:27 +01:00
Horatiu Vultur
644ffce5f1 net: lan966x: Add define for number of priority queues NUM_PRIO_QUEUES
Add a define for the number of priority queues on lan966x. There
will be more checks for this, so add a define instead of using a
hardcoded value all over the place.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:31:26 +01:00
Dan Carpenter
37a7844576 virtio-gpu: fix shift wrapping bug in virtio_gpu_fence_event_create()
The ->ring_idx_mask variable is a u64 so static checkers, Smatch in
this case, complain if the BIT() is not also a u64.

drivers/gpu/drm/virtio/virtgpu_ioctl.c:50 virtio_gpu_fence_event_create()
warn: should '(1 << ring_idx)' be a 64 bit type?

Fixes: cd7f5ca335 ("drm/virtio: implement context init: add virtio_gpu_fence_event")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/YygN7jY0GdUSQSy0@kili
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-09-23 13:16:37 +02:00
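As an aside to 37a7844576: the class of bug Smatch warns about can be avoided by building the single-bit mask in a 64-bit type before testing it against a 64-bit variable. A minimal sketch, assuming a hypothetical helper name ring_idx_enabled() and a ring index below 64:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: test one bit of a 64-bit mask. The constant 1 is widened
 * to 64 bits first, so indexes 32..63 are handled; a plain `1 << ring_idx`
 * would shift a 32-bit int and cannot represent those bits.
 * The caller must ensure ring_idx < 64.
 */
static bool ring_idx_enabled(uint64_t ring_idx_mask, unsigned int ring_idx)
{
	return (ring_idx_mask & (UINT64_C(1) << ring_idx)) != 0;
}
```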
Zongmin Zhou
461a4df2a8 drm/qxl: drop set_prod_notify parameter from qxl_ring_create
qxl_io_reset(qdev) is called immediately after qxl_ring_create(),
and it sets parameters such as notify_on_prod to their default values.
The earlier call to qxl_ring_init_hdr() is therefore meaningless.

Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Suggested-by: Ming Xie <xieming@kylinos.cn>
Link: http://patchwork.freedesktop.org/patch/msgid/20220920065023.1633303-1-zhouzongmin@kylinos.cn
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-09-23 13:16:37 +02:00
Minghao Chi
f948ac2313 xen-netback: use kstrdup instead of open-coding it
Use kstrdup() instead of open-coding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:03:35 +01:00
Patrick Rohr
195624d9c2 tun: support not enabling carrier in TUNSETIFF
This change adds support for not enabling carrier during TUNSETIFF
interface creation by specifying the IFF_NO_CARRIER flag.

Our tests make heavy use of tun interfaces. In some scenarios, the test
process creates the interface but another process brings it up after the
interface is discovered via netlink notification. In that case, it is
not possible to create a tun/tap interface with carrier off without it
racing against the bring up. Immediately setting carrier off via
TUNSETCARRIER is still too late.

Signed-off-by: Patrick Rohr <prohr@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 12:02:03 +01:00
Daniel Golle
e19de30d20 net: dsa: mt7530: add support for in-band link status
Read link status from SGMII PCS for in-band managed 2500Base-X and
1000Base-X connection on a MAC port of the MT7531. This is needed to
get the SFP cage working which is connected to SGMII interface of
port 5 of the MT7531 switch IC on the Bananapi BPi-R3 board.
While at it also handle an_complete for both the autoneg and the
non-autoneg codepath.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:58:37 +01:00
David S. Miller
793cc3c78e Merge branch 'phy-rate-matching'
Sean Anderson says:

====================
net: phy: Add support for rate matching

This adds support for phy rate matching: when a phy adapts between
differing phy interface and link speeds. It was originally submitted as
part of [1], which is considered "v1" of this series.

Several past discussions [2-4] around adding rate adaptation provide
some context.

Although in earlier versions of this series, userspace could disable
rate matching, now it is only possible to determine the current rate
adaptation type. Disabling or otherwise configuring rate adaptation has
been left for future work. However, because currently only
RATE_MATCH_PAUSE is implemented, it is possible to disable rate
adaptation by modifying the advertisement appropriately.

[1] https://lore.kernel.org/netdev/20220715215954.1449214-1-sean.anderson@seco.com/T/#t
[2] https://lore.kernel.org/netdev/1579701573-6609-1-git-send-email-madalin.bucur@oss.nxp.com/
[3] https://lore.kernel.org/netdev/1580137671-22081-1-git-send-email-madalin.bucur@oss.nxp.com/
[4] https://lore.kernel.org/netdev/20200116181933.32765-1-olteanv@gmail.com/

Changes in v6:
- Don't announce that we've enabled pause frames for rate adaptation
- Merry Christmas
- Rename rate adaptation to rate matching
- Reword documentation, (hopefully) taking into account feedback

Changes in v5:
- Break off patch "net: phy: Add 1000BASE-KX interface mode" for
  separate submission.
- Document phy_rate_adaptation_to_str
- Drop patch "Add some helpers for working with mac caps"; it has been
  incorporated into the autonegotiation patch.
- Move phylink_cap_from_speed_duplex to this commit
- Rebase onto net-next/master
- Remove unnecessary comma

Changes in v4:
- Export phy_rate_adaptation_to_str
- Remove phylink_interface_max_speed, which was accidentally added
- Split off the LS1046ARDB 1G fix

Changes in v3:
- Add phylink_cap_from_speed_duplex to look up the mac capability
  corresponding to the interface's speed.
- Document MAC_(A)SYM_PAUSE
- Include RATE_ADAPT_CRS; it's a few lines and it doesn't hurt.
- Modify link settings directly in phylink_link_up, instead of doing
  things more indirectly via link_*.
- Move unused defines to next commit (where they will be used)
- Remove "Support differing link/interface speed/duplex". It has been
  rendered unnecessary due to simplification of the rate adaptation
  patches. Thanks Russell!
- Rewrite cover letter to better reflect the opinions of the developers
  involved

Changes in v2:
- Add (read-only) ethtool support for rate adaptation
- Add comments clarifying the register defines
- Add locking to phy_get_rate_adaptation
- Always use the rate adaptation setting to determine the interface
  speed/duplex (instead of sometimes using the interface mode).
- Determine the interface speed and max mac speed directly instead of
  guessing based on the caps.
- Move part of commit message to cover letter, as it gives a good
  overview of the whole series, and allows this patch to focus more on
  the specifics.
- Reorder variables in aqr107_read_rate
- Use int/defines instead of enum to allow for use in ioctls/netlink
- Use the phy's rate adaptation setting to determine whether to use its
  link speed/duplex or the MAC's speed/duplex with MLO_AN_INBAND.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:56:36 +01:00
Sean Anderson
3c42563b30 net: phy: aquantia: Add support for rate matching
This adds support for rate matching for phys similar to the AQR107. We
assume that all phys using aqr107_read_status support rate matching.
If phys are discovered which do not support rate matching, it could be
possible to determine support based on the firmware revision. However,
as rate matching is advertised in the datasheets for these phys, I
suspect it is supported on most boards.

Despite the name, the "config" registers are updated with the current
rate matching method (if any). Because they appear to be updated
automatically, I don't know if these registers can be used to disable
rate matching.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:56:18 +01:00
Anup Patel
cfadbb9df8 cpuidle: riscv-sbi: Fix CPU_PM_CPU_IDLE_ENTER_xyz() macro usage
Currently, we are using CPU_PM_CPU_IDLE_ENTER_PARAM() for all SBI HSM
suspend types, so retentive suspend types are also treated as non-retentive
and the kernel does redundant additional work for these states.

BIT[31] of the SBI HSM suspend type allows us to differentiate between
retentive and non-retentive suspend types, so we should use this bit
to call the appropriate CPU_PM_CPU_IDLE_ENTER_xyz() macro.

Fixes: 6abf32f1d9 ("cpuidle: Add RISC-V SBI CPU idle driver")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20220718084553.2056169-1-apatel@ventanamicro.com/
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-09-23 03:56:00 -07:00
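As an aside to cfadbb9df8: the BIT[31] test it describes can be sketched as a one-line predicate. The macro and function names here are hypothetical, and the sketch assumes that a set bit 31 marks a non-retentive suspend type, which is how the commit message reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for this sketch: bit 31 set means "non-retentive". */
#define SKETCH_SUSP_NON_RET_BIT UINT32_C(0x80000000)

/* Sketch only: classify a suspend type by looking at bit 31. */
static bool sketch_is_retentive(uint32_t suspend_type)
{
	return (suspend_type & SKETCH_SUSP_NON_RET_BIT) == 0;
}
```

A caller would branch on this predicate to pick the retentive or non-retentive idle-enter path, instead of treating every state as non-retentive.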
Sean Anderson
7de26bf144 net: phy: aquantia: Add some additional phy interfaces
These are documented in the AQR115 register reference. I haven't tested
them, but perhaps they'll be useful to someone.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:36 +01:00
Sean Anderson
b7e9294885 net: phylink: Adjust advertisement based on rate matching
This adds support for adjusting the advertisement for pause-based rate
matching. This may result in a lossy link, since the final link settings
are not adjusted. Asymmetric pause support is necessary. It would be
possible for a MAC supporting only symmetric pause to use pause-based rate
adaptation, but only if pause reception was enabled as well.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:36 +01:00
Sean Anderson
ae0e4bb2a0 net: phylink: Adjust link settings based on rate matching
If the phy is configured to use pause-based rate matching, ensure that
the link is full duplex with pause frame reception enabled. As
suggested, if pause-based rate matching is enabled by the phy, then
pause reception is unconditionally enabled.

The interface duplex is determined based on the rate matching type.
When rate matching is enabled, the speed is determined this way too: we
assume the maximum interface speed is used. This is only relevant for
MLO_AN_PHY. For
MLO_AN_INBAND, the MAC/PCS's view of the interface speed will be used.

Although there are no RATE_ADAPT_CRS phys in-tree, it has been added for
comparison (and the implementation is quite simple).

Co-developed-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
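As an aside to ae0e4bb2a0: the rule it describes can be sketched as a small pure function. Everything named sketch_* is hypothetical, and the half-duplex choice for the CRS case is an assumption of this sketch (CRS-based matching relies on carrier sense); only the pause-based behaviour is stated in the commit message.

```c
#include <stdbool.h>

enum sketch_rate_match {
	SKETCH_RATE_MATCH_NONE,
	SKETCH_RATE_MATCH_PAUSE,
	SKETCH_RATE_MATCH_CRS,
};

struct sketch_link {
	int speed;		/* Mb/s */
	bool full_duplex;
	bool rx_pause;		/* pause frame reception enabled */
};

/*
 * Sketch only: derive the MAC-side (interface) settings from the media-side
 * link settings and the phy's rate matching type.
 */
static struct sketch_link sketch_resolve_link(enum sketch_rate_match rm,
					       struct sketch_link media,
					       int max_iface_speed)
{
	struct sketch_link mac = media;

	switch (rm) {
	case SKETCH_RATE_MATCH_PAUSE:
		/* The phy throttles the MAC with pause frames. */
		mac.speed = max_iface_speed;
		mac.full_duplex = true;
		mac.rx_pause = true;
		break;
	case SKETCH_RATE_MATCH_CRS:
		/* Assumption: the phy throttles the MAC via carrier sense. */
		mac.speed = max_iface_speed;
		mac.full_duplex = false;
		break;
	case SKETCH_RATE_MATCH_NONE:
		break;
	}

	return mac;
}
```

With pause-based matching, a 100 Mb/s half-duplex media link on a 10 Gb/s interface resolves to 10000 Mb/s, full duplex, with pause reception on; without rate matching the media settings pass through unchanged.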
Sean Anderson
0c3e10cb44 net: phy: Add support for rate matching
This adds support for rate matching (also known as rate adaptation) to
the phy subsystem. The general idea is that the phy interface runs at
one speed, and the MAC throttles the rate at which it sends packets to
the link speed. There's a good overview of several techniques for
achieving this at [1]. This patch adds support for three: pause-frame
based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and
2BASE-TL), and open-loop-based (such as in 10GBASE-W).

This patch makes a few assumptions and a few non-assumptions about the
types of rate matching available. First, it assumes that different phys
may use different forms of rate matching. Second, it assumes that phys
can use rate matching for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same
form of rate matching. Fourth, it does not assume that all phy devices
will support rate matching (even if some do). Relaxing or strengthening
these (non-)assumptions could result in a different API. For example, if
all interface modes were assumed to use the same form of rate matching,
then a bitmask of interface modes supporting rate matching would
suffice.

For some better visibility into the process, the current rate matching
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate matching can
be disabled altogether by adjusting the advertisement.

802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
"rate matching" in clause 61 (10PASS-TS and 2BASE-TL). Aquantia also calls
this feature "rate adaptation". I chose "rate matching" because it is
shorter, and because Russell doesn't think "adaptation" is correct in this
context.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
Sean Anderson
3e6eab8f3e net: phylink: Generate caps and convert to linkmodes separately
If we call phylink_caps_to_linkmodes directly from
phylink_get_linkmodes, it is difficult to re-use this functionality in
MAC drivers. This is because MAC drivers must then work with an ethtool
linkmode bitmap, instead of with mac capabilities. Instead, let the
caller of phylink_get_linkmodes do the conversion. To reflect this
change, rename the function to phylink_get_capabilities.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
Sean Anderson
606116529a net: phylink: Export phylink_caps_to_linkmodes
This function is convenient for MAC drivers. They can use it to add or
remove particular link modes based on capabilities (such as if half
duplex is not supported for a particular interface mode).

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
Sean Anderson
72bc36956f net: phylink: Document MAC_(A)SYM_PAUSE
This documents the possible MLO_PAUSE_* settings which can result from
different combinations of MAC_(A)SYM_PAUSE. Special note is paid to
settings which can result from user configuration (MLO_PAUSE_AN). The
autonegotiation results are more-or-less a direct consequence of IEEE
802.3 Table 28B-2.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
James Morse
fea62d370d x86/resctrl: Allow per-rmid arch private storage to be reset
To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-15-james.morse@arm.com
2022-09-23 12:49:04 +02:00
wangjianli
635b241d93 scsi: storvsc: remove an extraneous "to" in a comment
Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Link: https://lore.kernel.org/r/20220908130754.34999-1-wangjianli@cdjrlc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-23 10:41:40 +00:00
Stanislav Kinsburskiy
f7ac541e18 Drivers: hv: vmbus: Don't wait for the ACPI device upon initialization
Waiting up to 5 seconds when the VMBus ACPI device is missing is redundant, as
the device is either present already or won't be available at all.

This patch enforces synchronous probing to make sure the bus traversal that
happens upon driver registration will either find the device (if present) or
not spend any additional time if the device is absent.

Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Stephen Hemminger <sthemmin@microsoft.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Dexuan Cui <decui@microsoft.com>
CC: linux-hyperv@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/166378554568.581670.1124852716698789244.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-23 10:40:09 +00:00
Feng Tang
6edf2576a6 mm/slub: enable debugging memory wasting of kmalloc
kmalloc's API family is critical for mm, and one of its properties is
that it rounds the request size up to a fixed one (mostly a power of 2).
Say a user requests '2^n + 1' bytes of memory: 2^(n+1) bytes could
actually be allocated, so in the worst case around 50% of the memory
space is wasted.

The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have longer life
time.

We've met a kernel boot OOM panic (v5.10), and from the dumped slab
info:

    [   26.062145] kmalloc-2k            814056KB     814056KB

From debugging we found there is a huge number of 'struct iova_magazine'
objects, whose size is 1032 bytes (1024 + 8), so each allocation wastes
1016 bytes. Though the issue was solved by giving the right (bigger)
size of RAM, it is still nice to optimize the size (either use a
kmalloc friendly size or create a dedicated slab for it).

And from the lkml archive, there was another crash kernel OOM case [1]
back in 2019, which seems to be related to a similar slab waste
situation, as the log is similar:

    [    4.332648] iommu: Adding device 0000:20:02.0 to group 16
    [    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
    ...
    [    4.857565] kmalloc-2048           59164KB      59164KB

The crash kernel only has 256M memory, and 59M is pretty big here.
(Note: the related code has been changed and optimised in recent
kernel [2], these logs are just picked to demo the problem, also
a patch changing its size to 1024 bytes has been merged)

So add a way to track each kmalloc's memory waste info, and
leverage the existing SLUB debug framework (specifically
SLUB_STORE_USER) to show the call stack of the original allocation,
so that the user can evaluate the waste situation, identify hot
spots and optimize accordingly, for better utilization of memory.

The waste info is integrated into existing interface:
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
'kmalloc-4k' after boot is:

 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
     __kmem_cache_alloc_node+0x11f/0x4e0
     __kmalloc_node+0x4e/0x140
     ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
     ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
     ixgbe_probe+0x165f/0x1d20 [ixgbe]
     local_pci_probe+0x78/0xc0
     work_for_cpu_fn+0x26/0x40
     ...

which means in 'kmalloc-4k' slab, there are 126 requests of
2240 bytes which got a 4KB space (wasting 1856 bytes each
and 233856 bytes in total), from ixgbe_alloc_q_vector().
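
The numbers above can be checked with a small userspace sketch. It is
only a simplified model, not the kernel's code: it assumes pure
power-of-two rounding starting at 8 bytes, while the real kmalloc also
has non-power-of-two caches such as kmalloc-96 and kmalloc-192. The
function names model_bucket_size() and model_waste() are hypothetical.

```c
#include <stddef.h>

/*
 * Simplified model (assumption): round a request up to the next
 * power of two, starting from an 8-byte bucket. This ignores the
 * kernel's non-power-of-two caches.
 */
size_t model_bucket_size(size_t request)
{
	size_t bucket = 8;

	while (bucket < request)
		bucket <<= 1;
	return bucket;
}

/* Bytes left unused in the bucket chosen for one request. */
size_t model_waste(size_t request)
{
	return model_bucket_size(request) - request;
}
```

Under this model, model_waste(1032) gives the 1016 bytes quoted for
'struct iova_magazine', and 126 * model_waste(2240) gives the 233856
bytes reported for ixgbe_alloc_q_vector().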

And when the system starts a real workload, such as multiple docker
instances, the waste could be more severe.

[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/

[Thanks Hyeonggon for pointing out several bugs about sorting/format]
[Thanks Vlastimil for suggesting way to reduce memory usage of
 orig_size and keep it only for kmalloc objects]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-23 12:32:45 +02:00
Jason A. Donenfeld
e78a802a7b random: clamp credited irq bits to maximum mixed
Since the most that's mixed into the pool is sizeof(long)*2, don't
credit more than that many bytes of entropy.
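
As a rough illustration of the clamping idea only, and not the code
in this commit, the cap could be sketched as below. The helper name
clamp_credit_bits() is hypothetical, and treating the limit as the
number of bits in two longs is an assumption made for the sketch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's code): never credit more
 * than the number of bits held in two longs, which is the most
 * that is mixed into the pool according to the message above.
 */
unsigned int clamp_credit_bits(unsigned int estimated_bits)
{
	const unsigned int max_bits = (unsigned int)(sizeof(long) * 2 * 8);

	return estimated_bits > max_bits ? max_bits : estimated_bits;
}
```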

Fixes: e3e33fc2ea ("random: do not use input pool from hard IRQs")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23 12:31:05 +02:00
Easwar Hariharan
a99aaf2e3b Drivers: hv: vmbus: Use PCI_VENDOR_ID_MICROSOFT for better discoverability
pci_ids.h already defines PCI_VENDOR_ID_MICROSOFT, and is included via
linux/pci.h. Use the define instead of the magic number.

Signed-off-by: Easwar Hariharan <easwar.hariharan@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1663625084-2518-2-git-send-email-eahariha@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-23 10:30:37 +00:00
Jiapeng Chong
e1a863cddb Drivers: hv: vmbus: Fix kernel-doc
drivers/hv/vmbus_drv.c:1587: warning: expecting prototype for __vmbus_child_driver_register(). Prototype was for __vmbus_driver_register() instead.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2210
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220919063815.1881-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-23 10:29:08 +00:00
Johan Jonker
f878a26a2a dt-bindings: clock: convert rockchip,rk3128-cru.txt to YAML
Convert rockchip,rk3128-cru.txt to YAML.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/4e69a06d-7b53-ab48-1e50-2b29ff3a54e6@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-09-23 12:28:49 +02:00
Jason A. Donenfeld
d775335e35 random: throttle hwrng writes if no entropy is credited
If a hwrng source does not provide an entropy estimate, it currently
does not contribute at all to the CRNG. In order to help fix this, in
case add_hwgenerator_randomness() is called with the entropy parameter
set to zero, go to sleep until one reseed interval has passed.

While the hwrng thread currently only runs under conditions where this
is non-zero, this change is not harmful and prepares for future updates
to the hwrng core.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23 12:27:57 +02:00
Dominik Brodowski
745558f958 random: use hwgenerator randomness more frequently at early boot
Mix in randomness from hw-rng sources more frequently during early
boot, approximately once for every rng reseed.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23 12:27:57 +02:00
Jason A. Donenfeld
cd4f24ae94 random: restore O_NONBLOCK support
Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would
return -EAGAIN if there was no entropy. When the pools were unified in
5.6, this was lost. The post 5.6 behavior of blocking until the pool is
initialized, and ignoring O_NONBLOCK in the process, went unnoticed,
with no reports about the regression received for two and a half years.
However, eventually this indeed did break somebody's userspace.

So we restore the old behavior, by returning -EAGAIN if the pool is not
initialized. Unlike the old /dev/random, this can only occur during
early boot, after which it never blocks again.

In order to make this O_NONBLOCK behavior consistent with other
expectations, also respect users reading with preadv2(RWF_NOWAIT) and
similar.
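
From userspace, the restored behavior might be exercised roughly as
in the sketch below. The helper try_read_random() is hypothetical and
exists only for illustration; it uses the standard open(2) and
read(2) calls and leaves the caller to decide what to do on EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt one non-blocking read of up to len
 * bytes from path (e.g. "/dev/random"). Returns the number of bytes
 * read, or -1 with errno set. Per the message above, errno would be
 * EAGAIN if the pool is not yet initialized.
 */
ssize_t try_read_random(const char *path, void *buf, size_t len)
{
	int fd = open(path, O_RDONLY | O_NONBLOCK);
	ssize_t n;
	int saved_errno;

	if (fd < 0)
		return -1;

	n = read(fd, buf, len);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return n;
}
```

A caller that sees -1 with errno == EAGAIN can retry later or fall
back to another source instead of blocking during early boot.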

Fixes: 30c08efec8 ("random: make /dev/random be almost like /dev/urandom")
Reported-by: Guozihua <guozihua@huawei.com>
Reported-by: Zhongguohua <zhongguohua1@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-23 12:27:57 +02:00
Chris Morgan
e18d9b0930 arm64: dts: rockchip: Add DSI and DSI-DPHY nodes to rk356x
This adds the DSI controller nodes and DSI-DPHY controller nodes to the
rk356x device tree.

Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Acked-by: Michael Riesch <michael.riesch@wolfvision.net>
Link: https://lore.kernel.org/r/20220919164616.12492-4-macroalpha82@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-09-23 12:27:18 +02:00