Commit Graph

1015639 Commits

Author SHA1 Message Date
Jason Gunthorpe
fdcebbc2ac Merge tag 'v5.13-rc7' into rdma.git for-next
Linux 5.13-rc7

Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:43:51 -03:00
Kees Cook
6d33cabf2b RDMA/core: Use flexible array for mad data
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.

Without a flexible array, this looks like an attempt to perform a memcpy()
read beyond the end of the packet->mad.data array:

drivers/infiniband/core/user_mad.c:
	memcpy(packet->msg->mad, packet->mad.data, IB_MGMT_MAD_HDR);

Switch from [0] to [] to use the appropriately handled type for trailing
bytes.

Link: https://lore.kernel.org/r/20210616202615.1247242-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:05:34 -03:00
Xiao Yang
20ec0a6d60 RDMA/rxe: Don't overwrite errno from ib_umem_get()
rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get()
fails so it's hard for user to know which actual error happens in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.

Return actual error as mlx4/mlx5 does.

Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:05:02 -03:00
Kees Cook
4bf5cc6319 IB/mlx4: Avoid field-overflowing memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring array fields.

Use the ether_addr_copy() helper instead, as already done for smac.

Link: https://lore.kernel.org/r/20210616203744.1248551-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:04:54 -03:00
Jack Wang
7404bddeb4 rnbd/rtrs-clt: Query and use max_segments from rtrs-clt.
With fast memory registration on write request, rnbd-clt
can do bigger IO without split. rnbd-clt now can query
rtrs-clt to get the max_segments, instead of using
BMAX_SEGMENTS.

BMAX_SEGMENTS is not longer needed, so remove it.

Link: https://lore.kernel.org/r/20210621055340.11789-6-jinpu.wang@ionos.com
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:02:21 -03:00
Jack Wang
6fc4559650 RDMA/rtrs-clt: Raise MAX_SEGMENTS
As we can do fast memory registration on write, we can increase
the max_segments, default to 512K.

Link: https://lore.kernel.org/r/20210621055340.11789-5-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:02:21 -03:00
Jack Wang
df1885a755 RDMA/rtrs_clt: Alloc less memory with write path fast memory registration
With write path fast memory registration, we need less memory for
each request.

With fast memory registration, we can reduce max_send_sge to save
memory usage.

Also convert the kmalloc_array to kcalloc.

Link: https://lore.kernel.org/r/20210621055340.11789-4-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:02:21 -03:00
Jack Wang
2ece9ec62e RDMA/rtrs-clt: Write path fast memory registration
With fast memory registration in write path, we can reduce
the memory consumption by using less max_send_sge, support IO bigger
than 116 KB (29 segments * 4 KB) without splitting, and it also
make the IO path more symmetric.

To avoid some times MR reg failed, waiting for the invalidation to finish
before the new mr reg. Introduce a refcount, only finish the request
when both local invalidation and io reply are there.

Link: https://lore.kernel.org/r/20210621055340.11789-3-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:02:21 -03:00
Jack Wang
630e438f04 RDMA/rtrs: Introduce head/tail wr
Introduce tail wr, we can send as the last wr, we want to send the local
invalidate wr after rdma wr in later patch.

While at it, also fix coding style issue.

Link: https://lore.kernel.org/r/20210621055340.11789-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:02:20 -03:00
Devesh Sharma
879740517d RDMA/bnxt_re: Update ABI to pass wqe-mode to user space
Changing ucontext ABI response structure to pass wqe_mode to user library.
A flag in comp_mask has been set to indicate presence of wqe_mode.

Moved wqe-mode ABI to uapi/rdma/bnxt_re-abi.h

Link: https://lore.kernel.org/r/20210616202817.1185276-1-devesh.sharma@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 20:58:52 -03:00
Anand Khoje
84dcd8c7ea IB/core: Shuffle locks in ib_port_data to save memory
pahole shows two 4-byte holes in struct ib_port_data after pkey_list_lock
and netdev_lock respectively.

Shuffling the netdev_lock to be after pkey_list_lock, this shaves off
eight bytes from the struct.

Link: https://lore.kernel.org/r/20210616154509.1047-3-anand.a.khoje@oracle.com
Suggested-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 20:49:32 -03:00
Anand Khoje
c5f8f2c5e5 IB/core: Removed port validity check from ib_get_cached_subnet_prefix
Removed port validity check from ib_get_cached_subnet_prefix() as this
check is not needed because "port_num" is valid.

Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com
Suggested-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 20:49:32 -03:00
Leon Romanovsky
bf194997c7 RDMA: Fix kernel-doc warnings about wrong comment
Compilation with W=1 produces warnings similar to the below.

  drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment
	starts with '/**', but isn't a kernel-doc comment. Refer
	Documentation/doc-guide/kernel-doc.rst

All such occurrences were found with the following one line
 git grep -A 1 "\/\*\*" drivers/infiniband/

Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 20:32:50 -03:00
Yangyang Li
da43b7bebc RDMA/hns: Use IDA interface to manage xrcd index
Switch xrcd index allocation and release from hns own bitmap interface
to IDA interface.

Link: https://lore.kernel.org/r/1623325814-55737-7-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:54 -03:00
Yangyang Li
645f059346 RDMA/hns: Use IDA interface to manage pd index
Switch pd index allocation and release from hns own bitmap interface
to IDA interface.

Link: https://lore.kernel.org/r/1623325814-55737-6-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:54 -03:00
Yangyang Li
d38936f010 RDMA/hns: Use IDA interface to manage mtpt index
Switch mtpt index allocation and release from hns own bitmap interface
to IDA interface.

Link: https://lore.kernel.org/r/1623325814-55737-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:53 -03:00
Yangyang Li
38e375b771 RDMA/hns: Remove unused RR mechanism
Round-robin (RR) is no longer used in the allocation of the bitmap table,
and all the function input parameters that use this mechanism are
BITMAP_NO_RR. The code that defines and uses the RR needs to be deleted.

Link: https://lore.kernel.org/r/1623325814-55737-4-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:53 -03:00
Yangyang Li
1bc530c79d RDMA/hns: Remove the unused hns_roce_bitmap_free_range function
hns_roce_bitmap_free_range() is only called inside hns_roce_bitmap_free(),
and the input parameter "cnt" is set to a constant 1. In addition, the
driver does not use alloc_range scenarios, so free_range does not need to
exist.

Link: https://lore.kernel.org/r/1623325814-55737-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:53 -03:00
Yangyang Li
24977edbb5 RDMA/hns: Remove the unused hns_roce_bitmap_alloc_range function
The function is no longer used.

Link: https://lore.kernel.org/r/1623325814-55737-2-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:42:53 -03:00
Wenpeng Liang
3cea7b4a7d RDMA/core: Fix incorrect print format specifier
There are some '%u' for 'int' and '%d' for 'unsigend int', they should be
fixed.

Link: https://lore.kernel.org/r/1623325232-30900-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:38:30 -03:00
Xi Wang
57dba89ad2 RDMA/hns: Clean SRQC structure definition
Remove unused members in srq context structure.

Link: https://lore.kernel.org/r/1624262443-24528-10-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:42 -03:00
Yixing Liu
2b035e7312 RDMA/hns: Use new interface to write DB related fields
Use hr_write_reg() instead of roce_set_field().

Link: https://lore.kernel.org/r/1624262443-24528-9-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:42 -03:00
Yixing Liu
fd9e3679af RDMA/hns: Use new interface to write FRMR fields
Use "hr_reg_write" to replace "roce_set_filed".

Link: https://lore.kernel.org/r/1624262443-24528-8-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:42 -03:00
Lang Cheng
f778bf1b8c RDMA/hns: Use new interface to get CQE fields
WQE_INDEX and OPCODE and QPN of CQE use redundant masks. Just remove them.

Link: https://lore.kernel.org/r/1624262443-24528-7-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:41 -03:00
Lang Cheng
f0cb411aad RDMA/hns: Use new interface to modify QP context
Fill all QPC fileds with hr_reg_*() instead of roce_set_*(). SQPN is used
for HIP08 ES only, it should be removed.

Link: https://lore.kernel.org/r/1624262443-24528-6-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:41 -03:00
Yixing Liu
f6fcd28d49 RDMA/hns: Use new interface to write CQ context.
Use hr_reg_*() to write CQ context, it's simpler than roce_set_*().

Link: https://lore.kernel.org/r/1624262443-24528-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:41 -03:00
Lang Cheng
a762fe656b RDMA/hns: Add hr_reg_write_bool()
In order to avoid to do bitwise operations on a boolean value, add a new
register interface to avoid sparse comlaint about "dubious: x & !y" when
calling hr_reg_write(ctx, field, !!val).

Fixes: dc50477440 ("RDMA/hns: Use new interface to set MPT related fields")
Fixes: 495c24808c ("RDMA/hns: Add XRC subtype in QPC and XRC type in SRQC")
Link: https://lore.kernel.org/r/1624262443-24528-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:41 -03:00
Weihang Li
fe331da0f2 RDMA/hns: Add a check to ensure integer mtu is positive
GCC may reports an running time assert error when a value calculated from
ib_mtu_enum_to_int() is using as 'val' in FIELD_PREDP:

include/linux/compiler_types.h:328:38: error: call to
'__compiletime_assert_1524' declared with attribute error: FIELD_PREP:
value too large for the field

So a check is added about whether integer mtu from ib_mtu_enum_to_int() is
negative to avoid this warning.

Link: https://lore.kernel.org/r/1624262443-24528-3-git-send-email-liweihang@huawei.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:41 -03:00
Weihang Li
78c1da5270 RDMA/hns: Do not use !! for values that are already bool when calling hr_reg_write()
There is no need to use "!!" before "eq->eqe_size ==
HNS_ROCE_V3_EQE_SIZE", or sparse will complain about "dubious: x & !y".

Fixes: 782832f254 ("RDMA/hns: Simplify the function config_eqc()")
Link: https://lore.kernel.org/r/1624262443-24528-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 15:03:40 -03:00
Avihai Horon
1477d44ce4 RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs
Relaxed Ordering is a capability that can only benefit users that support
it. All kernel ULPs should support Relaxed Ordering, as they are designed
to read data only after observing the CQE and use the DMA API correctly.

Hence, implicitly enable Relaxed Ordering by default for MR transfers in
kernel ULPs.

Link: https://lore.kernel.org/r/b7e820aab7402b8efa63605f4ea465831b3b1e5e.1623236426.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 12:33:08 -03:00
Linus Torvalds
13311e7425 Linux 5.13-rc7 2021-06-20 15:03:15 -07:00
Linus Torvalds
cba5e97280 Merge tag 'sched_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "A single fix to restore fairness between control groups with equal
  priority"

* tag 'sched_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-20 09:44:52 -07:00
Linus Torvalds
9df7f15ee9 Merge tag 'irq_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
 "A single fix for GICv3 to not take an interrupt in an NMI context"

* tag 'irq_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Workaround inconsistent PMR setting on NMI entry
2021-06-20 09:38:14 -07:00
Linus Torvalds
8363e795eb Merge tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
 "A first set of urgent fixes to the FPU/XSTATE handling mess^W code.
  (There's a lot more in the pipe):

   - Prevent corruption of the XSTATE buffer in signal handling by
     validating what is being copied from userspace first.

   - Invalidate other task's preserved FPU registers on XRSTOR failure
     (#PF) because latter can still modify some of them.

   - Restore the proper PKRU value in case userspace modified it

   - Reset FPU state when signal restoring fails

  Other:

   - Map EFI boot services data memory as encrypted in a SEV guest so
     that the guest can access it and actually boot properly

   - Two SGX correctness fixes: proper resources freeing and a NUMA fix"

* tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Avoid truncating memblocks for SGX memory
  x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
  x86/fpu: Reset state for all signal restore failures
  x86/pkru: Write hardware init value to PKRU when xstate is init
  x86/process: Check PF_KTHREAD and not current->mm for kernel threads
  x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
  x86/fpu: Prevent state corruption in __fpu__restore_sig()
  x86/ioremap: Map EFI-reserved memory as encrypted for SEV
2021-06-20 09:09:58 -07:00
Linus Torvalds
b84a7c286c Merge tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Fix initrd corruption caused by our recent change to use relative jump
  labels.

  Fix a crash using perf record on systems without a hardware PMU
  backend.

  Rework our 64-bit signal handling slighty to make it more closely
  match the old behaviour, after the recent change to use unsafe user
  accessors.

  Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
  Axtens, Greg Kurz, and Roman Bolshakov"

* tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
  powerpc: Fix initrd corruption with relative jump labels
  powerpc/signal64: Copy siginfo before changing regs->nip
  powerpc/mem: Add back missing header to fix 'no previous prototype' error
2021-06-19 16:50:23 -07:00
Linus Torvalds
913ec3c22e Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix refcount usage when processing PERF_RECORD_KSYMBOL.

 - 'perf stat' metric group fixes.

 - Fix 'perf test' non-bash issue with stat bpf counters.

 - Update unistd, in.h and socket.h with the kernel sources, silencing
   perf build warnings.

* tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  tools headers UAPI: Sync linux/in.h copy with the kernel sources
  tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
  perf beauty: Update copy of linux/socket.h with the kernel sources
  perf test: Fix non-bash issue with stat bpf counters
  perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
  perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
  perf metricgroup: Fix find_evsel_group() event selector
2021-06-19 14:50:43 -07:00
Linus Torvalds
d9403d307d Merge tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A build fix to always build modules with the 'medany' code model, as
   the module loader doesn't support 'medlow'.

 - A Kconfig warning fix for the SiFive errata.

 - A pair of fixes that for regressions to the recent memory layout
   changes.

 - A fix for the FU740 device tree.

* tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: dts: fu740: fix cache-controller interrupts
  riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
  riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
  riscv: sifive: fix Kconfig errata warning
  riscv32: Use medany C model for modules
2021-06-19 08:45:34 -07:00
Linus Torvalds
e14c779ade Merge tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - Fix zcrypt ioctl hang due to AP queue msg counter dropping below 0
   when pending requests are purged.

 - Two fixes for the machine check handler in the entry code.

* tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ap: Fix hanging ioctl caused by wrong msg counter
  s390/mcck: fix invalid KVM guest condition check
  s390/mcck: fix calculation of SIE critical section size
2021-06-19 08:39:13 -07:00
Arnaldo Carvalho de Melo
1792a59eab tools headers UAPI: Sync linux/in.h copy with the kernel sources
To pick the changes in:

  3218274773 ("icmp: don't send out ICMP messages with a source address of 0.0.0.0")

That don't result in any change in tooling, as INADDR_ are not used to
generate id->string tables used by 'perf trace'.

This addresses this build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
  diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h

Cc: David S. Miller <davem@davemloft.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:15:22 -03:00
Arnaldo Carvalho de Melo
17d27fc314 tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
To pick the changes in:

  8b1462b67f ("quota: finish disable quotactl_path syscall")

Those headers are used in some arches to generate the syscall table used
in 'perf trace' to translate syscall numbers into strings.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h

Cc: Jan Kara <jack@suse.cz>
Cc: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:12:30 -03:00
Arnaldo Carvalho de Melo
ef83f9efe8 perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in:

  ea6932d70e ("net: make get_net_ns return error if NET_NS is disabled")

That don't result in any changes in the tables generated from that
header.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
  diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Changbin Du <changbin.du@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:09:08 -03:00
Ian Rogers
482698c2f8 perf test: Fix non-bash issue with stat bpf counters
$(( .. )) is a bash feature but the test's interpreter is !/bin/sh,
switch the code to use expr.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210617184216.2075588-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:06:46 -03:00
Riccardo Mancini
c087e9480c perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
ASan reported a memory leak of BPF-related ksymbols map and dso. The
leak is caused by refount never reaching 0, due to missing __put calls
in the function machine__process_ksymbol_register.

Once the dso is inserted in the map, dso__put() should be called
(map__new2() increases the refcount to 2).

The same thing applies for the map when it's inserted into maps
(maps__insert() increases the refcount to 2).

  $ sudo ./perf record -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]

  =================================================================
  ==297735==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 6992 byte(s) in 19 object(s) allocated from:
      #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
      #1 0x8e4e53 in map__new2 /home/user/linux/tools/perf/util/map.c:216:20
      #2 0x8cf68c in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:778:10
      [...]

  Indirect leak of 8702 byte(s) in 19 object(s) allocated from:
      #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
      #1 0x8728d7 in dso__new_id /home/user/linux/tools/perf/util/dso.c:1256:20
      #2 0x872015 in dso__new /home/user/linux/tools/perf/util/dso.c:1295:9
      #3 0x8cf623 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:774:21
      [...]

  Indirect leak of 1520 byte(s) in 19 object(s) allocated from:
      #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
      #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
      #2 0x888954 in map__process_kallsym_symbol /home/user/linux/tools/perf/util/symbol.c:710:8
      [...]

  Indirect leak of 1406 byte(s) in 19 object(s) allocated from:
      #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
      #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
      #2 0x8cfbd8 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:803:8
      [...]

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
Link: http://lore.kernel.org/lkml/20210612173751.188582-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:06:46 -03:00
John Garry
fe7a98b9d9 perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
The error code is not set at all in the sys event iter function.

This may lead to an uninitialized value of "ret" in
metricgroup__add_metric() when no CPU metric is added.

Fix by properly setting the error code.

It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
if we have no CPU or sys event metric matching, then "has_match" should
be 0 and "ret" is set to -EINVAL.

However gcc cannot detect that it may not have been set after the
map_for_each_metric() loop for CPU metrics, which is strange.

Fixes: be335ec28e ("perf metricgroup: Support adding metrics for system PMUs")
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1623335580-187317-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:06:46 -03:00
John Garry
fc96ec4d5d perf metricgroup: Fix find_evsel_group() event selector
The following command segfaults on my x86 broadwell:

  $ ./perf stat  -M frontend_bound,retiring,backend_bound,bad_speculation sleep 1
  WARNING: grouped events cpus do not match, disabling group:
    anon group { raw 0x10e }
    anon group { raw 0x10e }
  perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' failed.
  Aborted (core dumped)

The issue shows itself as a use-after-free in evlist__check_cpu_maps(),
whereby the leader of an event selector (evsel) has been deleted (yet we
still attempt to verify for an evsel).

Fundamentally the problem comes from metricgroup__setup_events() ->
find_evsel_group(), and has developed from the previous fix attempt in
commit 9c880c24cb ("perf metricgroup: Fix for metrics containing
duration_time").

The problem now is that the logic in checking if an evsel is in the same
group is subtly broken for the "cycles" event. For the "cycles" event,
the pmu_name is NULL; however the logic in find_evsel_group() may set an
event matched against "cycles" as used, when it should not be.

This leads to a condition where an evsel is set, yet its leader is not.

Fix the check for evsel pmu_name by not matching evsels when either has a
NULL pmu_name.

There is still a pre-existing metric issue whereby the ordering of the
metrics may break the 'stat' function, as discussed at:
https://lore.kernel.org/lkml/49c6fccb-b716-1bf0-18a6-cace1cdb66b9@huawei.com/

Fixes: 9c880c24cb ("perf metricgroup: Fix for metrics containing duration_time")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> # On a Thinkpad T450S
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1623335580-187317-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-19 10:06:46 -03:00
David Abdurachmanov
7ede12b01b riscv: dts: fu740: fix cache-controller interrupts
The order of interrupt numbers is incorrect.

The order for FU740 is: DirError, DataError, DataFail, DirFail

From SiFive FU740-C000 Manual:
19 - L2 Cache DirError
20 - L2 Cache DirFail
21 - L2 Cache DataError
22 - L2 Cache DataFail

Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-19 00:11:53 -07:00
Jisheng Zhang
3a02764c37 riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
Andreas reported commit fc8504765e ("riscv: bpf: Avoid breaking W^X")
breaks booting with one kind of defconfig, I reproduced a kernel panic
with the defconfig:

[    0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
[    0.139159] Oops [#1]
[    0.139303] Modules linked in:
[    0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
[    0.139934] Hardware name: riscv-virtio,qemu (DT)
[    0.140193] epc : __memset+0xc4/0xfc
[    0.140416]  ra : skb_flow_dissector_init+0x1e/0x82
[    0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
[    0.140878]  gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
[    0.141156]  t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
[    0.141424]  s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
[    0.141654]  a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
[    0.141893]  a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
[    0.142126]  s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
[    0.142353]  s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
[    0.142584]  s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
[    0.142810]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
[    0.143042]  t5 : 0000000000000155 t6 : 00000000000003ff
[    0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
[    0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
[    0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
[    0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
[    0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
[    0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
[    0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
[    0.145124] ---[ end trace f1e9643daa46d591 ]---

After some investigation, I think I found the root cause: commit
2bfc6cd81b ("move kernel mapping outside of linear mapping") moves
BPF JIT region after the kernel:

| #define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)

The &_end is unlikely aligned with PMD size, so the front bpf jit
region sits with part of kernel .data section in one PMD size mapping.
But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is
called to make the first bpf jit prog ROX, we will make part of kernel
.data section RO too, so when we write to, for example memset the
.data section, MMU will trigger a store page fault.

To fix the issue, we need to ensure the BPF JIT region is PMD size
aligned. This patch acchieve this goal by restoring the BPF JIT region
to original position, I.E the 128MB before kernel .text section. The
modification to kasan_init.c is inspired by Alexandre.

Fixes: fc8504765e ("riscv: bpf: Avoid breaking W^X")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-18 21:10:05 -07:00
Jisheng Zhang
314b781706 riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
commit 2bfc6cd81b ("riscv: Move kernel mapping outside of linear
mapping") makes use of MODULES_VADDR to populate kernel, BPF, modules
mapping. Currently, MODULES_VADDR is defined as below for RV64:

| #define MODULES_VADDR   (PFN_ALIGN((unsigned long)&_end) - SZ_2G)

But kasan_init() has two local variables which are also named as _start,
_end, so MODULES_VADDR is evaluated with the local variable _end
rather than the global "_end" as we expected. Fix this issue by
renaming the two local variables.

Fixes: 2bfc6cd81b ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-18 21:09:56 -07:00
Linus Torvalds
9ed13a17e3 Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
  bluetooth, netfilter and can.

  Current release - regressions:

   - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
     to fix modifying offloaded qdiscs

   - lantiq: net: fix duplicated skb in rx descriptor ring

   - rtnetlink: fix regression in bridge VLAN configuration, empty info
     is not an error, bot-generated "fix" was not needed

   - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
     creation

  Current release - new code bugs:

   - ethtool: fix NULL pointer dereference during module EEPROM dump via
     the new netlink API

   - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
     queue should not be visible to the stack

   - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs

   - mlx5e: verify dev is present in get devlink port ndo, avoid a panic

  Previous releases - regressions:

   - neighbour: allow NUD_NOARP entries to be force GCed

   - further fixes for fallout from reorg of WiFi locking (staging:
     rtl8723bs, mac80211, cfg80211)

   - skbuff: fix incorrect msg_zerocopy copy notifications

   - mac80211: fix NULL ptr deref for injected rate info

   - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs

  Previous releases - always broken:

   - bpf: more speculative execution fixes

   - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local

   - udp: fix race between close() and udp_abort() resulting in a panic

   - fix out of bounds when parsing TCP options before packets are
     validated (in netfilter: synproxy, tc: sch_cake and mptcp)

   - mptcp: improve operation under memory pressure, add missing
     wake-ups

   - mptcp: fix double-lock/soft lookup in subflow_error_report()

   - bridge: fix races (null pointer deref and UAF) in vlan tunnel
     egress

   - ena: fix DMA mapping function issues in XDP

   - rds: fix memory leak in rds_recvmsg

  Misc:

   - vrf: allow larger MTUs

   - icmp: don't send out ICMP messages with a source address of 0.0.0.0

   - cdc_ncm: switch to eth%d interface naming"

* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
  net: ethernet: fix potential use-after-free in ec_bhf_remove
  selftests/net: Add icmp.sh for testing ICMP dummy address responses
  icmp: don't send out ICMP messages with a source address of 0.0.0.0
  net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
  net: ll_temac: Fix TX BD buffer overwrite
  net: ll_temac: Add memory-barriers for TX BD access
  net: ll_temac: Make sure to free skb when it is completely used
  MAINTAINERS: add Guvenc as SMC maintainer
  bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
  bnxt_en: Fix TQM fastpath ring backing store computation
  bnxt_en: Rediscover PHY capabilities after firmware reset
  cxgb4: fix wrong shift.
  mac80211: handle various extensible elements correctly
  mac80211: reset profile_periodicity/ema_ap
  cfg80211: avoid double free of PMSR request
  cfg80211: make certificate generation more robust
  mac80211: minstrel_ht: fix sample time check
  net: qed: Fix memcpy() overflow of qed_dcbx_params()
  net: cdc_eem: fix tx fixup skb leak
  net: hamradio: fix memory leak in mkiss_close
  ...
2021-06-18 18:55:29 -07:00
Linus Torvalds
6fab154a33 Merge tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
 "One more fix, for a space accounting bug in zoned mode. It happens
  when a block group is switched back rw->ro and unusable bytes (due to
  zoned constraints) are subtracted twice.

  It has user visible effects so I consider it important enough for late
  -rc inclusion and backport to stable"

* tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix negative space_info->bytes_readonly
2021-06-18 16:39:03 -07:00