Commit Graph

139753 Commits

Author SHA1 Message Date
Christian Brauner
0c5fd887d2 acl: move idmapped mount fixup into vfs_{g,s}etxattr()
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.

I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.

Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                       ________________________________
                            -> cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -> vfs_getxattr_alloc()                                                                        |
                                  -> handler->get == ovl_other_xattr_get()                                                    |
                                     -> vfs_getxattr()                                                                        |
                                        -> xattr_getsecurity()                                                                |
                                           -> security_inode_getsecurity()                                                    |
                                              -> cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                                _______________________________
                            -> cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -> vfs_getxattr_alloc()                                                                                |
                                  -> handler->get == ovl_other_xattr_get()                                                            |
                                    |-> vfs_getxattr()                                                                                |
                                        -> xattr_getsecurity()                                                                        |
                                           -> security_inode_getsecurity()                                                            |
                                              -> cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.

The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().

To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().

Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -> generic_permission()
           -> acl_permission_check()
              -> check_acl()
                 -> get_acl()
                    -> inode->i_op->get_acl() == ovl_get_acl()
                        > get_acl() /* on the underlying filesystem)
                          ->inode->i_op->get_acl() == /*lower filesystem callback */
                 -> posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.

The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.

Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15 22:08:59 +02:00
Christian Brauner
c9fa2b07fa mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
Add two tiny helpers to conver a vfsuid into a kuid.

Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15 22:08:01 +02:00
Christian Brauner
45598fd4e2 Merge tag 'ovl-fixes-5.19-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into fs.idmapped.overlay.acl
Bring in Miklos' tree which contains the temporary fix for POSIX ACLs
with overlayfs on top of idmapped layers. We will add a proper fix on
top of it and then revert the temporary fix.

Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15 22:06:10 +02:00
Linus Torvalds
a8ebfcd33c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "RISC-V:
   - Fix missing PAGE_PFN_MASK

   - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()

  x86:
   - Fix for nested virtualization when TSC scaling is active

   - Estimate the size of fastcc subroutines conservatively, avoiding
     disastrous underestimation when return thunks are enabled

   - Avoid possible use of uninitialized fields of 'struct
     kvm_lapic_irq'

  Generic:
   - Mark as such the boolean values available from the statistics file
     descriptors

   - Clarify statistics documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: emulate: do not adjust size of fastop and setcc subroutines
  KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
  Documentation: kvm: clarify histogram units
  kvm: stats: tell userspace which values are boolean
  x86/kvm: fix FASTOP_SIZE when return thunks are enabled
  KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
  RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
  riscv: Fix missing PAGE_PFN_MASK
2022-07-15 10:31:46 -07:00
Linus Torvalds
1ce9d792e8 Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
 "A folio locking fixup that Xiubo and David cooperated on, marked for
  stable. Most of it is in netfs but I picked it up into ceph tree on
  agreement with David"

* tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client:
  netfs: do not unlock and put the folio twice
2022-07-15 10:27:28 -07:00
Lukasz Luba
5e0fd2026c firmware: arm_scmi: Get detailed power scale from perf
In SCMI v3.1 the power scale can be in micro-Watts. The upper layers, e.g.
cpufreq and EM should handle received power values properly (upscale when
needed). Thus, provide an interface which allows to check what is the
scale for power values. The old interface allowed to distinguish between
bogo-Watts and milli-Watts only (which was good for older SCMI spec).

Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15 19:17:30 +02:00
Lukasz Luba
ae6ccaa650 PM: EM: convert power field to micro-Watts precision and align drivers
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.

Update all EM users which access 'power' field and assume the value is
in milli-Watts.

Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.

Example calculation which shows the rounding error and impact:

power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz

power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18

power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21

max_freq = 2000MHz

cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000

cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203

The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15 19:17:30 +02:00
Linus Torvalds
1c49f281c9 Merge tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "Most of the contents are bugfixes for the devicetree files:

   - A Qualcomm MSM8974 pin controller regression, caused by a cleanup
     patch that gets partially reverted here.

   - Missing properties for Broadcom BCM49xx to fix timer detection and
     SMP boot.

   - Fix touchscreen pinctrl for imx6ull-colibri board

   - Multiple fixes for Rockchip rk3399 based machines including the vdu
     clock-rate fix, otg port fix on Quartz64-A and ethernet on
     Quartz64-B

   - Fixes for misspelled DT contents causing minor problems on
     imx6qdl-ts7970m, orangepi-zero, sama5d2, kontron-kswitch-d10, and
     ls1028a

  And a couple of changes elsewhere:

   - Fix binding for Allwinner D1 display pipeline

   - Trivial code fixes to the TEE and reset controller driver
     subsystems and the rockchip platform code.

   - Multiple updates to the MAINTAINERS files, marking the Palm Treo
     support as orphaned, and fixing some entries for added or changed
     file names"

* tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
  arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
  ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
  ARM: dts: at91: sama5d2: Fix typo in i2s1 node
  tee: tee_get_drvdata(): fix description of return value
  optee: Remove duplicate 'of' in two places.
  ARM: dts: kswitch-d10: use open drain mode for coma-mode pins
  ARM: dts: colibri-imx6ull: fix snvs pinmux group
  optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR()
  MAINTAINERS: add polarfire rng, pci and clock drivers
  MAINTAINERS: mark ARM/PALM TREO SUPPORT orphan
  ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
  arm64: dts: ls1028a: Update SFP node to include clock
  dt-bindings: display: sun4i: Fix D1 pipeline count
  ARM: dts: qcom: msm8974: re-add missing pinctrl
  reset: Fix devm bulk optional exclusive control getter
  MAINTAINERS: rectify entry for SYNOPSYS AXS10x RESET CONTROLLER DRIVER
  ARM: rockchip: Add missing of_node_put() in rockchip_suspend_init()
  arm64: dts: rockchip: Assign RK3399 VDU clock rate
  arm64: dts: rockchip: Fix Quartz64-A dwc3 otg port behavior
  ...
2022-07-15 10:16:44 -07:00
Dmitry Osipenko
9b04369b06 drm/scheduler: Don't kill jobs in interrupt context
Interrupt context can't sleep. Drivers like Panfrost and MSM are taking
mutex when job is released, and thus, that code can sleep. This results
into "BUG: scheduling while atomic" if locks are contented while job is
freed. There is no good reason for releasing scheduler's jobs in IRQ
context, hence use normal context to fix the trouble.

Cc: stable@vger.kernel.org
Fixes: 542cff7893 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
2022-07-15 10:09:15 -04:00
Kuniyuki Iwashima
08a75f1067 tcp: Fix data-races around sysctl_tcp_l3mdev_accept.
While reading sysctl_tcp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 6dd9a14e92 ("net: Allow accepted sockets to be bound to l3mdev domain")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
1a0008f9df tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
While reading sysctl_tcp_fwmark_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 84f39b08d7 ("net: support marking accepting TCP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
85d0b4dbd7 ip: Fix a data-race around sysctl_fwmark_reflect.
While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
289d3b21fb ip: Fix data-races around sysctl_ip_nonlocal_bind.
While reading sysctl_ip_nonlocal_bind, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
60c158dc7b ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: f87c10a8aa ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
8281b7ec5c ip: Fix data-races around sysctl_ip_default_ttl.
While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Linus Torvalds
9bd572ec7a Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf and wireless.

  Still no major regressions, the release continues to be calm. An
  uptick of fixes this time around due to trivial data race fixes and
  patches flowing down from subtrees.

  There has been a few driver fixes (particularly a few fixes for false
  positives due to 66e4c8d950 which went into -next in May!) that make
  me worry the wide testing is not exactly fully through.

  So "calm" but not "let's just cut the final ASAP" vibes over here.

  Current release - regressions:

   - wifi: rtw88: fix write to const table of channel parameters

  Current release - new code bugs:

   - mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify

   - mlx5:
      - TC, allow offload from uplink to other PF's VF
      - Lag, decouple FDB selection and shared FDB
      - Lag, correct get the port select mode str

   - bnxt_en: fix and simplify XDP transmit path

   - r8152: fix accessing unset transport header

  Previous releases - regressions:

   - conntrack: fix crash due to confirmed bit load reordering (after
     atomic -> refcount conversion)

   - stmmac: dwc-qos: disable split header for Tegra194

  Previous releases - always broken:

   - mlx5e: ring the TX doorbell on DMA errors

   - bpf: make sure mac_header was set before using it

   - mac80211: do not wake queues on a vif that is being stopped

   - mac80211: fix queue selection for mesh/OCB interfaces

   - ip: fix dflt addr selection for connected nexthop

   - seg6: fix skb checksums for SRH encapsulation/insertion

   - xdp: fix spurious packet loss in generic XDP TX path

   - bunch of sysctl data race fixes

   - nf_log: incorrect offset to network header

  Misc:

   - bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"

* tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  nfp: flower: configure tunnel neighbour on cmsg rx
  net/tls: Check for errors in tls_device_init
  MAINTAINERS: Add an additional maintainer to the AMD XGBE driver
  xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
  selftests/net: test nexthop without gw
  ip: fix dflt addr selection for connected nexthop
  net: atlantic: remove aq_nic_deinit() when resume
  net: atlantic: remove deep parameter on suspend/resume functions
  sfc: fix kernel panic when creating VF
  seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
  seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
  seg6: fix skb checksum evaluation in SRH encapsulation/insertion
  sfc: fix use after free when disabling sriov
  net: sunhme: output link status with a single print.
  r8152: fix accessing unset transport header
  net: stmmac: fix leaks in probe
  net: ftgmac100: Hold reference returned by of_get_child_by_name()
  nexthop: Fix data-races around nexthop_compat_mode.
  ipv4: Fix data-races around sysctl_ip_dynaddr.
  tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
  ...
2022-07-14 12:48:07 -07:00
Linus Torvalds
4adfa865bb Merge tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
 "Here are a number of fixes for recently found bugs.

  Only 'ima: fix violation measurement list record' was introduced in
  the current release. The rest address existing bugs"

* tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: Fix potential memory leak in ima_init_crypto()
  ima: force signature verification when CONFIG_KEXEC_SIG is configured
  ima: Fix a potential integer overflow in ima_appraise_measurement
  ima: fix violation measurement list record
  Revert "evm: Fix memleak in init_desc"
2022-07-14 12:15:42 -07:00
Bart Van Assche
ed4512590b fs/nilfs2: Use the enum req_op and blk_opf_t types
Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for
variables that represent request flags. Combine the 'mode' and
'mode_flags' arguments of nilfs_btnode_submit_block into a single
argument 'opf'.

Reviewed-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-59-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:33 -06:00
Bart Van Assche
6669797b0d fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers
Commit 2a222ca992 ("fs: have submit_bh users pass in op and flags
separately") renamed the jbd2_write_superblock() 'write_op' argument into
'write_flags'. Propagate this change to the jbd2_write_superblock()
callers. Additionally, change the type of 'write_flags' into blk_opf_t.

Cc: Mike Christie <michael.christie@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-57-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
7649c873c1 fs/f2fs: Use the enum req_op and blk_opf_t types
Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for
variables that represent request flags.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-53-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
1420c4a549 fs/buffer: Combine two submit_bh() and ll_rw_block() arguments
Both submit_bh() and ll_rw_block() accept a request operation type and
request flags as their first two arguments. Micro-optimize these two
functions by combining these first two arguments into a single argument.
This patch does not change the behavior of any of the modified code.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Song Liu <song@kernel.org> (for the md changes)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
3ae7286943 fs/buffer: Use the new blk_opf_t type
Improve static type checking by using the new blk_opf_t type for block layer
request flags. Change WRITE into REQ_OP_WRITE. This patch does not change
any functionality since REQ_OP_WRITE == WRITE == 1.

Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-47-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
f8e6e4bd9f mm: Use the new blk_opf_t type
Improve static type checking by using the new blk_opf_t type for block
layer request flags.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Stefan Roesch <shr@fb.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-46-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
2599cac57a scsi/core: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent
request flags. Use the !! operator in scsi_noretry_cmd() to convert the
blk_opf_t type into a boolean. This patch does not change any functionality.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-42-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
ea957547e8 scsi/core: Improve static type checking
Improve static type checking by using the new blk_opf_t type for the
combination of a request operation and its flags.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-40-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
581075e4f6 dm/core: Reduce the size of struct dm_io_request
Combine the bi_op and bi_op_flags into the bi_opf member. Use the new
blk_opf_t type to improve static type checking. This patch does not
change any functionality.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-22-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:31 -06:00
Bart Van Assche
919dbca867 blktrace: Use the new blk_opf_t type
Improve static type checking by using the new blk_opf_t type for a function
argument that represents a combination of a request operation and request
flags. Rename that argument from 'op' into 'opf' to make its role more
clear.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-12-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
16458cf3bd block: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent
request flags or a bitwise combination of a request operation and
request flags. Rename the function arguments and also a structure member
that hold a request operation and flags from 'rw' into 'opf'.

This patch does not change any functionality.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
342a72a334 block: Introduce the type blk_opf_t
Introduce the type blk_opf_t for the request operation and flags (REQ_OP_*
and REQ_*). This type will be used to improve documentation of the block
layer code and also to allow sparse to verify whether request flags are used
correctly.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-6-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
2d9b02be73 block: Change the type of req_op() and bio_op() into enum req_op
Improve static type checking by changing the type of the value returned by
req_op() and bio_op() from unsigned int into enum req_op. Insert
'default: break;' in switch statements on the enum req_op type to prevent
that the compiler warns about these switch statements.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-5-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
86947df3a9 block: Change the type of the last .rw_page() argument
All .rw_page() callers pass an enum req_op value as last argument. Make
this explicit by changing the type of the last argument into enum req_op.
See also commit 3f289dcb4b ("block: make bdev_ops->rw_page() take a
REQ_OP instead of bool").

Cc: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
77e7ffd7ad block: Use enum req_op where appropriate
Change the type of the arguments that are used to pass a REQ_OP_* value
from int or unsigned int into enum req_op to improve static type
checking.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
ff07a02e9e treewide: Rename enum req_opf into enum req_op
The type name enum req_opf is misleading since it suggests that values of
this type include both an operation type and flags. Since values of this
type represent an operation only, change the type name into enum req_op.

Convert the enum req_op documentation into kernel-doc format. Move a few
definitions such that the enum req_op documentation occurs just above
the enum req_op definition.

The name "req_opf" was introduced by commit ef295ecf09 ("block: better op
and flags encoding").

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Tariq Toukan
3d8c51b25a net/tls: Check for errors in tls_device_init
Add missing error checks in tls_device_init.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 10:12:39 -07:00
Christoph Hellwig
900d156bac block: remove bdevname
Replace the remaining calls of bdevname with snprintf using the %pg
format specifier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 10:27:56 -06:00
Arnd Bergmann
e0a5925055 Merge tag 'qcom-arm64-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt
Qualcomm ARM64 DTS updates for v5.20

This introduces initial support for Lenovo ThinkPad X13s, Qualcomm 8cx
Gen 3 Compute Reference Device, SA8295P Automotive Development Platform,
Xiaomi Mi 5s Plus, five new SC7180 Chrome OS boards, Inforce IFC6560, LG
G7 ThinQ and LG V35 ThinQ.

With IPQ8074 gaining GDSC support, this was expressed in the gcc node
and defined for the USB nodes. The SDHCI reset line was defined to get
the storage devices into a known state.

For MSM8996 interconnect providers, the second DSI interface, resets for
SDHCI are introduced. Support for the Xiaomi Mi 5s Plus is introduced
and the Dragonboard 820c gains definitions for its LEDs.

The MSM8998 platform changes consists of a various cleanup patches, the
FxTec Pro1 is split out from using the MTP dts and Sony Xperia devices
on the "Yoshino" platform gains ToF sensor.

On SC7180 five new Trogdor based boards are added and the description of
keyboard and detachables is improved.

On the SC7280-based Herobrine board DisplayPort is enabled, SPI flash
clock rate is changed, WiFi is enabled and the modem firmware path is
updated. The Villager boards gains touchscreen, and keyboard backlight.

This introduces initial support for the SC8280XP (aka 8cx Gen 3) and
related automotive platforms are introduced, with support for the
Qualcomm reference board, the Lenovo Thinkpad X13s and the SA8295P
Automotive Development Platform.

In addition to a wide range of smaller fixes on the SDM630 and SDM660
platforms, support for the secondary high speed USB controller is
introduced and the Sony Xperia "Nile" platform gains support for the RGB
status LED. Support for the Inforce IFC6560 board is introduced.

On SDM845 the bandwidth monitor for the CPU subsystem is introduced, to
scale LLCC clock rate based on profiling. CPU and cluster idle states
are switched to OSI hierarchical states. DB845c and SHIFT 6mq gains LED
support and new support for the LG G7 ThinQ and LG V35 ThinQ boards are
added.

DLL/DDR configuration for SDHCI nodes are defined for SM6125.

On SM8250 the GPU per-process page tables is enabled and for RB5 the
Light Pulse Generator-based LEDs are added.

The display clock controller is introduced for SM8350.

On SM8450 this introduces the camera clock controller and the UART
typically used for Bluetooth. The interconnect path for the crypto
engine is added to the SCM node, to ensure this is adequately clocked.

The assigned-clock-rate for the display processor is dropped from
several platforms, now that the driver derrives the min and max from the
clock.

In addition to this a wide range of fixes for stylistic issues and
issues discovered through Devicetree binding validation across many
platforms and boards are introduced.

* tag 'qcom-arm64-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (193 commits)
  arm64: dts: qcom: sc8280xp: fix DP PHY node unit addresses
  arm64: dts: qcom: sc8280xp: fix usb_0 HS PHY ref clock
  arm64: dts: qcom: sc7280: fix PCIe clock reference
  docs: arm: index.rst: add google/chromebook-boot-flow
  arm64: dts: qcom: msm8996: clean up PCIe PHY node
  arm64: dts: qcom: msm8996: use non-empty ranges for PCIe PHYs
  arm64: dts: qcom: sm8450: drop UFS PHY clock-cells
  arm64: dts: qcom: sm8250: drop UFS PHY clock-cells
  arm64: dts: qcom: sc8280xp: drop UFS PHY clock-cells
  arm64: dts: qcom: sm8450: drop USB PHY clock index
  arm64: dts: qcom: sm8350: drop USB PHY clock index
  arm64: dts: qcom: msm8998: drop USB PHY clock index
  arm64: dts: qcom: ipq8074: drop USB PHY clock index
  arm64: dts: qcom: ipq6018: drop USB PHY clock index
  arm64: dts: qcom: sm8250: add missing PCIe PHY clock-cells
  arm64: dts: qcom: sc7280: drop PCIe PHY clock index
  Revert "arm64: dts: qcom: Fix 'reg-names' for sdhci nodes"
  arm64: dts: qcom: sc7180-idp: add vdds supply to the DSI PHY
  arm64: dts: qcom: sc7280: use constants for gpucc clocks and power-domains
  arm64: dts: qcom: msm8996: add missing DSI clock assignments
  ...

Link: https://lore.kernel.org/r/20220713203939.1431054-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-14 17:02:12 +02:00
Ming Lei
0edb3696c1 ublk_drv: support to complete io command via task_work_add
Use task_work_add if it is available, since task_work_add can bring
up better performance, especially batching signaling ->ubq_daemon can
be done.

It is observed that task_work_add() can boost iops by +4% on random
4k io test. Also except for completing io command, all other code
paths are same with completing io command via
io_uring_cmd_complete_in_task.

Meantime add one flag of UBLK_F_URING_CMD_COMP_IN_TASK for comparing
the mode easily.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220713140711.97356-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 07:15:48 -06:00
Ming Lei
71f28f3136 ublk_drv: add io_uring based userspace block driver
This is the driver part of userspace block driver(ublk driver), the other
part is userspace daemon part(ublksrv)[1].

The two parts communicate by io_uring's IORING_OP_URING_CMD with one
shared cmd buffer for storing io command, and the buffer is read only for
ublksrv, each io command is indexed by io request tag directly, and is
written by ublk driver.

For example, when one READ io request is submitted to ublk block driver,
ublk driver stores the io command into cmd buffer first, then completes
one IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued
to ublk driver beforehand by ublksrv for getting notification of any new
io request, and each URING_CMD is associated with one io request by tag.

After ublksrv gets the io command, it translates and handles the ublk io
request, such as, for the ublk-loop target, ublksrv translates the request
into same request on another file or disk, like the kernel loop block
driver. In ublksrv's implementation, the io is still handled by io_uring,
and share same ring with IORING_OP_URING_CMD command. When the target io
request is done, the same IORING_OP_URING_CMD is issued to ublk driver for
both committing io request result and getting future notification of new
io request.

Another thing done by ublk driver is to copy data between kernel io
request and ublksrv's io buffer:

1) before ubsrv handles WRITE request, copy the request's data into
   ublksrv's userspace io buffer, so that ublksrv can handle the write
   request

2) after ubsrv handles READ request, copy ublksrv's userspace io buffer
   into this READ request, then ublk driver can complete the READ request

Zero copy may be switched if mm is ready to support it.

ublk driver doesn't handle any logic of the specific user space driver,
so it is small/simple enough.

[1] ublksrv

https://github.com/ming1/ubdsrv

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220713140711.97356-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 07:15:28 -06:00
Paolo Bonzini
1b870fa557 kvm: stats: tell userspace which values are boolean
Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit".  While it is not exactly
a unit, it walks and quacks like one.  In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").

Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 08:01:59 -04:00
Xiubo Li
fac47b43c7 netfs: do not unlock and put the folio twice
check_write_begin() will unlock and put the folio when return
non-zero.  So we should avoid unlocking and putting it twice in
netfs layer.

Change the way ->check_write_begin() works in the following two ways:

 (1) Pass it a pointer to the folio pointer, allowing it to unlock and put
     the folio prior to doing the stuff it wants to do, provided it clears
     the folio pointer.

 (2) Change the return values such that 0 with folio pointer set means
     continue, 0 with folio pointer cleared means re-get and all error
     codes indicating an error (no special treatment for -EAGAIN).

[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-07-14 10:10:12 +02:00
Linus Torvalds
d0b97f3891 Merge tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "Fix an old and subtle bug in the migration path.

  css_sets are used to track tasks and migrations are tasks moving from
  a group of css_sets to another group of css_sets. The migration path
  pins all source and destination css_sets in the prep stage.

  Unfortunately, it was overloading the same list_head entry to track
  sources and destinations, which got confused for migrations which are
  partially identity leading to use-after-frees.

  Fixed by using dedicated list_heads for tracking sources and
  destinations"

* tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Use separate src/dst nodes when preloading css_sets for migration
2022-07-13 11:47:01 -07:00
Coiby Xu
af16df54b8 ima: force signature verification when CONFIG_KEXEC_SIG is configured
Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.

Fixes: 99d5cadfde ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-13 10:13:41 -04:00
David S. Miller
67de8acdd3 Merge tag 'wireless-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
A small set of fixes for
 * queue selection in mesh/ocb
 * queue handling on interface stop
 * hwsim virtio device vs. some other virtio changes
 * dt-bindings email addresses
 * color collision memory allocation
 * a const variable in rtw88
 * shared SKB transmit in the ethernet format path
 * P2P client port authorization
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 14:27:38 +01:00
Kuniyuki Iwashima
1dace01492 raw: Fix a data-race around sysctl_raw_l3mdev_accept.
While reading sysctl_raw_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 6897445fb1 ("net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 12:56:49 +01:00
John Keeping
401e4963bf sched/core: Always flush pending blk_plug
With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):

	INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
	      Not tainted 5.15.49-rt46 #1
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:kworker/u8:0    state:D stack:    0 pid:    8 ppid:     2 flags:0x00000000
	Workqueue: writeback wb_workfn (flush-7:0)
	[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
	[<c08a3d84>] (schedule) from [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
	[<c08a65a0>] (rt_mutex_slowlock_block.constprop.0) from [<c08a6708>]
	+(rt_mutex_slowlock.constprop.0+0xac/0x174)
	[<c08a6708>] (rt_mutex_slowlock.constprop.0) from [<c0374d60>] (fat_write_inode+0x34/0x54)
	[<c0374d60>] (fat_write_inode) from [<c0297304>] (__writeback_single_inode+0x354/0x3ec)
	[<c0297304>] (__writeback_single_inode) from [<c0297998>] (writeback_sb_inodes+0x250/0x45c)
	[<c0297998>] (writeback_sb_inodes) from [<c0297c20>] (__writeback_inodes_wb+0x7c/0xb8)
	[<c0297c20>] (__writeback_inodes_wb) from [<c0297f24>] (wb_writeback+0x2c8/0x2e4)
	[<c0297f24>] (wb_writeback) from [<c0298c40>] (wb_workfn+0x1a4/0x3e4)
	[<c0298c40>] (wb_workfn) from [<c0138ab8>] (process_one_work+0x1fc/0x32c)
	[<c0138ab8>] (process_one_work) from [<c0139120>] (worker_thread+0x22c/0x2d8)
	[<c0139120>] (worker_thread) from [<c013e6e0>] (kthread+0x16c/0x178)
	[<c013e6e0>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
	Exception stack(0xc10e3fb0 to 0xc10e3ff8)
	3fa0:                                     00000000 00000000 00000000 00000000
	3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

	INFO: task tar:2083 blocked for more than 491 seconds.
	      Not tainted 5.15.49-rt46 #1
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:tar             state:D stack:    0 pid: 2083 ppid:  2082 flags:0x00000000
	[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
	[<c08a3d84>] (schedule) from [<c08a41b0>] (io_schedule+0x14/0x24)
	[<c08a41b0>] (io_schedule) from [<c08a455c>] (bit_wait_io+0xc/0x30)
	[<c08a455c>] (bit_wait_io) from [<c08a441c>] (__wait_on_bit_lock+0x54/0xa8)
	[<c08a441c>] (__wait_on_bit_lock) from [<c08a44f4>] (out_of_line_wait_on_bit_lock+0x84/0xb0)
	[<c08a44f4>] (out_of_line_wait_on_bit_lock) from [<c0371fb0>] (fat_mirror_bhs+0xa0/0x144)
	[<c0371fb0>] (fat_mirror_bhs) from [<c0372a68>] (fat_alloc_clusters+0x138/0x2a4)
	[<c0372a68>] (fat_alloc_clusters) from [<c0370b14>] (fat_alloc_new_dir+0x34/0x250)
	[<c0370b14>] (fat_alloc_new_dir) from [<c03787c0>] (vfat_mkdir+0x58/0x148)
	[<c03787c0>] (vfat_mkdir) from [<c0277b60>] (vfs_mkdir+0x68/0x98)
	[<c0277b60>] (vfs_mkdir) from [<c027b484>] (do_mkdirat+0xb0/0xec)
	[<c027b484>] (do_mkdirat) from [<c0100060>] (ret_fast_syscall+0x0/0x1c)
	Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
	bfa0:                   01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
	bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
	bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc

Here the kworker is waiting on msdos_sb_info::s_lock which is held by
tar which is in turn waiting for a buffer which is locked waiting to be
flushed, but this operation is plugged in the kworker.

The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT and thus the behaviour changes for RT.

It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag.  But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.

Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit b4bfa3fcfe
("sched/core: Rework the __schedule() preempt argument").

As described in [1]:

   The schedule process must distinguish between blocking on a regular
   sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
   and rwlock):
   - rwsem and mutex must flush block requests (blk_schedule_flush_plug())
     even if blocked on a lock. This can not deadlock because this also
     happens for non-RT.
     There should be a warning if the scheduling point is within a RCU read
     section.

   - spinlock and rwlock must not flush block requests. This will deadlock
     if the callback attempts to acquire a lock which is already acquired.
     Similarly to being preempted, there should be no warning if the
     scheduling point is within a RCU read section.

and with the tsk_is_pi_blocked() in the scheduler path, we hit the first
issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches

Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com
2022-07-13 11:29:17 +02:00
Linus Torvalds
b047602d57 Merge tag 'trace-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Fixes and minor clean ups for tracing:

   - Fix memory leak by reverting what was thought to be a double free.

     A static tool had gave a false positive that a double free was
     possible in the error path, but it was actually a different
     location that confused the static analyzer (and those of us that
     reviewed it).

   - Move use of static buffers by ftrace_dump() to a location that can
     be used by kgdb's ftdump(), as it needs it for the same reasons.

   - Clarify in the Kconfig description that function tracing has
     negligible impact on x86, but may have a bit bigger impact on other
     architectures.

   - Remove unnecessary extra semicolon in trace event.

   - Make a local variable static that is used in the fprobes sample

   - Use KSYM_NAME_LEN for length of function in kprobe sample and get
     rid of unneeded macro for the same purpose"

* tag 'trace-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  samples: Use KSYM_NAME_LEN for kprobes
  fprobe/samples: Make sample_probe static
  blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon
  ftrace: Be more specific about arch impact when function tracer is enabled
  tracing: Fix sleeping while atomic in kdb ftdump
  tracing/histograms: Fix memory leak problem
2022-07-12 16:17:40 -07:00
Arnd Bergmann
9bc697091a Merge tag 'arm-soc/for-5.20/drivers' of https://github.com/Broadcom/stblinux into arm/drivers
This pull request contains Broadcom SoC drivers updatse for 5.20, please
pull the following:

- Julia fixes a typo in the Broadcom STB legacy power management code

- Liang fixes a device_node reference count leak in the Broadcom STB BIU
  driver code error path(s)

- Nicolas and Stefan provide updates to the BCM2835 power management
  driver allowing its use on BCM2711 (Raspberry Pi 4) and to enable the
  use of the V3D GPU driver on such platforms. This is a merge of an
  immutable branch from Lee Jones' MFD tree

- William removes the use of CONFIG_ARCH_BCM_63XX which is removed and
  replaces the dependencies with CONFIG_ARCH_BCMBCA which is how all of
  the DSL/PON SoCs from Broadcom are now supported in the upstream
  kernel.

* tag 'arm-soc/for-5.20/drivers' of https://github.com/Broadcom/stblinux:
  tty: serial: bcm63xx: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  spi: bcm63xx-hsspi: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  clk: bcm: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  hwrng: bcm2835: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  phy: brcm-sata: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  i2c: brcmstb: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  ata: ahci_brcm: bcmbca: Replace ARCH_BCM_63XX with ARCH_BCMBCA
  soc: bcm: bcm2835-power: Bypass power_on/off() calls
  soc: bcm: bcm2835-power: Add support for BCM2711's RPiVid ASB
  soc: bcm: bcm2835-power: Resolve ASB register macros
  soc: bcm: bcm2835-power: Refactor ASB control
  mfd: bcm2835-pm: Add support for BCM2711
  mfd: bcm2835-pm: Use 'reg-names' to get resources
  soc: bcm: brcmstb: biuctrl: Add missing of_node_put()
  soc: bcm: brcmstb: pm: pm-arm: fix typo in comment

Link: https://lore.kernel.org/r/20220711164451.3542127-6-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-12 22:59:09 +02:00
Arnd Bergmann
f10c00ae86 Merge tag 'tegra-for-5.20-memory' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers
memory: tegra: Changes for v5.20-rc1

Add memory client definitions for the Multi-Gigabit Ethernet (MGBE)
controllers found on Tegra234.

* tag 'tegra-for-5.20-memory' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  memory: tegra: Add MGBE memory clients for Tegra234
  dt-bindings: arm: tegra: Add NVIDIA Tegra234 CBB 2.0 binding
  dt-bindings: arm: tegra: Add NVIDIA Tegra194 AXI2APB binding
  dt-bindings: arm: tegra: Add NVIDIA Tegra194 CBB 1.0 binding
  dt-bindings: memory: Add Tegra234 MGBE memory clients
  dt-bindings: Add Tegra234 MGBE clocks and resets
  dt-bindings: power: Add Tegra234 MGBE power domains
  dt-bindings: Add headers for Tegra234 GPCDMA

Link: https://lore.kernel.org/r/20220708185608.676474-5-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-12 22:53:08 +02:00
Li kunyu
0bb7e14c8e blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon
Remove extra semicolon.

Link: https://lkml.kernel.org/r/20220629030013.10362-1-kunyu@nfschina.com

Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 16:36:37 -04:00
Arnd Bergmann
ff6c226953 Merge tag 'v5.19-next-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/drivers
pmic wrapper:
- code style improvements

devapc:
- add support for MT8186

Smart Voltage Scaling (SVS)
- add support for MT8183 and MT8192

MMSYS:
- Add more display paths for MT8365

Mutex:
- Add common interface for MOD and SOF table
- Add support for MDP on MT8183
- Move binding to soc folder
- Add support to use CMDQ to enable the mutex, needed by MDP3

Power domains:
- Add support for MT6795

* tag 'v5.19-next-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: (29 commits)
  soc: mediatek: mutex: Simplify with devm_platform_get_and_ioremap_resource()
  soc: mediatek: pm-domains: Add support for Helio X10 MT6795
  dt-bindings: power: Add MediaTek Helio X10 MT6795 power domains
  soc: mediatek: SVS: Use DEFINE_SIMPLE_DEV_PM_OPS for svs_pm_ops
  soc: mediatek: mtk-pm-domains: Allow probing vreg supply on two MFGs
  soc: mediatek: fix missing clk_disable_unprepare() on err in svs_resume()
  soc: mediatek: mutex: Use DDP_COMPONENT_DITHER0 mod index for MT8365
  soc: mediatek: mutex: add functions that operate registers by CMDQ
  dt-bindings: soc: mediatek: add gce-client-reg for MUTEX
  dt-bindings: soc: mediatek: move out common module from display folder
  soc: mediatek: mutex: add 8183 MUTEX MOD settings for MDP
  soc: mediatek: mutex: add common interface for modules setting
  soc: mediatek: pm-domains: Add support always on flag
  soc: mediatek: mt8365-mmsys: add DPI/HDMI display path
  soc: mediatek: mutex: add MT8365 support
  soc: mediatek: SVS: add mt8192 SVS GPU driver
  dt-bindings: soc: mediatek: add mt8192 svs dt-bindings
  soc: mediatek: SVS: add debug commands
  soc: mediatek: SVS: add monitor mode
  soc: mediatek: SVS: introduce MTK SVS engine
  ...

Link: https://lore.kernel.org/r/b733bd82-6d99-23ef-0541-98e98eb8d3bc@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-12 15:01:31 +02:00