This reverts commit 71f660a1a7.
Eugeniu Rosca writes:
On Thu, Jul 09, 2020 at 09:00:23AM +0200, Eugeniu Rosca wrote:
>After integrating v4.14.186 commit 5410d158ca2a50 ("usb/ehci-platform:
>Set PM runtime as active on resume") into downstream v4.14.x, we started
>to consistently experience below panic [1] on every second s2ram of
>R-Car H3 Salvator-X Renesas reference board.
>
>After some investigations, we concluded the following:
> - the issue does not exist in vanilla v5.8-rc4+
> - [bisecting shows that] the panic on v4.14.186 is caused by the lack
> of v5.6-rc1 commit 987351e1ea ("phy: core: Add consumer device
> link support"). Getting evidence for that is easy. Reverting
> 987351e1ea in vanilla leads to a similar backtrace [2].
>
>Questions:
> - Backporting 987351e1ea ("phy: core: Add consumer device
> link support") to v4.14.187 looks challenging enough, so probably not
> worth it. Anybody to contradict this?
> - Assuming no plans to backport the missing mainline commit to v4.14.x,
> should the following three v4.14.186 commits be reverted on v4.14.x?
> * baef809ea497a4 ("usb/ohci-platform: Fix a warning when hibernating")
> * 9f33eff4958885 ("usb/xhci-plat: Set PM runtime as active on resume")
> * 5410d158ca2a50 ("usb/ehci-platform: Set PM runtime as active on resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 661f5686a1.
Eugeniu Rosca writes:
On Thu, Jul 09, 2020 at 09:00:23AM +0200, Eugeniu Rosca wrote:
>After integrating v4.14.186 commit 5410d158ca2a50 ("usb/ehci-platform:
>Set PM runtime as active on resume") into downstream v4.14.x, we started
>to consistently experience below panic [1] on every second s2ram of
>R-Car H3 Salvator-X Renesas reference board.
>
>After some investigations, we concluded the following:
> - the issue does not exist in vanilla v5.8-rc4+
> - [bisecting shows that] the panic on v4.14.186 is caused by the lack
> of v5.6-rc1 commit 987351e1ea ("phy: core: Add consumer device
> link support"). Getting evidence for that is easy. Reverting
> 987351e1ea in vanilla leads to a similar backtrace [2].
>
>Questions:
> - Backporting 987351e1ea ("phy: core: Add consumer device
> link support") to v4.14.187 looks challenging enough, so probably not
> worth it. Anybody to contradict this?
> - Assuming no plans to backport the missing mainline commit to v4.14.x,
> should the following three v4.14.186 commits be reverted on v4.14.x?
> * baef809ea497a4 ("usb/ohci-platform: Fix a warning when hibernating")
> * 9f33eff4958885 ("usb/xhci-plat: Set PM runtime as active on resume")
> * 5410d158ca2a50 ("usb/ehci-platform: Set PM runtime as active on resume"
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit c83258a757.
Eugeniu Rosca writes:
On Thu, Jul 09, 2020 at 09:00:23AM +0200, Eugeniu Rosca wrote:
>After integrating v4.14.186 commit 5410d158ca2a50 ("usb/ehci-platform:
>Set PM runtime as active on resume") into downstream v4.14.x, we started
>to consistently experience below panic [1] on every second s2ram of
>R-Car H3 Salvator-X Renesas reference board.
>
>After some investigations, we concluded the following:
> - the issue does not exist in vanilla v5.8-rc4+
> - [bisecting shows that] the panic on v4.14.186 is caused by the lack
> of v5.6-rc1 commit 987351e1ea ("phy: core: Add consumer device
> link support"). Getting evidence for that is easy. Reverting
> 987351e1ea in vanilla leads to a similar backtrace [2].
>
>Questions:
> - Backporting 987351e1ea ("phy: core: Add consumer device
> link support") to v4.14.187 looks challenging enough, so probably not
> worth it. Anybody to contradict this?
> - Assuming no plans to backport the missing mainline commit to v4.14.x,
> should the following three v4.14.186 commits be reverted on v4.14.x?
> * baef809ea497a4 ("usb/ohci-platform: Fix a warning when hibernating")
> * 9f33eff4958885 ("usb/xhci-plat: Set PM runtime as active on resume")
> * 5410d158ca2a50 ("usb/ehci-platform: Set PM runtime as active on resume")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a8d7f126c ]
Commit 209c65b61d ("drivers/of/of_mdio.c:fix of_mdiobus_register()")
introduced a break of the loop on the premise that a successful
registration should exit the loop. The premise is correct but not to
code, because rc && rc != -ENODEV is just a special error condition,
that means we would exit the loop even with rc == -ENODEV which is
absolutely not correct since this is the error code to indicate to the
MDIO bus layer that scanning should continue.
Fix this by explicitly checking for rc = 0 as the only valid condition
to break out of the loop.
Fixes: 209c65b61d ("drivers/of/of_mdio.c:fix of_mdiobus_register()")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa54c1c9d9 ]
On ColdFire mcf54418, using DSPI_DMA_MODE mode, spi transfers
at first boot stage are not succeding:
m25p80 spi0.1: unrecognized JEDEC id bytes: 00, 00, 00
The reason is the SPI_SR initial value set by the driver, that
is not clearing (not setting to 1) the RF_DF flag. After a tour
on the dspi hw modules that use this driver(Vybrid, ColdFire and
ls1021a) a better init value for SR register has been set.
Signed-off-by: Angelo Dureghello <angelo@sysam.it>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3f9c6d3879 upstream.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses a 32 byte array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv() data with alignment
explicitly requested. This data is allocated with kzalloc so no
data can leak appart from previous readings.
Fixes: eec96d1e2d ("iio: health: Add driver for the TI AFE4403 heart monitor")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Andrew F. Davis <afd@ti.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8db4afe163 upstream.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
Here there is no data leak possibility so use an explicit structure
on the stack to ensure alignment and nice readable fashion.
The forced alignment of ts isn't strictly necessary in this driver
as the padding will be correct anyway (there isn't any). However
it is probably less fragile to have it there and it acts as
documentation of the requirement.
Fixes: 713bbb4efb ("iio: pressure: ms5611: Add triggered buffer support")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Tomasz Duszynski <tomasz.duszynski@octakon.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c49056ad9 upstream.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv() data.
This data is allocated with kzalloc so no data can leak
apart from previous readings.
Explicit alignment of ts needed to ensure consistent padding
on all architectures (particularly x86_32 with it's 4 byte alignment
of s64)
Fixes: e4a70e3e7d ("iio: humidity: add support to hts221 rh/temp combo device")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7369ae1f4 upstream.
The function iio_device_register() was called in mma8452_probe().
But the function iio_device_unregister() was not called after
a call of the function mma8452_set_freefall_mode() failed.
Thus add the missed function call for one error case.
Fixes: 1a965d405f ("drivers:iio:accel:mma8452: added cleanup provision in case of failure.")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0187294d22 upstream.
When devm_regmap_init_i2c() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep the
counter balanced. For error paths after ak8974_set_power(),
ak8974_detect() and ak8974_reset(), things are the same.
However, When iio_triggered_buffer_setup() returns an error
code, there will be two PM usgae counter decrements.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Fixes: 7c94a8b2ee ("iio: magn: add a driver for AK8974")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea5e7a7bb6 upstream.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv() data.
This data is allocated with kzalloc so no data can leak apart
from previous readings.
Fixes: 16bf793f86 ("iio: humidity: hdc100x: add triggered buffer support for HDC100X")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Matt Ranostay <matt.ranostay@konsulko.com>
Cc: Alison Schofield <amsfield22@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 838e00b13b upstream.
One of a class of bugs pointed out by Lars in a recent review.
iio_push_to_buffers_with_timestamp assumes the buffer used is aligned
to the size of the timestamp (8 bytes). This is not guaranteed in
this driver which uses an array of smaller elements on the stack.
As Lars also noted this anti pattern can involve a leak of data to
userspace and that indeed can happen here. We close both issues by
moving to a suitable structure in the iio_priv() data.
This data is allocated with kzalloc so no data can leak appart from
previous readings.
Fixes: 7c94a8b2ee ("iio: magn: add a driver for AK8974")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5679b28142 ]
Commit f7b93d4294 ("arm64/alternatives: use subsections for replacement
sequences") moved the alternatives replacement sequences into subsections,
in order to keep the as close as possible to the code that they replace.
Unfortunately, this broke the logic in branch_insn_requires_update,
which assumed that any branch into kernel executable code was a branch
that required updating, which is no longer the case now that the code
sequences that are patched in are in the same section as the patch site
itself.
So the only way to discriminate branches that require updating and ones
that don't is to check whether the branch targets the replacement sequence
itself, and so we can drop the call to kernel_text_address() entirely.
Fixes: f7b93d4294 ("arm64/alternatives: use subsections for replacement sequences")
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20200709125953.30918-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f90786b31 ]
The driver can't be loaded automatically because it misses
module alias to be provided. Add corresponding MODULE_DEVICE_TABLE()
call to the driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b780cc615b ]
Before this patch, only read-write mounts would grab the freeze
glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze
glock was never initialized. That meant requests to freeze, which
request the glock in EX, were granted without any state transition.
That meant you could mount a gfs2 file system, which is currently
frozen on a different cluster node, in read-only mode.
This patch makes read-only mounts lock the freeze glock in SH mode,
which will block for file systems that are frozen on another node.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccf6fb858e ]
Found by smatch:
drivers/char/tpm/tpm_tis_core.c:1088 tpm_tis_core_init() warn:
variable dereferenced before check 'chip->ops' (see line 979)
'chip->ops' is assigned in the beginning of function
in tpmm_chip_alloc->tpm_chip_alloc
and is used before first possible goto to error path.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f7b93d4294 ]
When building very large kernels, the logic that emits replacement
sequences for alternatives fails when relative branches are present
in the code that is emitted into the .altinstr_replacement section
and patched in at the original site and fixed up. The reason is that
the linker will insert veneers if relative branches go out of range,
and due to the relative distance of the .altinstr_replacement from
the .text section where its branch targets usually live, veneers
may be emitted at the end of the .altinstr_replacement section, with
the relative branches in the sequence pointed at the veneers instead
of the actual target.
The alternatives patching logic will attempt to fix up the branch to
point to its original target, which will be the veneer in this case,
but given that the patch site is likely to be far away as well, it
will be out of range and so patching will fail. There are other cases
where these veneers are problematic, e.g., when the target of the
branch is in .text while the patch site is in .init.text, in which
case putting the replacement sequence inside .text may not help either.
So let's use subsections to emit the replacement code as closely as
possible to the patch site, to ensure that veneers are only likely to
be emitted if they are required at the patch site as well, in which
case they will be in range for the replacement sequence both before
and after it is transported to the patch site.
This will prevent alternative sequences in non-init code from being
released from memory after boot, but this is tolerable given that the
entire section is only 512 KB on an allyesconfig build (which weighs in
at 500+ MB for the entire Image). Also, note that modules today carry
the replacement sequences in non-init sections as well, and any of
those that target init code will be emitted into init sections after
this change.
This fixes an early crash when booting an allyesconfig kernel on a
system where any of the alternatives sequences containing relative
branches are activated at boot (e.g., ARM64_HAS_PAN on TX2)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Link: https://lore.kernel.org/r/20200630081921.13443-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c43e55796d ]
After pulling 5.7.0 (linux-next merge), mcf5441x mmu boot was
hanging silently.
memblock_add() seems not appropriate, since using MAX_NUMNODES
as node id, while memblock_add_node() sets up memory for node id 0.
Signed-off-by: Angelo Dureghello <angelo.dureghello@timesys.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d63bd8c81d ]
The m68k nommu setup code didn't register the beginning of the physical
memory with memblock because it was anyway occupied by the kernel. However,
commit fa3354e4ea ("mm: free_area_init: use maximal zone PFNs rather than
zone sizes") changed zones initialization to use memblock.memory to detect
the zone extents and this caused inconsistency between zone PFNs and the
actual PFNs:
BUG: Bad page state in process swapper pfn:20165
page:41fe0ca0 refcount:0 mapcount:1 mapping:00000000 index:0x0 flags: 0x0()
raw: 00000000 00000100 00000122 00000000 00000000 00000000 00000000 00000000
page dumped because: nonzero mapcount
CPU: 0 PID: 1 Comm: swapper Not tainted 5.8.0-rc1-00001-g3a38f8a60c65-dirty #1
Stack from 404c9ebc:
404c9ebc 4029ab28 4029ab28 40088470 41fe0ca0 40299e21 40299df1 404ba2a4
00020165 00000000 41fd2c10 402c7ba0 41fd2c04 40088504 41fe0ca0 40299e21
00000000 40088a12 41fe0ca0 41fe0ca4 0000020a 00000000 00000001 402ca000
00000000 41fe0ca0 41fd2c10 41fd2c10 00000000 00000000 402b2388 00000001
400a0934 40091056 404c9f44 404c9f44 40088db4 402c7ba0 00000001 41fd2c04
41fe0ca0 41fd2000 41fe0ca0 40089e02 4026ecf4 40089e4e 41fe0ca0 ffffffff
Call Trace:
[<40088470>] 0x40088470
[<40088504>] 0x40088504
[<40088a12>] 0x40088a12
[<402ca000>] 0x402ca000
[<400a0934>] 0x400a0934
Adjust the memory registration with memblock to include the beginning of
the physical memory and make sure that the area occupied by the kernel is
marked as reserved.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4f5a095da ]
in mic_pre_enable, pm_runtime_get_sync is called which
increments the counter even in case of failure, leading to incorrect
ref count. In case of failure, decrement the ref count before returning.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 177d381963 ]
In function msm_submitqueue_create, the queue is a local
variable, in return -EINVAL branch, queue didn`t add to ctx`s
list yet, and also didn`t kfree, this maybe bring in potential
memleak.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
[trivial commit msg fixup]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 469aceddfa ]
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.
Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.
v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)
Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d7bf2ebebc ("sched: consistently handle layer3 header accesses in the presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7bf2ebebc ]
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes: d8b9605d26 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14b032b8f8 ]
In order for no_refcnt and is_data to be the lowest order two
bits in the 'val' we have to pad out the bitfield of the u8.
Fixes: ad0f75e5f5 ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ad0f75e5f5 ]
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.
This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1ca0fafd73 ]
This essentially reverts commit 7212303268 ("tcp: md5: reject TCP_MD5SIG
or TCP_MD5SIG_EXT on established sockets")
Mathieu reported that many vendors BGP implementations can
actually switch TCP MD5 on established flows.
Quoting Mathieu :
Here is a list of a few network vendors along with their behavior
with respect to TCP MD5:
- Cisco: Allows for password to be changed, but within the hold-down
timer (~180 seconds).
- Juniper: When password is initially set on active connection it will
reset, but after that any subsequent password changes no network
resets.
- Nokia: No notes on if they flap the tcp connection or not.
- Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
both sides are ok with new passwords.
- Meta-Switch: Expects the password to be set before a connection is
attempted, but no further info on whether they reset the TCP
connection on a change.
- Avaya: Disable the neighbor, then set password, then re-enable.
- Zebos: Would normally allow the change when socket connected.
We can revert my prior change because commit 9424e2e7ad ("tcp: md5: fix potential
overestimation of TCP option space") removed the leak of 4 kernel bytes to
the wire that was the main reason for my patch.
While doing my investigations, I found a bug when a MD5 key is changed, leading
to these commits that stable teams want to consider before backporting this revert :
Commit 6a2febec33 ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Commit e6ced831ef ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")
Fixes: 7212303268 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e6ced831ef ]
My prior fix went a bit too far, according to Herbert and Mathieu.
Since we accept that concurrent TCP MD5 lookups might see inconsistent
keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()
Clearing all key->key[] is needed to avoid possible KMSAN reports,
if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.
data_race() was added in linux-5.8 and will prevent KCSAN reports,
this can safely be removed in stable backports, if data_race() is
not yet backported.
v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()
Fixes: 6a2febec33 ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Marco Elver <elver@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e114e1e8ac ]
Whenever cookie_init_timestamp() has been used to encode
ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK.
Otherwise, tcp_synack_options() will still advertize options like WSCALE
that we can not deduce later when receiving the packet from the client
to complete 3WHS.
Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet,
but we can not know for sure that all TCP stacks have the same logic.
Before the fix a tcpdump would exhibit this wrong exchange :
10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0
After this patch the exchange looks saner :
11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0
Of course, no SACK block will ever be added later, but nothing should break.
Technically, we could remove the 4 nops included in MD5+TS options,
but again some stacks could break seeing not conventional alignment.
Fixes: 4957faade1 ("TCPCT part 1g: Responder Cookie => Initiator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6a2febec33 ]
MD5 keys are read with RCU protection, and tcp_md5_do_add()
might update in-place a prior key.
Normally, typical RCU updates would allocate a new piece
of memory. In this case only key->key and key->keylen might
be updated, and we do not care if an incoming packet could
see the old key, the new one, or some intermediate value,
since changing the key on a live flow is known to be problematic
anyway.
We only want to make sure that in the case key->keylen
is changed, cpus in tcp_md5_hash_key() wont try to use
uninitialized data, or crash because key->keylen was
read twice to feed sg_init_one() and ahash_request_set_crypt()
Fixes: 9ea88a1530 ("tcp: md5: check md5 signature without socket lock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ba3bb0e76c ]
Whenever tcp_try_rmem_schedule() returns an error, we are under
trouble and should make sure to wakeup readers so that they
can drain socket queues and eventually make room.
Fixes: 03f45c883c ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 27d5332366 ]
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
skb's dst. However, it will eventually call inet6_csk_xmit() or
ip_queue_xmit() where skb's dst will be overwritten by:
skb_dst_set_noref(skb, dst);
without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
This can be reproduced by simply running:
# modprobe l2tp_eth && modprobe l2tp_ip
# sh ./tools/testing/selftests/net/l2tp.sh
So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
should be dropped. This patch is to fix it by removing skb_dst_set()
from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
Fixes: 3557baabf2 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Tested-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5eff069023 ]
IPv4 ping sockets don't set fl4.fl4_icmp_{type,code}, which leads to
incomplete IPsec ACQUIRE messages being sent to userspace. Currently,
both raw sockets and IPv6 ping sockets set those fields.
Expected output of "ip xfrm monitor":
acquire proto esp
sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 8 code 0 dev ens4
policy src 10.0.2.15/32 dst 8.8.8.8/32
<snip>
Currently with ping sockets:
acquire proto esp
sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 0 code 0 dev ens4
policy src 10.0.2.15/32 dst 8.8.8.8/32
<snip>
The Libreswan test suite found this problem after Fedora changed the
value for the sysctl net.ipv4.ping_group_range.
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Paul Wouters <pwouters@redhat.com>
Tested-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1e82a62fec ]
A potential deadlock can occur during registering or unregistering a
new generic netlink family between the main nl_table_lock and the
cb_lock where each thread wants the lock held by the other, as
demonstrated below.
1) Thread 1 is performing a netlink_bind() operation on a socket. As part
of this call, it will call netlink_lock_table(), incrementing the
nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
genl_(un)register_family() API. The cb_lock semaphore will be taken for
writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
subscribing to GENL multicast groups at the request of the user. It will
attempt to take the cb_lock semaphore for reading, but it will fail and
be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
call. However, as Thread 1 has incremented nl_table_users, it will not
be able to proceed, and both threads will be stuck waiting for the
other.
genl_bind() is a noop, unless a genl_family implements the mcast_bind()
function to handle setting up family-specific multicast operations. Since
no one in-tree uses this functionality as Cong pointed out, simply removing
the genl_bind() function will remove the possibility for deadlock, as there
is no attempt by Thread 1 above to take the cb_lock semaphore.
Fixes: c380d9a7af ("genetlink: pass multicast bind/unbind to families")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00fdec98d9 upstream.
Trap handler for syscall tracing reads EFA (Exception Fault Address),
in case strace wants PC of trap instruction (EFA is not part of pt_regs
as of current code).
However this EFA read is racy as it happens after dropping to pure
kernel mode (re-enabling interrupts). A taken interrupt could
context-switch, trigger a different task's trap, clobbering EFA for this
execution context.
Fix this by reading EFA early, before re-enabling interrupts. A slight
side benefit is de-duplication of FAKE_RET_FROM_EXCPN in trap handler.
The trap handler is common to both ARCompact and ARCv2 builds too.
This just came out of code rework/review and no real problem was reported
but is clearly a potential problem specially for strace.
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41855a8986 upstream.
clang static analysis flags this error
drivers/gpu/drm/radeon/ci_dpm.c:5652:9: warning: Use of memory after it is freed [unix.Malloc]
kfree(rdev->pm.dpm.ps[i].ps_priv);
^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/ci_dpm.c:5654:2: warning: Attempt to free released memory [unix.Malloc]
kfree(rdev->pm.dpm.ps);
^~~~~~~~~~~~~~~~~~~~~~
problem is reported in ci_dpm_fini, with these code blocks.
for (i = 0; i < rdev->pm.dpm.num_ps; i++) {
kfree(rdev->pm.dpm.ps[i].ps_priv);
}
kfree(rdev->pm.dpm.ps);
The first free happens in ci_parse_power_table where it cleans up locally
on a failure. ci_dpm_fini also does a cleanup.
ret = ci_parse_power_table(rdev);
if (ret) {
ci_dpm_fini(rdev);
return ret;
}
So remove the cleanup in ci_parse_power_table and
move the num_ps calculation to inside the loop so ci_dpm_fini
will know how many array elements to free.
Fixes: cc8dbbb4f6 ("drm/radeon: add dpm support for CI dGPUs (v2)")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bf9cd2eed upstream.
Under somewhat convoluted conditions, it is possible to attempt to
release an extent_buffer that is under io, which triggers a BUG_ON in
btrfs_release_extent_buffer_pages.
This relies on a few different factors. First, extent_buffer reads done
as readahead for searching use WAIT_NONE, so they free the local extent
buffer reference while the io is outstanding. However, they should still
be protected by TREE_REF. However, if the system is doing signficant
reclaim, and simultaneously heavily accessing the extent_buffers, it is
possible for releasepage to race with two concurrent readahead attempts
in a way that leaves TREE_REF unset when the readahead extent buffer is
released.
Essentially, if two tasks race to allocate a new extent_buffer, but the
winner who attempts the first io is rebuffed by a page being locked
(likely by the reclaim itself) then the loser will still go ahead with
issuing the readahead. The loser's call to find_extent_buffer must also
race with the reclaim task reading the extent_buffer's refcount as 1 in
a way that allows the reclaim to re-clear the TREE_REF checked by
find_extent_buffer.
The following represents an example execution demonstrating the race:
CPU0 CPU1 CPU2
reada_for_search reada_for_search
readahead_tree_block readahead_tree_block
find_create_tree_block find_create_tree_block
alloc_extent_buffer alloc_extent_buffer
find_extent_buffer // not found
allocates eb
lock pages
associate pages to eb
insert eb into radix tree
set TREE_REF, refs == 2
unlock pages
read_extent_buffer_pages // WAIT_NONE
not uptodate (brand new eb)
lock_page
if !trylock_page
goto unlock_exit // not an error
free_extent_buffer
release_extent_buffer
atomic_dec_and_test refs to 1
find_extent_buffer // found
try_release_extent_buffer
take refs_lock
reads refs == 1; no io
atomic_inc_not_zero refs to 2
mark_buffer_accessed
check_buffer_tree_ref
// not STALE, won't take refs_lock
refs == 2; TREE_REF set // no action
read_extent_buffer_pages // WAIT_NONE
clear TREE_REF
release_extent_buffer
atomic_dec_and_test refs to 1
unlock_page
still not uptodate (CPU1 read failed on trylock_page)
locks pages
set io_pages > 0
submit io
return
free_extent_buffer
release_extent_buffer
dec refs to 0
delete from radix tree
btrfs_release_extent_buffer_pages
BUG_ON(io_pages > 0)!!!
We observe this at a very low rate in production and were also able to
reproduce it in a test environment by introducing some spurious delays
and by introducing probabilistic trylock_page failures.
To fix it, we apply check_tree_ref at a point where it could not
possibly be unset by a competing task: after io_pages has been
incremented. All the codepaths that clear TREE_REF check for io, so they
would not be able to clear it after this point until the io is done.
Stack trace, for reference:
[1417839.424739] ------------[ cut here ]------------
[1417839.435328] kernel BUG at fs/btrfs/extent_io.c:4841!
[1417839.447024] invalid opcode: 0000 [#1] SMP
[1417839.502972] RIP: 0010:btrfs_release_extent_buffer_pages+0x20/0x1f0
[1417839.517008] Code: ed e9 ...
[1417839.558895] RSP: 0018:ffffc90020bcf798 EFLAGS: 00010202
[1417839.570816] RAX: 0000000000000002 RBX: ffff888102d6def0 RCX: 0000000000000028
[1417839.586962] RDX: 0000000000000002 RSI: ffff8887f0296482 RDI: ffff888102d6def0
[1417839.603108] RBP: ffff88885664a000 R08: 0000000000000046 R09: 0000000000000238
[1417839.619255] R10: 0000000000000028 R11: ffff88885664af68 R12: 0000000000000000
[1417839.635402] R13: 0000000000000000 R14: ffff88875f573ad0 R15: ffff888797aafd90
[1417839.651549] FS: 00007f5a844fa700(0000) GS:ffff88885f680000(0000) knlGS:0000000000000000
[1417839.669810] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1417839.682887] CR2: 00007f7884541fe0 CR3: 000000049f609002 CR4: 00000000003606e0
[1417839.699037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1417839.715187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1417839.731320] Call Trace:
[1417839.737103] release_extent_buffer+0x39/0x90
[1417839.746913] read_block_for_search.isra.38+0x2a3/0x370
[1417839.758645] btrfs_search_slot+0x260/0x9b0
[1417839.768054] btrfs_lookup_file_extent+0x4a/0x70
[1417839.778427] btrfs_get_extent+0x15f/0x830
[1417839.787665] ? submit_extent_page+0xc4/0x1c0
[1417839.797474] ? __do_readpage+0x299/0x7a0
[1417839.806515] __do_readpage+0x33b/0x7a0
[1417839.815171] ? btrfs_releasepage+0x70/0x70
[1417839.824597] extent_readpages+0x28f/0x400
[1417839.833836] read_pages+0x6a/0x1c0
[1417839.841729] ? startup_64+0x2/0x30
[1417839.849624] __do_page_cache_readahead+0x13c/0x1a0
[1417839.860590] filemap_fault+0x6c7/0x990
[1417839.869252] ? xas_load+0x8/0x80
[1417839.876756] ? xas_find+0x150/0x190
[1417839.884839] ? filemap_map_pages+0x295/0x3b0
[1417839.894652] __do_fault+0x32/0x110
[1417839.902540] __handle_mm_fault+0xacd/0x1000
[1417839.912156] handle_mm_fault+0xaa/0x1c0
[1417839.921004] __do_page_fault+0x242/0x4b0
[1417839.930044] ? page_fault+0x8/0x30
[1417839.937933] page_fault+0x1e/0x30
[1417839.945631] RIP: 0033:0x33c4bae
[1417839.952927] Code: Bad RIP value.
[1417839.960411] RSP: 002b:00007f5a844f7350 EFLAGS: 00010206
[1417839.972331] RAX: 000000000000006e RBX: 1614b3ff6a50398a RCX: 0000000000000000
[1417839.988477] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[1417840.004626] RBP: 00007f5a844f7420 R08: 000000000000006e R09: 00007f5a94aeccb8
[1417840.020784] R10: 00007f5a844f7350 R11: 0000000000000000 R12: 00007f5a94aecc79
[1417840.036932] R13: 00007f5a94aecc78 R14: 00007f5a94aecc90 R15: 00007f5a94aecc40
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>