commit f24e5834a2 upstream.
The high_memory global variable is used by
cma_declare_contiguous(.) before it is defined.
We don't notice this as we compute __pa(high_memory - 1), and it looks
like we're processing a VA from the direct linear map.
This problem becomes apparent when we flip the kernel virtual address
space and the linear map is moved to the bottom of the kernel VA space.
This patch moves the initialisation of high_memory before it used.
Fixes: f7426b983a ("mm: cma: adjust address limit to avoid hitting low/high memory boundary")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12841f87b7 upstream.
During an eeh a kernel-oops is reported if no vPHB is allocated to the
AFU. This happens as during AFU init, an error in creation of vPHB is
a non-fatal error. Hence afu->phb should always be checked for NULL
before iterating over it for the virtual AFU pci devices.
This patch fixes the kenel-oops by adding a NULL pointer check for
afu->phb before it is dereferenced.
Fixes: 9e8df8a219 ("cxl: EEH support")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee0a47186e ]
When the user sets count to zero the string buffer would remain
completely uninitialized which causes the kernel to parse its
own stack data, potentially leading to an info leak. In addition
to that, the string might be not terminated properly when the
user data does not contain a 0-terminator.
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Reviewed-by: Christoph Böhmwalder <christoph@boehmwalder.at>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 258bbb1b0e ]
The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).
However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets
which are now easily triggered as UFO has been removed, because, in turn,
sending large buffers triggers IP fragmentation.
The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:
# start netserver in the test namespace
ip netns add test
ip netns exec test netserver
# create a VETH pair
ip link add name veth0 type veth peer name veth0 netns test
ip link set veth0 up
ip -n test link set veth0 up
for i in $(seq 20 29); do
# assign addresses to both ends
ip addr add dev veth0 192.168.$i.1/24
ip -n test addr add dev veth0 192.168.$i.2/24
# start the traffic
netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
done
# wait
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host
We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".
While at it, fix a typo in a comment, and convert the if statement
into a switch to mate it more readable.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b4b678b06f ]
When ndo_open and ndo_stop are called RTNL lock should be held.
In this specific case ipoib_ib_dev_open calls the offloaded ndo_open
which re-sets the number of TX queue assuming RTNL lock is held.
Since RTNL lock is not held, RTNL assert will fail.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0b64f58e8 ]
According to the C standard the behavior of computations with
integer operands is as follows:
* A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented by the
resulting type.
* The behavior for signed integer underflow and overflow is
undefined.
Hence only use unsigned integers when checking for integer
overflow.
This patch is what I came up with after having analyzed the
following smatch warnings:
drivers/infiniband/core/cma.c:3448: cma_resolve_ib_udp() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'
drivers/infiniband/core/cma.c:3505: cma_connect_ib() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dd6b9c2c33 ]
This patch intoduces a slight adjustment for macvlan to address the fact
that in source mode I was seeing two copies of any packet addressed to the
macvlan interface being delivered where there should have been only one.
The issue appears to be that one copy was delivered based on the source MAC
address and then the second copy was being delivered based on the
destination MAC address. To fix it I am just treating a unicast address
match as though it is not a match since source based macvlan isn't supposed
to be matching based on the destination MAC anyway.
Fixes: 79cf79abce ("macvlan: add source mode")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit abdc0eb069 ]
When session starts beyond offset 2^31 the arithmetics in
udf_check_vsd() would overflow. Make sure the computation is done in
large enough type.
Reported-by: Cezary Sliwa <sliwa@ifpan.edu.pl>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e35127565 ]
We could allocate less memory than intended because we do:
bfad->regdata = kzalloc(len << 2, GFP_KERNEL);
The shift can overflow leading to a crash. This is debugfs code so the
impact is very small. I fixed the network version of this in March with
commit 13e2d5187f ("bna: integer overflow bug in debugfs").
Fixes: ab2a9ba189 ("[SCSI] bfa: add debugfs support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08880f8e08 ]
The driver may sleep under a spinlock, and the function call path is:
rtw_set_802_11_bssid(acquire the spinlock)
rtw_disassoc_cmd
kzalloc(GFP_KERNEL) --> may sleep
To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool and my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2bf9806d42 ]
The driver may sleep under a spinlock, and the function call path is:
rtw_surveydone_event_callback(acquire the spinlock)
rtw_createbss_cmd
kzalloc(GFP_KERNEL) --> may sleep
To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool and my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 42c8eb3f6e ]
The driver may sleep under a spinlock, and the function call path is:
vt6655_suspend (acquire the spinlock)
pci_set_power_state
__pci_start_power_transition (drivers/pci/pci.c)
msleep --> may sleep
To fix it, pci_set_power_state is called without having a spinlock.
This bug is found by my static analysis tool and my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99260132fd ]
The original code only took into consideration the largest header
possible after the IB_BTH_BYTES. This was incorrect, as the largest
possible header size is the largest possible combination of headers we
might run into. The new code accounts for all possible headers in the
largest possible combination and subtracts that from the MTU to make
sure that all packets will fit on the wire.
Link: https://www.spinics.net/lists/linux-rdma/msg54558.html
Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 235b6003fb ]
When reshaping a fully degraded raid5/raid6 to a larger
nubmer of devices, the new device(s) are not in-sync
and so that can make the newly grown stripe appear to be
"failed".
To avoid this, we set the R5_Expanded flag to say "Even though
this device is not fully in-sync, this block is safe so
don't treat the device as failed for this stripe".
This flag is set for data devices, not not for parity devices.
Consequently, if you have a RAID6 with two devices that are partly
recovered and a spare, and start a reshape to include the spare,
then when the reshape gets past the point where the recovery was
up to, it will think the stripes are failed and will get into
an infinite loop, failing to make progress.
So when contructing parity on an EXPAND_READY stripe,
set R5_Expanded.
Reported-by: Curt <lightspd@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1c363531dd ]
The build robot is complaining on Blackfin:
drivers/pinctrl/pinctrl-adi2.c: In function 'port_setup':
>> drivers/pinctrl/pinctrl-adi2.c:221:21: error: dereferencing
pointer to incomplete type 'struct gpio_port_t'
writew(readw(®s->port_fer) & ~BIT(offset),
^~
drivers/pinctrl/pinctrl-adi2.c: In function 'adi_gpio_ack_irq':
>> drivers/pinctrl/pinctrl-adi2.c:266:18: error: dereferencing
pointer to incomplete type 'struct bfin_pint_regs'
if (readl(®s->invert_set) & pintbit)
^~
It seems the driver need to include <asm/gpio.h> and <asm/irq.h>
to compile.
The Blackfin architecture was re-defining the Kconfig
PINCTRL symbol which is not OK, so replaced this with
PINCTRL_BLACKFIN_ADI2 which selects PINCTRL and PINCTRL_ADI2
just like most arches do.
Further, the old GPIO driver symbol GPIO_ADI was possible to
select at the same time as selecting PINCTRL. This was not
working because the arch-local <asm/gpio.h> header contains
an explicit #ifndef PINCTRL clause making compilation break
if you combine them. The same is true for DEBUG_MMRS.
Make sure the ADI2 pinctrl driver is not selected at the same
time as the old GPIO implementation. (This should be converted
to use gpiolib or pincontrol and move to drivers/...) Also make
sure the old GPIO_ADI driver or DEBUG_MMRS is not selected at
the same time as the new PINCTRL implementation, and only make
PINCTRL_ADI2 selectable for the Blackfin families that actually
have it.
This way it is still possible to add e.g. I2C-based pin
control expanders on the Blackfin.
Cc: Steven Miao <realmz6@gmail.com>
Cc: Huanhuan Feng <huanhuan.feng@analog.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd3486ded7 upstream.
When babble condition happens, the musb controller might automatically
turns off VBUS. On DA8xx platform, the controller generates drvvbus
interrupt for turning off VBUS along with the babble interrupt.
In this case, we should handle the babble interrupt first and recover
from the babble condition.
This change ignores the drvvbus interrupt if babble interrupt is also
generated at the same time, so the babble recovery routine works
properly.
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fb2c1934f3 ]
When compiling using sparse, we got the following error:
drivers/soc/mediatek/mtk-pmic-wrap.c:686:25: error: dubious one-bit signed bitfield
Changing the data type to unsigned fixes this.
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 05c14c0313 ]
In the hv-24x7 code there is a function memord() which tries to
implement a sort function return -1, 0, 1. However one of the
conditions is incorrect, such that it can never be true, because we
will have already returned.
I don't believe there is a bug in practice though, because the
comparisons are an optimisation prior to calling memcmp().
Fix it by swapping the second comparision, so it can be true.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dfb2e6f46b ]
This patch cleans up a lot of warnings when unloading the driver.
A current example of the stack trace starts with:
[ 142.570715] sysfs group 'power' not found for kobject 'port-5:0'
There can be hundreds of these messages during a driver unload.
I am resubmitting this patch on behalf of Martin Wilck with his
permission.
His original patch can be found here:
https://www.spinics.net/lists/linux-scsi/msg102085.html
This patch did not help until Hannes's
commit 9441284fbc39 ("scsi-fixup-kernel-warning-during-rmmod")
was applied to the kernel.
---------------------------
Original patch description:
---------------------------
Unloading the hpsa driver causes warnings
[ 1063.793652] WARNING: CPU: 1 PID: 4850 at ../fs/sysfs/group.c:237 device_del+0x54/0x240()
[ 1063.793659] sysfs group ffffffff81cf21a0 not found for kobject 'port-2:0'
with two different stacks:
1)
[ 1063.793774] [<ffffffff81448af4>] device_del+0x54/0x240
[ 1063.793780] [<ffffffff8145178a>] transport_remove_classdev+0x4a/0x60
[ 1063.793784] [<ffffffff81451216>] attribute_container_device_trigger+0xa6/0xb0
[ 1063.793802] [<ffffffffa0105d46>] sas_port_delete+0x126/0x160 [scsi_transport_sas]
[ 1063.793819] [<ffffffffa036ebcc>] hpsa_free_sas_port+0x3c/0x70 [hpsa]
2)
[ 1063.797103] [<ffffffff81448af4>] device_del+0x54/0x240
[ 1063.797118] [<ffffffffa0105d4e>] sas_port_delete+0x12e/0x160 [scsi_transport_sas]
[ 1063.797134] [<ffffffffa036ebcc>] hpsa_free_sas_port+0x3c/0x70 [hpsa]
This is caused by the fact that host device hostX is deleted before the
SAS transport devices hostX/port-a:b.
This patch fixes this by reverting the order of device deletions.
Tested-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55ca38b425 ]
I am resubmitting this patch on behalf of Martin Wilck with his
permission.
The original patch can be found here:
https://www.spinics.net/lists/linux-scsi/msg102083.html
This patch did not help until Hannes's
commit 9441284fbc39 ("scsi-fixup-kernel-warning-during-rmmod")
was applied to the kernel.
--------------------------------------
Original patch description from Martin:
--------------------------------------
When the hpsa module is unloaded using rmmod, dangling
symlinks remain under /sys/class/sas_phy. Fix this by
calling sas_phy_delete() rather than sas_phy_free (which,
according to comments, should not be called for PHYs that
have been set up successfully, anyway).
Tested-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16b6c8bb68 ]
When removing a device, for example a VF being removed due to SR-IOV
teardown, a "soft" hot-unplug via 'echo 1 > remove' in sysfs, or an actual
hot-unplug, we first remove the procfs and sysfs attributes for the device
before attempting to release the device from any driver bound to it.
Unbinding the driver from the device can take time. The device might need
to write out data or it might be actively in use. If it's in use by
userspace through a vfio driver, the unbind might block until the user
releases the device. This leads to a potentially non-trivial amount of
time where the device exists, but we've torn down the interfaces that
userspace uses to examine devices, for instance lspci might generate this
sort of error:
pcilib: Cannot open /sys/bus/pci/devices/0000:01:0a.3/config
lspci: Unable to read the standard configuration space header of device 0000:01:0a.3
We don't seem to have any dependence on this teardown ordering in the
kernel, so let's unbind the driver first, which is also more symmetric with
the instantiation of the device in pci_bus_add_device().
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e422f5e4f ]
There was one spot in xfs_bmap_add_extent_unwritten_real that didn't use the
passed in new extent state but always converted to normal, leading to wrong
behavior when converting from normal to unwritten.
Only found by code inspection, it seems like this code path to move partial
extent from written to unwritten while merging it with the next extent is
rarely exercised.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9f2a450580 ]
It is possible for mkfs to format very small filesystems with too
small of an internal log with respect to the various minimum size
and block count requirements. If this occurs when the log happens to
be smaller than the scan window used for cycle verification and the
scan wraps the end of the log, the start_blk calculation in
xlog_find_head() underflows and leads to an attempt to scan an
invalid range of log blocks. This results in log recovery failure
and a failed mount.
Since there may be filesystems out in the wild with this kind of
geometry, we cannot simply refuse to mount. Instead, cap the scan
window for cycle verification to the size of the physical log. This
ensures that the cycle verification proceeds as expected when the
scan wraps the end of the log.
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2dd4122854 ]
For kref_get_unless_zero to protect against lookup vs free races we need
to use it in all places where we aren't guaranteed to already hold a
reference. There is no such guarantee in nvme_find_get_ns, so switch to
kref_get_unless_zero in this function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 163ca80013 ]
Added support for HP ProBook 440 G4 laptops by including the accelerometer
orientation quirk for that device. Testing was performed based on the
axis orientation guidelines here:
https://www.kernel.org/doc/Documentation/misc-devices/lis3lv02d
which states "If the left side is elevated, X increases (becomes positive)".
When tested, on lifting the left edge, x values became increasingly negative
thus indicating an inverted x-axis on the installed lis3lv02d chip.
This was compensated by adding an entry for this device in hp_accel.c
specifying the quirk as x_inverted. The patch was tested on a
ProBook 440 G4 device and x-axis as well as y and z-axis values are now
generated as per spec.
Signed-off-by: Osama Khan <osama.khan@ericsson.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fd9dde6abc ]
Upon upgrading to binutils 2.27, we found that our lz4 and gzip
compressed kernel images were significantly larger, resulting is 10ms
boot time regressions.
As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64. On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations. In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64. The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations. Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.
The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures. The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.
To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" make
With this flag, the compressed image size is back to what it was with
binutils-2.25.
If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.
If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."
Some measurements:
$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb
$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb
post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb
By Siqi's measurement w/ gzip:
binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image 41535488
Image.gz 13404067
binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image 41535488
Image.gz 14125516
Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.
10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a penalty for no
runtime improvement.
Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: added comment to Makefile]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 00f9203119 ]
__subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as
operational vls. hfi1_get_ib_cfg() returns vls_operational field in
hfi1_pportdata. The problem with this is that the value is always equal
to vls_supported field in hfi1_pportdata.
The logic to calculate operational_vls is to set value passed by FM
(in __subn_set_opa_portinfo routine). If no value is passed then
default value is stored in operational_vls.
Field actual_vls_operational is calculated on the basis of buffer
control table. Hence, modifying hfi1_get_ib_cfg() to return
actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c157313791 ]
Currently, Cache missed IOs are identified by s->cache_miss, but actually,
there are many situations that missed IOs are not assigned a value for
s->cache_miss in cached_dev_cache_miss(), for example, a bypassed IO
(s->iop.bypass = 1), or the cache_bio allocate failed. In these situations,
it will go to out_put or out_submit, and s->cache_miss is null, which leads
bch_mark_cache_accounting() to treat this IO as a hit IO.
[ML: applied by 3-way merge]
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 330a4db89d ]
mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.
As Coly pointed out in a previous review, bcache_exit() may not be
able to handle all the references properly if userspace registers
cache and backing devices right before bch_debug_init runs and
bch_debug_init failes later. So not exposing userspace interface
until everything is ready to avoid that issue.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc555b09d8 ]
This patch fixes a deadlock caused when the jdata flag is set for
inodes that are already on the ordered write list. Since it is
on the ordered write list, log_flush calls gfs2_ordered_write which
calls filemap_fdatawrite. But since the inode had the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. A new transaction cannot be started because it tries
to acquire the log_flush rwsem which is already locked by the log
flush operation.
The bottom line is: We cannot switch an inode from ordered to jdata
until we eliminate any ordered data pages (via log flush) or any
log_flush operation afterward will create the circular dependency
above. So we need to flush the log before setting the diskflags to
switch the file mode, then we need to remove the inode from the
ordered writes list.
Before this patch, the log flush was done for jdata->ordered, but
that's wrong. If we're going from jdata to ordered, we don't need
to call gfs2_log_flush because the call to filemap_fdatawrite will
do it for us:
filemap_fdatawrite() -> __filemap_fdatawrite_range()
__filemap_fdatawrite_range() -> do_writepages()
do_writepages() -> gfs2_jdata_writepages()
gfs2_jdata_writepages() -> gfs2_log_flush()
This patch modifies function do_gfs2_set_flags so that if a file
has its jdata flag set, and it's already on the ordered write list,
the log will be flushed and it will be removed from the list
before setting the flag.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e33d7c5645 ]
The scsi_debug driver incorrectly suggests there is an error with the
SCSI WRITE SAME command when the number_of_logical_blocks is greater
than 1. It will also suggest there is an error when NDOB
(no data-out buffer) is set and the number_of_logical_blocks is
greater than 0. Both are valid, fix.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 07209fcf33 ]
There is a particular situation when the cooling device is cpufreq and the heat
dissipation is not efficient enough where the temperature increases little by
little until reaching the critical threshold and leading to a SoC reset.
The behavior is reproducible on a hikey6220 with bad heat dissipation (eg.
stacked with other boards).
Running a simple C program doing while(1); for each CPU of the SoC makes the
temperature to reach the passive regulation trip point and ends up to the
maximum allowed temperature followed by a reset.
This issue has been also reported by running the libhugetlbfs test suite.
What is observed is a ping pong between two cpu frequencies, 1.2GHz and 900MHz
while the temperature continues to grow.
It appears the step wise governor calls get_target_state() the first time with
the throttle set to true and the trend to 'raising'. The code selects logically
the next state, so the cpu frequency decreases from 1.2GHz to 900MHz, so far so
good. The temperature decreases immediately but still stays greater than the
trip point, then get_target_state() is called again, this time with the
throttle set to true *and* the trend to 'dropping'. From there the algorithm
assumes we have to step down the state and the cpu frequency jumps back to
1.2GHz. But the temperature is still higher than the trip point, so
get_target_state() is called with throttle=1 and trend='raising' again, we jump
to 900MHz, then get_target_state() is called with throttle=1 and
trend='dropping', we jump to 1.2GHz, etc ... but the temperature does not
stabilizes and continues to increase.
[ 237.922654] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 237.922678] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 237.922690] thermal cooling_device0: cur_state=0
[ 237.922701] thermal cooling_device0: old_target=0, target=1
[ 238.026656] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 238.026680] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1
[ 238.026694] thermal cooling_device0: cur_state=1
[ 238.026707] thermal cooling_device0: old_target=1, target=0
[ 238.134647] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 238.134667] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 238.134679] thermal cooling_device0: cur_state=0
[ 238.134690] thermal cooling_device0: old_target=0, target=1
In this situation the temperature continues to increase while the trend is
oscillating between 'dropping' and 'raising'. We need to keep the current state
untouched if the throttle is set, so the temperature can decrease or a higher
state could be selected, thus preventing this oscillation.
Keeping the next_target untouched when 'throttle' is true at 'dropping' time
fixes the issue.
The following traces show the governor does not change the next state if
trend==2 (dropping) and throttle==1.
[ 2306.127987] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2306.128009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 2306.128021] thermal cooling_device0: cur_state=0
[ 2306.128031] thermal cooling_device0: old_target=0, target=1
[ 2306.231991] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2306.232016] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1
[ 2306.232030] thermal cooling_device0: cur_state=1
[ 2306.232042] thermal cooling_device0: old_target=1, target=1
[ 2306.335982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2306.336006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=1
[ 2306.336021] thermal cooling_device0: cur_state=1
[ 2306.336034] thermal cooling_device0: old_target=1, target=1
[ 2306.439984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2306.440008] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0
[ 2306.440022] thermal cooling_device0: cur_state=1
[ 2306.440034] thermal cooling_device0: old_target=1, target=0
[ ... ]
After a while, if the temperature continues to increase, the next state becomes
2 which is 720MHz on the hikey. That results in the temperature stabilizing
around the trip point.
[ 2455.831982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2455.832006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=0
[ 2455.832019] thermal cooling_device0: cur_state=1
[ 2455.832032] thermal cooling_device0: old_target=1, target=1
[ 2455.935985] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2455.936013] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0
[ 2455.936027] thermal cooling_device0: cur_state=1
[ 2455.936040] thermal cooling_device0: old_target=1, target=1
[ 2456.043984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1
[ 2456.044009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0
[ 2456.044023] thermal cooling_device0: cur_state=1
[ 2456.044036] thermal cooling_device0: old_target=1, target=1
[ 2456.148001] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1
[ 2456.148028] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1
[ 2456.148042] thermal cooling_device0: cur_state=1
[ 2456.148055] thermal cooling_device0: old_target=1, target=2
[ 2456.252009] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1
[ 2456.252041] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0
[ 2456.252058] thermal cooling_device0: cur_state=2
[ 2456.252075] thermal cooling_device0: old_target=2, target=1
IOW, this change is needed to keep the state for a cooling device if the
temperature trend is oscillating while the temperature increases slightly.
Without this change, the situation above leads to a catastrophic crash by a
hardware reset on hikey. This issue has been reported to happen on an OMAP
dra7xx also.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Keerthy <j-keerthy@ti.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Tested-by: Keerthy <j-keerthy@ti.com>
Reviewed-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c68ee58d9e ]
On i.MX6 SoCs without VPU (in my case MCIMX6D4AVT10AC), the hdmi driver
fails to probe:
[ 2.540030] dwhdmi-imx 120000.hdmi: Unsupported HDMI controller
(0000:00:00)
[ 2.548199] imx-drm display-subsystem: failed to bind 120000.hdmi
(ops dw_hdmi_imx_ops): -19
[ 2.557403] imx-drm display-subsystem: master bind failed: -19
That's because hdmi_isfr's parent, video_27m, is not correctly ungated.
As explained in commit 5ccc248cc5 ("ARM: imx6q: clk: Add support for
mipi_core_cfg clock as a shared clock gate"), video_27m is gated by
CCM_CCGR3[CG8].
On i.MX6 SoCs with VPU, the hdmi is working thanks to the
CCM_CMEOR[mod_en_ov_vpu] bit which makes the video_27m ungated whatever
is in CCM_CCGR3[CG8]. The issue can be reproduced by setting
CCMEOR[mod_en_ov_vpu] to 0.
Make the HDMI work in every case by setting hdmi_isfr's parent to
mipi_core_cfg.
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c955bf3998 ]
Since the previous setup always sets the PLL using crystal 26MHz, this
doesn't always happen in every MediaTek platform. So the patch added
flexibility for assigning extra member for determining the PLL source
clock.
Signed-off-by: Chen Zhong <chen.zhong@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 592e254502 ]
_calc_vm_trans() does not handle the situation when some of the passed
flags are 0 (which can happen if these VM flags do not make sense for
the architecture). Improve the _calc_vm_trans() macro to return 0 in
such situation. Since all passed flags are constant, this does not add
any runtime overhead.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 350976ae21 ]
On truncate down, if new size is not block size aligned, we zero the
rest of block to avoid exposing stale data to user, and
iomap_truncate_page() skips zeroing if the range is already in
unwritten state or a hole. Then we writeback from on-disk i_size to
the new size if this range hasn't been written to disk yet, and
truncate page cache beyond new EOF and set in-core i_size.
The problem is that we could write data between di_size and newsize
before removing the page cache beyond newsize, as the extents may
still be in unwritten state right after a buffer write. As such, the
page of data that newsize lies in has not been zeroed by page cache
invalidation before it is written, and xfs_do_writepage() hasn't
triggered it's "zero data beyond EOF" case because we haven't
updated in-core i_size yet. Then a subsequent mmap read could see
non-zeros past EOF.
I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):
fallocate 0x0 0x1000 0x0 keep_size
write 0x0 0x1000 0x0
truncate 0x0 0x800 0x1000
punch_hole 0x0 0x800 0x800
mapread 0x0 0x800 0x800
where fallocate allocates unwritten extent but doesn't update
i_size, buffer write populates the page cache and extent is still
unwritten, truncate skips zeroing page past new EOF and writes the
page to disk, punch_hole invalidates the page cache, at last mapread
reads the block back and sees non-zero beyond EOF.
Fix it by moving truncate_setsize() to before writeback so the page
cache invalidation zeros the partial page at the new EOF. This also
triggers "zero data beyond EOF" in xfs_do_writepage() at writeback
time, because newsize has been set and page straddles the newsize.
Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.
Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>