[ Upstream commit 2bbea6e117 ]
when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.
PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount"
#0 [ffff880078447ae0] __schedule at ffffffff8168d605
#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
#8 [ffff880078447da8] mount_bdev at ffffffff81202570
#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246
RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010
RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010
R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040
R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
This task was trying to mount the cdrom. It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.
PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd"
#0 [ffff880078417898] __schedule at ffffffff8168d605
#1 [ffff880078417900] schedule at ffffffff8168dc59
#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
#4 [ffff8800784179d0] down_read at ffffffff8168cde0
#5 [ffff8800784179e8] get_super at ffffffff81201cc7
#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246
RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000
RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70
RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020
R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e
R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.
The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.
This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ecb29abd4c ]
A negative page register value means that no page needs to be
selected. This is used by status register read operations and needs
to be accepted. The failure to do so so results in missed status
and limit registers.
Fixes: da8e48ab48 ("hwmon: (pmbus) Always call _pmbus_read_byte in core driver")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a46f8cd696 ]
A negative page register value means that no page needs to be
selected. This is used by status register evaluations and needs
to be accepted.
Fixes: da8e48ab48 ("hwmon: (pmbus) Always call _pmbus_read_byte in core driver")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9e5b127d6f ]
Mark reported his arm64 perf fuzzer runs sometimes splat like:
armv8pmu_read_counter+0x1e8/0x2d8
armpmu_event_update+0x8c/0x188
armpmu_read+0xc/0x18
perf_output_read+0x550/0x11e8
perf_event_read_event+0x1d0/0x248
perf_event_exit_task+0x468/0xbb8
do_exit+0x690/0x1310
do_group_exit+0xd0/0x2b0
get_signal+0x2e8/0x17a8
do_signal+0x144/0x4f8
do_notify_resume+0x148/0x1e8
work_pending+0x8/0x14
which asserts that we only call pmu::read() on ACTIVE events.
The above callchain does:
perf_event_exit_task()
perf_event_exit_task_context()
task_ctx_sched_out() // INACTIVE
perf_event_exit_event()
perf_event_set_state(EXIT) // EXIT
sync_child_event()
perf_event_read_event()
perf_output_read()
perf_output_read_group()
leader->pmu->read()
Which results in doing a pmu::read() on an !ACTIVE event.
I _think_ this is 'new' since we added attr.inherit_stat, which added
the perf_event_read_event() to the exit path, without that
perf_event_read_output() would only trigger from samples and for
@event to trigger a sample, it's leader _must_ be ACTIVE too.
Still, adding this check makes it consistent with the @sub case for
the siblings.
Reported-and-Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 66ec32fc7c ]
max17042_get_status uses the core power_supply_am_i_supplied. That
function relies on DT properties to figure out the power supply
topology, and will error out without DT.
Fixes max17042 battery status being reported as "unknown".
Signed-off-by: Pierre Bourdon <delroth@google.com>
Signed-off-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf617f7a92 ]
If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _raw_write_lock+0xc/0x30
Call Trace:
? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
f2fs_fallocate+0x5a0/0xdd0 [f2fs]
? common_file_perm+0x47/0xc0
? apparmor_file_permission+0x1a/0x20
vfs_fallocate+0x15b/0x290
SyS_fallocate+0x44/0x70
do_syscall_64+0x6e/0x160
entry_SYSCALL64_slow_path+0x25/0x25
This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd36d7a17f ]
Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.
Fixes: 1f43e2ad7b ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 17cd07ae95 ]
As Jayashree Mohan reported:
A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K) // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now
On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400 -P -v
tests/generic_468_zero.so
The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 94322ed8e8 ]
PSL9D doesn't have a data-cache that needs to be flushed before
resetting the card. However when cxl tries to flush data-cache on such
a card, it times-out as PSL_Control register never indicates flush
operation complete due to missing data-cache. This is usually
indicated in the kernel logs with this message:
"WARNING: cache flush timed out"
To fix this the patch checks PSL_Debug register CDC-Field(BIT:27)
which indicates the absence of a data-cache and sets a flag
'no_data_cache' in 'struct cxl_native' to indicate this. When
cxl_data_cache_flush() is called it checks the flag and if set bails
out early without requesting a data-cache flush operation to the PSL.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b74e2a9b3 ]
When sending TLB invalidates to the NPU we need to send extra flushes due
to a hardware issue. The original implementation would lock the all the
ATSD MMIO registers sequentially before unlocking and relocking each of
them sequentially to do the extra flush.
This introduced a deadlock as it is possible for one thread to hold one
ATSD register whilst waiting for another register to be freed while the
other thread is holding that register waiting for the one in the first
thread to be freed.
For example if there are two threads and two ATSD registers:
Thread A Thread B
----------------------
Acquire 1
Acquire 2
Release 1 Acquire 1
Wait 1 Wait 2
Both threads will be stuck waiting to acquire a register resulting in an
RCU stall warning or soft lockup.
This patch solves the deadlock by refactoring the code to ensure registers
are not released between flushes and to ensure all registers are either
acquired or released together and in order.
Fixes: bbd5ff50af ("powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f5246862f8 ]
In commit 4f8b50bbbe ("irq_work, ppc: Fix up arch hooks") a new
function arch_irq_work_raise() was added without a prototype in header
irq_work.h.
Fix the following warning (treated as error in W=1):
arch/powerpc/kernel/time.c:523:6: error: no previous prototype for ‘arch_irq_work_raise’
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d2fc8db691 ]
Assert RESET_SYSTEM bit for any reset and set MODE field from reset
type.
The watchdog control register has a RESET_SYSTEM bit that is really
closer to activate a reset, and RESET_SYSTEM_MODE field that chooses
how much to reset.
Before this patch, a node without these optional property would do a
SOC reset, but a node with properties requesting a cpu or SOC reset
would do nothing and a node requesting a system reset would do a
SOC reset.
Fixes: b7f0b8ad25 ("drivers/watchdog: ASPEED reference dev tree properties for config")
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a81abbb412 ]
RK3399 has rst_pulse_length in CONTROL_REG[4:2], determining the length
of pulse to issue for system reset. We shouldn't clobber this value,
because that might make the system reset ineffective. On RK3399, we're
seeing that a value of 000b (meaning 2 cycles) yields an unreliable
(partial?) reset, and so we only fully reset after the watchdog fires a
second time. If we retain the system default (010b, or 8 clock cycles),
then the watchdog reset is much more reliable.
Read-modify-write retains the system value and improves reset
reliability.
It seems we were intentionally clobbering the response mode previously,
to ensure we performed a system reset (we don't support an interrupt
notification), so retain that explicitly.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5775b843a6 ]
We leave PCI devices not bound to a driver in D0 during runtime suspend.
But they may have a parent which is bound and can be transitioned to
D3cold at runtime. Once the parent goes to D3cold, the unbound child
may go to D3cold as well. When the child goes to D3cold, its internal
state, including configuration of BARs, MSI, ASPM, MPS, etc., is lost.
One example are recent hybrid graphics laptops which cut power to the
discrete GPU when the root port above it goes to ACPI power state D3.
Users may provoke this by unbinding the GPU driver and allowing runtime
PM on the GPU via sysfs: The PM core will then treat the GPU as
"suspended", which in turn allows the root port to runtime suspend,
causing the power resources listed in its _PR3 object to be powered off.
The GPU's BARs will be uninitialized when a driver later probes it.
Another example are hybrid graphics laptops where the GPU itself (rather
than the root port) is capable of runtime suspending to D3cold. If the
GPU's integrated HDA controller is not bound and the GPU's driver
decides to runtime suspend to D3cold, the HDA controller's BARs will be
uninitialized when a driver later probes it.
Fix by saving and restoring config space over a runtime suspend cycle
even if the device is not bound.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Peter Wu <peter@lekensteyn.nl> # Nvidia Optimus
Tested-by: Lukas Wunner <lukas@wunner.de> # MacBook Pro
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[lukas: add commit message, bikeshed code comments for clarity]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/92fb6e6ae2730915eb733c08e2f76c6a313e3860.1520068884.git.lukas@wunner.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 831c326fcd ]
Commit ad67b74d24 ("printk: hash addresses printed with %p") lets
printk specifier %p to hash all addresses before printing, this was
resulting in the high 32 bits of pcsr can only output zeros. So
module cannot completely print pc value and it's pointless for debugging
purpose.
This patch fixes this by using %px to print pcsr instead.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 563c4ba3bd ]
ah_attr contains the port number to which cm_id is bound. However, while
searching for GID table for matching GID entry, the port number is
ignored.
This could cause the wrong GID to be used when the ah_attr is converted to
an AH.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fca32340a5 ]
Executing command 'perf stat -T -- ls' dumps core on x86 and s390.
Here is the call back chain (done on x86):
# gdb ./perf
....
(gdb) r stat -T -- ls
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
#1 0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
#2 0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
list=0xbfb970, name=0xbf3ef0 "cpu",
head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
#3 0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
list=0xbfb970, name=0xbf3ef0 "cpu",
head_config=0xbfb930) at util/parse-events.c:1288
#4 0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
scanner=0xbf4210) at util/parse-events.y:234
#5 0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
"task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
#6 0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
"task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
at util/parse-events.c:1713
#7 0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
#8 0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
builtin-stat.c:2828
#9 0x00000000004c8b0f in run_builtin (p=0xab01a0 <commands+288>, argc=4,
argv=0x7fffffffe3b0) at perf.c:297
#10 0x00000000004c8d7c in handle_internal_command (argc=4,
argv=0x7fffffffe3b0) at perf.c:349
#11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
argv=0x7fffffffe200) at perf.c:393
#12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
(gdb)
It turns out that a NULL pointer is referenced. Here are the
function calls:
...
cmd_stat()
+---> add_default_attributes()
+---> parse_events(evsel_list, transaction_attrs, NULL);
3rd parameter set to NULL
Function parse_events(xx, xx, struct parse_events_error *err) dives
into a bison generated scanner and creates
parser state information for it first:
struct parse_events_state parse_state = {
.list = LIST_HEAD_INIT(parse_state.list),
.idx = evlist->nr_entries,
.error = err, <--- NULL POINTER !!!
.evlist = evlist,
};
Now various functions inside the bison scanner are called to end up in
__parse_events_add_pmu(struct parse_events_state *parse_state, ..) with
first parameter being a pointer to above structure definition.
Now the PMU event name is not found (because being executed in a VM) and
this function tries to create an error message with
asprintf(&parse_state->error.str, ....)
which references a NULL pointer and dumps core.
Fix this by providing a pointer to the necessary error information
instead of NULL. Technically only the else part is needed to avoid the
core dump, just lets be safe...
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180308145735.64717-1-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0bcc3fb95b ]
Devices which use level-triggered interrupts under Windows 2016 with
Hyper-V role enabled don't work: Windows disables EOI broadcast in SPIV
unconditionally. Our in-kernel IOAPIC implementation emulates an old IOAPIC
version which has no EOI register so EOI never happens.
The issue was discovered and discussed a while ago:
https://www.spinics.net/lists/kvm/msg148098.html
While this is a guest OS bug (it should check that IOAPIC has the required
capabilities before disabling EOI broadcast) we can workaround it in KVM:
advertising DIRECTED_EOI with in-kernel IOAPIC makes little sense anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 31184d8c6e ]
The errata FE-8471889 description has been updated. There is still a
timing violation for repeated start. But the errata now states that it
was only the case for the Standard mode (100 kHz), in Fast mode (400 kHz)
there is no issue.
This patch limit the errata fix to the Standard mode.
It has been tesed successfully on the clearfog (Aramda 388 based board).
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7cb44496a ]
Setting sge_uld_rxq_info to NULL in free_queues_uld().
We are referencing sge_uld_rxq_info in cxgb_up(). This
will fix a panic when interface is brought up after a
ULDq creation failure.
Fixes: 94cdb8bb99 (cxgb4: Add support for dynamic allocation
of resources for ULD)
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudhar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e08e4d2ff ]
The Allwinner H6 main CCU uses the internal oscillator of the SoC, which
is different with old SoCs' main CCU.
Add device tree binding for the Allwinner H6 main CCU.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fadd94e05c ]
In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
cached_dev_get() is called when creating dc->writeback_thread, and
cached_dev_put() is called when exiting dc->writeback_thread. This
modification works well unless people detach the bcache device manually by
'echo 1 > /sys/block/bcache<N>/bcache/detach'
Because this sysfs interface only calls bch_cached_dev_detach() which wakes
up dc->writeback_thread but does not stop it. The reason is, before patch
"bcache: fix cached_dev->count usage for bch_cache_set_error()", inside
bch_writeback_thread(), if cache is not dirty after writeback,
cached_dev_put() will be called here. And in cached_dev_make_request() when
a new write request makes cache from clean to dirty, cached_dev_get() will
be called there. Since we don't operate dc->count in these locations,
refcount d->count cannot be dropped after cache becomes clean, and
cached_dev_detach_finish() won't be called to detach bcache device.
This patch fixes the issue by checking whether BCACHE_DEV_DETACHING is
set inside bch_writeback_thread(). If this bit is set and cache is clean
(no existing writeback_keys), break the while-loop, call cached_dev_put()
and quit the writeback thread.
Please note if cache is still dirty, even BCACHE_DEV_DETACHING is set the
writeback thread should continue to perform writeback, this is the original
design of manually detach.
It is safe to do the following check without locking, let me explain why,
+ if (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
+ (!atomic_read(&dc->has_dirty) || !dc->writeback_running)) {
If the kenrel thread does not sleep and continue to run due to conditions
are not updated in time on the running CPU core, it just consumes more CPU
cycles and has no hurt. This should-sleep-but-run is safe here. We just
focus on the should-run-but-sleep condition, which means the writeback
thread goes to sleep in mistake while it should continue to run.
1, First of all, no matter the writeback thread is hung or not,
kthread_stop() from cached_dev_detach_finish() will wake up it and
terminate by making kthread_should_stop() return true. And in normal
run time, bit on index BCACHE_DEV_DETACHING is always cleared, the
condition
!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)
is always true and can be ignored as constant value.
2, If one of the following conditions is true, the writeback thread should
go to sleep,
"!atomic_read(&dc->has_dirty)" or "!dc->writeback_running)"
each of them independently controls the writeback thread should sleep or
not, let's analyse them one by one.
2.1 condition "!atomic_read(&dc->has_dirty)"
If dc->has_dirty is set from 0 to 1 on another CPU core, bcache will
call bch_writeback_queue() immediately or call bch_writeback_add() which
indirectly calls bch_writeback_queue() too. In bch_writeback_queue(),
wake_up_process(dc->writeback_thread) is called. It sets writeback
thread's task state to TASK_RUNNING and following an implicit memory
barrier, then tries to wake up the writeback thread.
In writeback thread, its task state is set to TASK_INTERRUPTIBLE before
doing the condition check. If other CPU core sets the TASK_RUNNING state
after writeback thread setting TASK_INTERRUPTIBLE, the writeback thread
will be scheduled to run very soon because its state is not
TASK_INTERRUPTIBLE. If other CPU core sets the TASK_RUNNING state before
writeback thread setting TASK_INTERRUPTIBLE, the implict memory barrier
of wake_up_process() will make sure modification of dc->has_dirty on
other CPU core is updated and observed on the CPU core of writeback
thread. Therefore the condition check will correctly be false, and
continue writeback code without sleeping.
2.2 condition "!dc->writeback_running)"
dc->writeback_running can be changed via sysfs file, every time it is
modified, a following bch_writeback_queue() is alwasy called. So the
change is always observed on the CPU core of writeback thread. If
dc->writeback_running is changed from 0 to 1 on other CPU core, this
condition check will observe the modification and allow writeback
thread to continue to run without sleeping.
Now we can see, even without a locking protection, multiple conditions
check is safe here, no deadlock or process hang up will happen.
I compose a separte patch because that patch "bcache: fix cached_dev->count
usage for bch_cache_set_error()" already gets a "Reviewed-by:" from Hannes
Reinecke. Also this fix is not trivial and good for a separate patch.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Huijun Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55496d3fe2 ]
The generic DMA API uses dev->dma_mask to check the DMA addressable
memory bitmask, and warns if no mask is set or even allocated.
Set z->dev.dma_coherent_mask on Zorro bus scan, and make z->dev.dma_mask
to point to z->dev.dma_coherent_mask so device drivers that need DMA have
everything set up to avoid warnings from dma_alloc_coherent(). Drivers can
still use dma_set_mask_and_coherent() to explicitly set their DMA bit mask.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
[geert: Handle Zorro II with 24-bit address space]
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7672ed33c4 ]
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.
When the RoCE port is down, the RoCE port does not negotiate the active
width with the remote side, causing the active width to be zero. When
running userspace ibstat to view the port status, ibstat will panic as it
reads an invalid width from sys file.
This patch restores the original behavior.
Fixes: f1b65df5a2 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70ca608b2e ]
In MediaTek's IOMMU design, When a iommu translation fault occurs
(HW can NOT translate the destination address to a valid physical
address), the IOMMU HW output the dirty data into a special memory
to avoid corrupting the main memory, this is called "protect memory".
the register(0x114) for protect memory is a little different between
mt8173 and mt2712.
In the mt8173, bit[30:6] in the register represents [31:7] of the
physical address. In the 4GB mode, the register bit[31] should be 1.
While in the mt2712, the bits don't shift. bit[31:7] in the register
represents [31:7] in the physical address, and bit[1:0] in the
register represents bit[33:32] of the physical address if it has.
Fixes: e6dec92308 ("iommu/mediatek: Add mt2712 IOMMU support")
Reported-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 20fb5a635a ]
We were relying on the pinned screen object backup buffer to be destroyed
when not used. But if we hold a copy of the atomic state, like when
hibernating, the backup buffer might not be destroyed since it's
refcounted by the atomic state. This causes us to hibernate with a
buffer pinned in VRAM.
Fix this by only having the buffer pinned when it is actually used by a
screen object.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d9366d67b ]
If mount is auto-probing for filesystem type, it will try various
filesystems in order, with the MS_SILENT flag set. We get
that flag as the silent arg to ext4_fill_super.
If we're probing (silent==1) then don't complain about feature
incompatibilities that are found if it looks like it's actually
a different valid extN type - failed probes should be silent
in this case.
If the on-disk features are unknown even to ext4, then complain.
Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bda7fab548 ]
The operstate update logic will leave an interface in the
default UNKNOWN operstate if the interface carrier state never changes
from the default carrier up state set at creation. This includes the
case of an explicit call to netif_carrier_on, as the carrier on to on
transition has no effect on operstate.
This affects virtio-net for the case that the virtio peer does
not support VIRTIO_NET_F_STATUS (the feature that provides carrier state
updates). Without this feature, the virtio specification states that
"the link should be assumed active," so, logically, the operstate should
be UP instead of UNKNOWN. This has impact on user space applications
that use the operstate to make availability decisions for the interface.
Resolve this by changing the virtio probe logic slightly to call
netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS
cases, and then the existing call to netif_carrier_on for the "without"
case will cause an operstate transition.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ffa340221 ]
Allow the device tree to specify a watchdog to fallover to
the alternate boot source.
The aspeeed watchdog can set a latch directing flash chip select 0 to
chip select 1, allowing boot from an alternate media if the watchdog
is not reset in time. On the ast2400 bank 1 also goes to flash bank 1,
while on the ast2500 the chip selects are swapped.
Also clear the secondary boot bit during the machine restart operation.
Otherwise, the system will switch to the alternate boot after every
reboot, which is not desired.
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fac37c628f ]
TPM_CRB driver provides TPM CRB 2.0 support. If it is built as a
module, the TPM chip is registered after IMA init. tpm_pcr_read() in
IMA fails and displays the following message even though eventually
there is a TPM chip on the system.
ima: No TPM chip found, activating TPM-bypass! (rc=-19)
Fix IMA Kconfig to select TPM_CRB so TPM_CRB driver is built in the kernel
and initializes before IMA.
Signed-off-by: Jiandi An <anjiandi@codeaurora.org>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>