Commit Graph

900440 Commits

Author SHA1 Message Date
Jens Axboe
0c9d5ccd26 io-wq: add support for uncancellable work
Not all work can be cancelled, some of it we may need to guarantee
that it runs to completion. Allow the caller to set IO_WQ_WORK_NO_CANCEL
on work that must not be cancelled. Note that the caller work function
must also check for IO_WQ_WORK_NO_CANCEL on work that is marked
IO_WQ_WORK_CANCEL.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
6e802a4ba0 fs: move filp_close() outside of __close_fd_get_file()
Just one caller of this, and just use filp_close() there manually.
This is important to allow async close/removal of the fd.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
15b71abe7b io_uring: add support for IORING_OP_OPENAT
This works just like openat(2), except it can be performed async. For
the normal case of a non-blocking path lookup this will complete
inline. If we have to do IO to perform the open, it'll be done from
async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
35cb6d54c1 fs: make build_open_flags() available internally
This is a prep patch for supporting non-blocking open from io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
d63d1b5edb io_uring: add support for fallocate()
This exposes fallocate(2) through io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:01:53 -07:00
Jens Axboe
4d92748373 Merge branch 'io_uring-5.5' into for-5.6/io_uring-vfs
Pull in compatability fix for the files_update command.

* io_uring-5.5:
  io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-20 17:01:17 -07:00
Eugene Syromiatnikov
1292e972ff io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space.  In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it.  Also, align the field naturally and check that
no garbage is passed there.

Fixes: c3a31e6056 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:00:44 -07:00
Alexander Lobakin
4f29ad200f MIPS: syscalls: fix indentation of the 'SYSNR' message
It also lacks a whitespace (copy'n'paste error?) and also messes up the
output:

  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_n32.h
  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_n64.h
  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_o32.h
  SYSNR  arch/mips/include/generated/uapi/asm/unistd_nr_n32.h
  SYSNR  arch/mips/include/generated/uapi/asm/unistd_nr_n64.h
  SYSNR  arch/mips/include/generated/uapi/asm/unistd_nr_o32.h
  WRAP    arch/mips/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/mips/include/generated/uapi/asm/ipcbuf.h

After:

  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_n32.h
  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_n64.h
  SYSHDR  arch/mips/include/generated/uapi/asm/unistd_o32.h
  SYSNR   arch/mips/include/generated/uapi/asm/unistd_nr_n32.h
  SYSNR   arch/mips/include/generated/uapi/asm/unistd_nr_n64.h
  SYSNR   arch/mips/include/generated/uapi/asm/unistd_nr_o32.h
  WRAP    arch/mips/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/mips/include/generated/uapi/asm/ipcbuf.h

Present since day 0 of syscall table generation introduction for MIPS.

Fixes: 9bcbf97c62 ("mips: add system call table generation support")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2020-01-20 15:39:34 -08:00
Alexander Lobakin
16202c0957 MIPS: boot: fix typo in 'vmlinux.lzma.its' target
Commit 92b34a9763 ("MIPS: boot: add missing targets for vmlinux.*.its")
fixed constant rebuild of *.its files on every make invocation, but due
to typo ("lzmo") it made no sense for vmlinux.lzma.its.

Fixes: 92b34a9763 ("MIPS: boot: add missing targets for vmlinux.*.its")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
[paulburton@kernel.org: s/invokation/invocation/]
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2020-01-20 15:39:06 -08:00
Bart Van Assche
17c5f65db6 scsi: qla2xxx: Fix a NULL pointer dereference in an error path
This patch fixes the following Coverity complaint:

FORWARD_NULL

qla_init.c: 5275 in qla2x00_configure_local_loop()
5269
5270     		if (fcport->scan_state == QLA_FCPORT_FOUND)
5271     			qla24xx_fcport_handle_login(vha, fcport);
5272     	}
5273
5274     cleanup_allocation:
>>>     CID 353340:    (FORWARD_NULL)
>>>     Passing null pointer "new_fcport" to "qla2x00_free_fcport", which dereferences it.
5275     	qla2x00_free_fcport(new_fcport);
5276
5277     	if (rval != QLA_SUCCESS) {
5278     		ql_dbg(ql_dbg_disc, vha, 0x2098,
5279     		    "Configure local loop error exit: rval=%x.\n", rval);
5280     	}
qla_init.c: 5275 in qla2x00_configure_local_loop()
5269
5270     		if (fcport->scan_state == QLA_FCPORT_FOUND)
5271     			qla24xx_fcport_handle_login(vha, fcport);
5272     	}
5273
5274     cleanup_allocation:
>>>     CID 353340:    (FORWARD_NULL)
>>>     Passing null pointer "new_fcport" to "qla2x00_free_fcport", which dereferences it.
5275     	qla2x00_free_fcport(new_fcport);
5276
5277     	if (rval != QLA_SUCCESS) {
5278     		ql_dbg(ql_dbg_disc, vha, 0x2098,
5279     		    "Configure local loop error exit: rval=%x.\n", rval);
5280     	}

Fixes: 3dae220595 ("scsi: qla2xxx: Use common routine to free fcport struct")
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Quinn Tran <qutran@marvell.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Link: https://lore.kernel.org/r/20200118042056.32232-1-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 18:38:20 -05:00
Alexander Lobakin
a53998802e MIPS: fix indentation of the 'RELOCS' message
quiet_cmd_relocs lacks a whitespace which results in:

  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  RELOCS vmlinux
  Building modules, stage 2.
  MODPOST 64 modules

After this patch:

  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  RELOCS  vmlinux
  Building modules, stage 2.
  MODPOST 64 modules

Typo is present in kernel tree since the introduction of relocatable
kernel support in commit e818fac595 ("MIPS: Generate relocation table
when CONFIG_RELOCATABLE"), but the relocation scripts were moved to
Makefile.postlink later with commit 44079d3509 ("MIPS: Use
Makefile.postlink to insert relocations into vmlinux").

Fixes: 44079d3509 ("MIPS: Use Makefile.postlink to insert relocations into vmlinux")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
[paulburton@kernel.org: Fixup commit references in commit message.]
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
2020-01-20 15:37:30 -08:00
Thomas Bogendoerfer
6ca67a8e7b scsi: qla1280: Make checking for 64bit support consistent
Use #ifdef QLA_64BIT_PTR to check if 64bit support is enabled.  This
fixes ("scsi: qla1280: Fix dma firmware download, if dma address is
64bit").

Link: https://lore.kernel.org/r/20200117115628.13219-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 18:35:55 -05:00
zhengbin
aa124436f3 xfs: change return value of xfs_inode_need_cow to int
Fixes coccicheck warning:

fs/xfs/xfs_reflink.c:236:9-10: WARNING: return of 0/1 in function 'xfs_inode_need_cow' with return type bool

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
[darrick: rename the function so it doesn't sound like a predicate]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-20 14:34:47 -08:00
Hangbin Liu
f1c3656c6d selftests/bpf: Skip perf hw events test if the setup disabled it
The same with commit 4e59afbbed ("selftests/bpf: skip nmi test when perf
hw events are disabled"), it would make more sense to skip the
test_stacktrace_build_id_nmi test if the setup (e.g. virtual machines) has
disabled hardware perf events.

Fixes: 13790d1cc7 ("bpf: add selftest for stackmap with build_id in NMI context")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117100656.10359-1-liuhangbin@gmail.com
2020-01-20 23:26:58 +01:00
Stanislav Fomichev
396b83fa75 selftests/bpf: Don't check for btf fd in test_btf
After commit 0d13bfce02 ("libbpf: Don't require root for
bpf_object__open()") we no longer load BTF during bpf_object__open(),
so let's remove the expectation from test_btf that the fd is not -1.
The test currently fails.

Before:
BTF libbpf test[1] (test_btf_haskv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1
BTF libbpf test[2] (test_btf_newkv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1
BTF libbpf test[3] (test_btf_nokv.o): do_test_file:4152:FAIL bpf_object__btf_fd: -1

After:
BTF libbpf test[1] (test_btf_haskv.o): OK
BTF libbpf test[2] (test_btf_newkv.o): OK
BTF libbpf test[3] (test_btf_nokv.o): OK

Fixes: 0d13bfce02 ("libbpf: Don't require root for bpf_object__open()")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200118010546.74279-1-sdf@google.com
2020-01-20 22:49:03 +01:00
Brian Vazquez
2e3a94aa2b bpf: Fix memory leaks in generic update/delete batch ops
Generic update/delete batch ops functions were using __bpf_copy_key
without properly freeing the memory. Handle the memory allocation and
copy_from_user separately.

Fixes: aa2e93b8e5 ("bpf: Add generic support for update and delete batch ops")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200119194040.128369-1-brianvv@google.com
2020-01-20 22:27:51 +01:00
Masami Ichikawa
bf24daac8f tracing: Do not set trace clock if tracefs lockdown is in effect
When trace_clock option is not set and unstable clcok detected,
tracing_set_default_clock() sets trace_clock(ThinkPad A285 is one of
case). In that case, if lockdown is in effect, null pointer
dereference error happens in ring_buffer_set_clock().

Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com

Cc: stable@vger.kernel.org
Fixes: 17911ff38a ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488
Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-20 16:18:14 -05:00
Steven Rostedt (VMware)
8bcebc77e8 tracing: Fix histogram code when expression has same var as value
While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:

 # echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
 # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
 # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

Would not display any histograms in the sched_switch histogram side.

But if I were to swap the location of

  "delta=common_timestamp-$start" with "start2=$start"

Such that the last line had:

 # echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

The histogram works as expected.

What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:

  delta=common_timestamp-$start

The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.

When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.

From Tom Zanussi:

 "Just to clarify some more about what the problem was is that without
  your patch, we would have two separate references to the same variable,
  and during resolve_var_refs(), they'd both want to be resolved
  separately, so in this case, since the first reference to start wasn't
  part of an expression, it wouldn't get the read-once flag set, so would
  be read normally, and then the second reference would do the read-once
  read and also be read but using read-once.  So everything worked and
  you didn't see a problem:

   from: start2=$start,delta=common_timestamp-$start

  In the second case, when you switched them around, the first reference
  would be resolved by doing the read-once, and following that the second
  reference would try to resolve and see that the variable had already
  been read, so failed as unset, which caused it to short-circuit out and
  not do the trigger action to generate the synthetic event:

   to: delta=common_timestamp-$start,start2=$start

  With your patch, we only have the single resolution which happens
  correctly the one time it's resolved, so this can't happen."

Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-20 16:11:47 -05:00
Olof Johansson
684415d0de Merge tag 'v5.5-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/drivers
cmdq:
- clean ups of unused code and debuggability
- add cmdq_instruction to make the function call interface more readable
- add functions for polling and providing info for the user of cmdq

scpsys:
- add bindings for MT6765

* tag 'v5.5-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux:
  dt-bindings: mediatek: add MT6765 power dt-bindings
  soc: mediatek: cmdq: delete not used define
  soc: mediatek: cmdq: add cmdq_dev_get_client_reg function
  soc: mediatek: cmdq: add polling function
  soc: mediatek: cmdq: define the instruction struct
  soc: mediatek: cmdq: remove OR opertaion from err return

Link: https://lore.kernel.org/r/9b365e76-e346-f813-d750-d7cfd0d16e4e@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-20 11:26:10 -08:00
Linus Torvalds
d96d875ef5 Merge tag 'fixes_for_v5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs fix from Jan Kara:
 "A fixup of a recently merged reiserfs fix which has caused problem
  when xattrs were not compiled in"

* tag 'fixes_for_v5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr
2020-01-20 11:24:13 -08:00
Olof Johansson
96b34bac41 Merge tag 'arm-soc/for-5.6/maintainers' of https://github.com/Broadcom/stblinux into arm/drivers
This pull request contains Broadcom SoCs MAINTAINERS file updates for
5.6, please pull the following:

- Nicolas adds an entry for the Broadcom STB PCIe Root Complex files for
  both BCM7xxx (actual STB SoCs) and BCM2711 (Raspberry Pi 4).

* tag 'arm-soc/for-5.6/maintainers' of https://github.com/Broadcom/stblinux:
  MAINTAINERS: Add brcmstb PCIe controller entry

Link: https://lore.kernel.org/r/20200118032935.1346-1-f.fainelli@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-20 11:22:49 -08:00
Olof Johansson
1342a6aa4a Merge tag 'samsung-defconfig-5.6' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/defconfig
Samsung defconfig changes for v5.6

1. Bring back explicitly wanted options which were removed through
   `make savedefconfig`.  savedefconfig removes options selected by
   other symbol, however developers of this other symbol can remove
   anytime 'select' statement.
2. Enable NFS v4.1 and v4.2, useful in testing/CI systems.
3. Enable thermal throttling through devfreq framework.

* tag 'samsung-defconfig-5.6' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  ARM: multi_v7_defconfig: Enable devfreq thermal integration
  ARM: exynos_defconfig: Enable devfreq thermal integration
  ARM: multi_v7_defconfig: Enable NFS v4.1 and v4.2
  ARM: exynos_defconfig: Enable NFS v4.1 and v4.2
  ARM: exynos_defconfig: Bring back explicitly wanted options

Link: https://lore.kernel.org/r/20200120180227.9061-1-krzk@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-20 11:20:23 -08:00
Kevin Hao
0f394daef8 irqdomain: Fix a memory leak in irq_domain_push_irq()
Fix a memory leak reported by kmemleak:
unreferenced object 0xffff000bc6f50e80 (size 128):
  comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s)
  hex dump (first 32 bytes):
    00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00  ....A...........
    00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff  ........@Q......
  backtrace:
    [<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320
    [<00000000279143c9>] irq_domain_push_irq+0x7c/0x188
    [<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438
    [<00000000fd09ec22>] pci_device_probe+0xe4/0x198
    [<00000000d43eca75>] really_probe+0xdc/0x320
    [<00000000d3ebab09>] driver_probe_device+0x5c/0xf0
    [<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0
    [<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8
    [<0000000079d4db41>] __device_attach+0xe4/0x140
    [<00000000883bbda9>] device_initial_probe+0x18/0x20
    [<000000003be59ef6>] bus_probe_device+0x98/0xa0
    [<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8
    [<00000000870934ce>] process_one_work+0x1c8/0x470
    [<00000000e3cce570>] worker_thread+0x1f8/0x428
    [<000000005d64975e>] kthread+0xfc/0x128
    [<00000000f0eaa764>] ret_from_fork+0x10/0x18

Fixes: 495c38d300 ("irqdomain: Add irq_domain_{push,pop}_irq() functions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com
2020-01-20 19:10:05 +00:00
Joakim Zhang
2fbb13961e irqchip: Add NXP INTMUX interrupt multiplexer support
The Interrupt Multiplexer (INTMUX) expands the number of peripherals
that can interrupt the core:
* The INTMUX has 8 channels that are assigned to 8 NVIC interrupt slots.
* Each INTMUX channel can receive up to 32 interrupt sources and has 1
  interrupt output.
* The INTMUX routes the interrupt sources to the interrupt outputs.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200117060653.27485-3-qiangqing.zhang@nxp.com
2020-01-20 19:10:05 +00:00
Joakim Zhang
618ea6275b dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
This patch adds the DT bindings for the NXP INTMUX interrupt multiplexer
for i.MX8 family SoCs.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20200117060653.27485-2-qiangqing.zhang@nxp.com
2020-01-20 19:10:05 +00:00
Hyunki Koo
b74416dba3 irqchip: Define EXYNOS_IRQ_COMBINER
This patch is written to clean up dependency of ARCH_EXYNOS
Not all exynos device have IRQ_COMBINER, especially aarch64 EXYNOS
but it is built for all exynos devices.
Thus add the config for EXYNOS_IRQ_COMBINER
remove direct dependency between ARCH_EXYNOS and exynos-combiner.c
and only selected on the aarch32 devices

Signed-off-by: Hyunki Koo <hyunki00.koo@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191224211108.7128-1-hyunki00.koo@gmail.com
2020-01-20 19:10:05 +00:00
Qianggui Song
8f78bd62bd irqchip/meson-gpio: Add support for meson a1 SoCs
The meson a1 Socs have some changes compared with previous
chips. For A113L, it contains 62 pins and can be spied on:

- 62:128 undefined
- 61:50 12 pins on bank A
- 49:37 13 pins on bank F
- 36:20 17 pins on bank X
- 19:13 7  pins on bank B
- 12:0  13 pins on bank P

There are five relative registers for gpio interrupt controller,
details are as below:

- PADCTRL_GPIO_IRQ_CTRL0
  bit[31]:    enable/disable the whole irq lines
  bit[16-23]: both edge trigger
  bit[8-15]:  single edge trigger
  bit[0-7]:   pol trigger

- PADCTRL_GPIO_IRQ_CTRL[X]
  bit[0-6]:   7 bits to choose gpio source for irq line 2*[X] - 2
  bit[16-22]: 7 bits to choose gpio source for irq line 2*[X] - 1
  where X =1,2,3,4

Signed-off-by: Qianggui Song <qianggui.song@amlogic.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191216123645.10099-4-qianggui.song@amlogic.com
2020-01-20 19:10:05 +00:00
Qianggui Song
e2514165f3 irqchip/meson-gpio: Rework meson irqchip driver to support meson-A1 SoCs
Since Meson-A1 SoCs register layout of gpio interrupt controller has
difference with previous chips, registers to decide irq line and offset
of trigger method are all changed, the current driver should be modified.

Signed-off-by: Qianggui Song <qianggui.song@amlogic.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191216123645.10099-3-qianggui.song@amlogic.com
2020-01-20 19:10:05 +00:00
Qianggui Song
fd6765b4c9 dt-bindings: interrupt-controller: New binding for Meson-A1 SoCs
Update dt-binding document for GPIO interrupt controller of Meson-A1 SoCs

Signed-off-by: Qianggui Song <qianggui.song@amlogic.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20191216123645.10099-2-qianggui.song@amlogic.com
2020-01-20 19:10:05 +00:00
John Garry
d6152e6ec9 irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems
The following crash can be seen for setting
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y for DT FW (which some people still use):

Hisilicon MBIGEN-V2 60080000.interrupt-controller: Failed to create mbi-gen irqdomain
Hisilicon MBIGEN-V2: probe of 60080000.interrupt-controller failed with error -12

[...]

Unable to handle kernel paging request at virtual address 0000000000005008
 Mem abort info:
   ESR = 0x96000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000041fb9990000
 [0000000000005008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc6-00002-g3fc42638a506-dirty #1622
 Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
 pstate: 40000085 (nZcv daIf -PAN -UAO)
 pc : mbigen_set_type+0x38/0x60
 lr : __irq_set_trigger+0x6c/0x188
 sp : ffff800014b4b400
 x29: ffff800014b4b400 x28: 0000000000000007
 x27: 0000000000000000 x26: 0000000000000000
 x25: ffff041fd83bd0d4 x24: ffff041fd83bd188
 x23: 0000000000000000 x22: ffff80001193ce00
 x21: 0000000000000004 x20: 0000000000000000
 x19: ffff041fd83bd000 x18: ffffffffffffffff
 x17: 0000000000000000 x16: 0000000000000000
 x15: ffff8000119098c8 x14: ffff041fb94ec91c
 x13: ffff041fb94ec1a1 x12: 0000000000000030
 x11: 0101010101010101 x10: 0000000000000040
 x9 : 0000000000000000 x8 : ffff041fb98c6680
 x7 : ffff800014b4b380 x6 : ffff041fd81636c8
 x5 : 0000000000000000 x4 : 000000000000025f
 x3 : 0000000000005000 x2 : 0000000000005008
 x1 : 0000000000000004 x0 : 0000000080000000
 Call trace:
  mbigen_set_type+0x38/0x60
  __setup_irq+0x744/0x900
  request_threaded_irq+0xe0/0x198
  pcie_pme_probe+0x98/0x118
  pcie_port_probe_service+0x38/0x78
  really_probe+0xa0/0x3e0
  driver_probe_device+0x58/0x100
  __device_attach_driver+0x90/0xb0
  bus_for_each_drv+0x64/0xc8
  __device_attach+0xd8/0x138
  device_initial_probe+0x10/0x18
  bus_probe_device+0x90/0x98
  device_add+0x4c4/0x770
  device_register+0x1c/0x28
  pcie_port_device_register+0x1e4/0x4f0
  pcie_portdrv_probe+0x34/0xd8
  local_pci_probe+0x3c/0xa0
  pci_device_probe+0x128/0x1c0
  really_probe+0xa0/0x3e0
  driver_probe_device+0x58/0x100
  __device_attach_driver+0x90/0xb0
  bus_for_each_drv+0x64/0xc8
  __device_attach+0xd8/0x138
  device_attach+0x10/0x18
  pci_bus_add_device+0x4c/0xb8
  pci_bus_add_devices+0x38/0x88
  pci_host_probe+0x3c/0xc0
  pci_host_common_probe+0xf0/0x208
  hisi_pcie_almost_ecam_probe+0x24/0x30
  platform_drv_probe+0x50/0xa0
  really_probe+0xa0/0x3e0
  driver_probe_device+0x58/0x100
  device_driver_attach+0x6c/0x90
  __driver_attach+0x84/0xc8
  bus_for_each_dev+0x74/0xc8
  driver_attach+0x20/0x28
  bus_add_driver+0x148/0x1f0
  driver_register+0x60/0x110
  __platform_driver_register+0x40/0x48
  hisi_pcie_almost_ecam_driver_init+0x1c/0x24

The specific problem here is that the mbigen driver real probe has failed
as the mbigen_of_create_domain()->of_platform_device_create() call fails,
the reason for that being that we never destroyed the platform device
created during the remove test dry run and there is some conflict.

Since we generally would never want to unbind this driver, and to save
adding a driver tear down path for that, just set the driver
.suppress_bind_attrs member to avoid this possibility.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/1579196323-180137-1-git-send-email-john.garry@huawei.com
2020-01-20 19:10:04 +00:00
Eddie James
04f605906f irqchip: Add Aspeed SCU interrupt controller
The Aspeed SOCs provide some interrupts through the System Control
Unit registers. Add an interrupt controller that provides these
interrupts to the system.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/1579123790-6894-3-git-send-email-eajames@linux.ibm.com
2020-01-20 19:10:04 +00:00
Eddie James
5350a237b4 dt-bindings: interrupt-controller: Add Aspeed SCU interrupt controller
Document the Aspeed SCU interrupt controller and add an include file
for the interrupts it provides.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/1579123790-6894-2-git-send-email-eajames@linux.ibm.com
2020-01-20 19:10:03 +00:00
Yash Shah
96868dce64 gpio/sifive: Add GPIO driver for SiFive SoCs
Adds the GPIO driver for SiFive RISC-V SoCs.

Signed-off-by: Wesley W. Terpstra <wesley@sifive.com>
[Atish: Various fixes and code cleanup]
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/1575976274-13487-6-git-send-email-yash.shah@sifive.com
2020-01-20 19:10:03 +00:00
Enric Balletbo i Serra
3d7610e8da regulator: core: Fix exported symbols to the exported GPL version
Change the exported symbols introduced by commit e915331149
("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage")
from EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL(), like is used for all the core
parts.

Fixes: e915331149 ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: https://lore.kernel.org/r/20200120123921.1204339-1-enric.balletbo@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-20 19:09:48 +00:00
Gustavo A. R. Silva
c878465715 remoteproc: use struct_size() helper
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct fw_rsc_vdev {
	...
        struct fw_rsc_vdev_vring vring[0];
} __packed;

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

So, replace the following form:

sizeof(*rsc) + rsc->num_of_vrings * sizeof(struct fw_rsc_vdev_vring)

with:

struct_size(rsc, vring, rsc->num_of_vrings)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Link: https://lore.kernel.org/r/20190830151406.GA23274@embeddedor
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-01-20 10:50:54 -08:00
Eric Biggers
50d9fad73a ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted()
There's no need for the ubifs_crypt_is_encrypted() function anymore.
Just use IS_ENCRYPTED() instead, like ext4 and f2fs do.  IS_ENCRYPTED()
checks the VFS-level flag instead of the UBIFS-specific flag, but it
shouldn't change any behavior since the flags are kept in sync.

Link: https://lore.kernel.org/r/20191209212721.244396-1-ebiggers@kernel.org
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-20 10:43:46 -08:00
Brandon Maier
a8f40111d1 remoteproc: Initialize rproc_class before use
The remoteproc_core and remoteproc drivers all initialize with module_init().
However remoteproc drivers need the rproc_class during their probe. If one of
the remoteproc drivers runs init and gets through probe before
remoteproc_init() runs, a NULL pointer access of rproc_class's `glue_dirs`
spinlock occurs.

> Unable to handle kernel NULL pointer dereference at virtual address 000000dc
> pgd = c0004000
> [000000dc] *pgd=00000000
> Internal error: Oops: 5 [#1] PREEMPT ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper Tainted: G        W       4.14.106-rt56 #1
> Hardware name: Generic OMAP36xx (Flattened Device Tree)
> task: c6050000 task.stack: c604a000
> PC is at rt_spin_lock+0x40/0x6c
> LR is at rt_spin_lock+0x28/0x6c
> pc : [<c0523c90>]    lr : [<c0523c78>]    psr: 60000013
> sp : c604bdc0  ip : 00000000  fp : 00000000
> r10: 00000000  r9 : c61c7c10  r8 : c6269c20
> r7 : c0905888  r6 : c6269c20  r5 : 00000000  r4 : 000000d4
> r3 : 000000dc  r2 : c6050000  r1 : 00000002  r0 : 000000d4
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
...
> [<c0523c90>] (rt_spin_lock) from [<c03b65a4>] (get_device_parent+0x54/0x17c)
> [<c03b65a4>] (get_device_parent) from [<c03b6bec>] (device_add+0xe0/0x5b4)
> [<c03b6bec>] (device_add) from [<c042adf4>] (rproc_add+0x18/0xd8)
> [<c042adf4>] (rproc_add) from [<c01110e4>] (my_rproc_probe+0x158/0x204)
> [<c01110e4>] (my_rproc_probe) from [<c03bb6b8>] (platform_drv_probe+0x34/0x70)
> [<c03bb6b8>] (platform_drv_probe) from [<c03b9dd4>] (driver_probe_device+0x2c8/0x420)
> [<c03b9dd4>] (driver_probe_device) from [<c03ba02c>] (__driver_attach+0x100/0x11c)
> [<c03ba02c>] (__driver_attach) from [<c03b7d08>] (bus_for_each_dev+0x7c/0xc0)
> [<c03b7d08>] (bus_for_each_dev) from [<c03b910c>] (bus_add_driver+0x1cc/0x264)
> [<c03b910c>] (bus_add_driver) from [<c03ba714>] (driver_register+0x78/0xf8)
> [<c03ba714>] (driver_register) from [<c010181c>] (do_one_initcall+0x100/0x190)
> [<c010181c>] (do_one_initcall) from [<c0800de8>] (kernel_init_freeable+0x130/0x1d0)
> [<c0800de8>] (kernel_init_freeable) from [<c051eee8>] (kernel_init+0x8/0x114)
> [<c051eee8>] (kernel_init) from [<c01175b0>] (ret_from_fork+0x14/0x24)
> Code: e2843008 e3c2203f f5d3f000 e5922010 (e193cf9f)
> ---[ end trace 0000000000000002 ]---

Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Link: https://lore.kernel.org/r/20190530225223.136420-1-brandon.maier@rockwellcollins.com
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-01-20 10:41:34 -08:00
Pi-Hsun Shih
7017996951 rpmsg: add rpmsg support for mt8183 SCP.
Add a simple rpmsg support for mt8183 SCP, that use IPI / IPC directly.

Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Link: https://lore.kernel.org/r/20191112110330.179649-4-pihsun@chromium.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-01-20 10:29:56 -08:00
Erin Lo
63c13d61ea remoteproc/mediatek: add SCP support for mt8183
Provide a basic driver to control Cortex M4 co-processor

Signed-off-by: Erin Lo <erin.lo@mediatek.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Link: https://lore.kernel.org/r/20191112110330.179649-3-pihsun@chromium.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-01-20 10:29:54 -08:00
Erin Lo
e47e98877b dt-bindings: Add a binding for Mediatek SCP
Add a DT binding documentation of SCP for the
MT8183 SoC from Mediatek.

Signed-off-by: Erin Lo <erin.lo@mediatek.com>
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20191112110330.179649-2-pihsun@chromium.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-01-20 10:29:51 -08:00
Kadlecsik József
32c72165db netfilter: ipset: use bitmap infrastructure completely
The bitmap allocation did not use full unsigned long sizes
when calculating the required size and that was triggered by KASAN
as slab-out-of-bounds read in several places. The patch fixes all
of them.

Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com
Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com
Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com
Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com
Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com
Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com
Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-20 17:41:45 +01:00
Xiaochen Shen
32ada3b9e0 x86/resctrl: Clean up unused function parameter in mkdir path
Commit

  334b0f4e9b ("x86/resctrl: Fix a deadlock due to inaccurate reference")

changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
within mkdir_rdt_prepare(). That change resulted in an unused function
parameter to mkdir_rdt_prepare().

Clean up the unused function parameter in mkdir_rdt_prepare() and its
callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon().

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com
2020-01-20 17:00:41 +01:00
Jessica Yu
708e0ada19 module: avoid setting info->name early in case we can fall back to info->mod->name
In setup_load_info(), info->name (which contains the name of the module,
mostly used for early logging purposes before the module gets set up)
gets unconditionally assigned if .modinfo is missing despite the fact
that there is an if (!info->name) check near the end of the function.
Avoid assigning a placeholder string to info->name if .modinfo doesn't
exist, so that we can fall back to info->mod->name later on.

Fixes: 5fdc7db644 ("module: setup load info before module_sig_check()")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-01-20 16:59:39 +01:00
Xiaochen Shen
334b0f4e9b x86/resctrl: Fix a deadlock due to inaccurate reference
There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:

$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus  cpus_list  mon_data  tasks

Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1

3 locks held by mkdir/48649:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
 #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

4 locks held by rmdir/48652:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
 #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
 #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.

Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.

As a result, Thread 1 is blocked on waiting for active reference to drop
and never release rdtgroup_mutex, while Thread 2 is also blocked on
trying to get rdtgroup_mutex.

Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
-------------------------   -------------------------
                            kernfs_iop_mkdir
                              /*
                               * kn: "m1", parent_kn: "mon_groups",
                               * prgrp_kn: parent_kn->parent: "c1",
                               *
                               * "mon_groups", parent_kn->active++: 1
                               */
                              kernfs_get_active(parent_kn)
kernfs_iop_rmdir
  /* "c1", kn->active++ */
  kernfs_get_active(kn)

  rdtgroup_kn_lock_live
    atomic_inc(&rdtgrp->waitcount)
    /* "c1", kn->active-- */
    kernfs_break_active_protection(kn)
    mutex_lock

  rdtgroup_rmdir_ctrl
    free_all_child_rdtgrp
      sentry->flags = RDT_DELETED

    rdtgroup_ctrl_remove
      rdtgrp->flags = RDT_DELETED
      kernfs_get(kn)
      kernfs_remove(rdtgrp->kn)
        __kernfs_remove
          /* "mon_groups", sub_kn */
          atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
          kernfs_drain(sub_kn)
            /*
             * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
             * waiting on sub_kn->active to drop, but it
             * never drops in Thread 2 which is blocked
             * on getting rdtgroup_mutex.
             */
Thread 1 hangs here ---->
            wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
            ...
                              rdtgroup_mkdir
                                rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                  mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                    rdtgroup_kn_lock_live(prgrp_kn)
                                      atomic_inc(&rdtgrp->waitcount)
                                      /*
                                       * "c1", prgrp_kn->active--
                                       *
                                       * The active reference on "c1" is
                                       * dropped, but not matching the
                                       * actual active reference taken
                                       * on "mon_groups", thus causing
                                       * Thread 1 to wait forever while
                                       * holding rdtgroup_mutex.
                                       */
                                      kernfs_break_active_protection(
                                                               prgrp_kn)
                                      /*
                                       * Trying to get rdtgroup_mutex
                                       * which is held by Thread 1.
                                       */
Thread 2 hangs here ---->             mutex_lock
                                      ...

The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
parent control group ("c1") as argument. It should be called with kernfs
node "mon_groups" instead. What is currently missing is that the
kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.

Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. And then it operates on the same rdtgroup structure but handles
the active reference of kernfs node "mon_groups" to prevent deadlock.
The same changes are also made to the "mon_data" directories.

This results in some unused function parameters that will be cleaned up
in follow-up patch as the focus here is on the fix only in support of
backporting efforts.

Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:57:53 +01:00
Xiaochen Shen
074fadee59 x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
There is a race condition in the following scenario which results in an
use-after-free issue when reading a monitoring file and deleting the
parent ctrl_mon group concurrently:

Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls
kernfs_break_active_protection() to drop the active reference of kernfs
node in rdtgroup_kn_lock_live().

In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub
kernfs nodes to drop the active reference when removing all subtree
kernfs nodes recursively. Thread 2 could block on kernfs_remove() until
Thread 1 calls kernfs_break_active_protection(). Only after
kernfs_remove() completes the refcount of rdtgrp could be trusted.

Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(),
Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0
instead of 1 due to the race.

In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_mondata_show)  Thread 2 (rdtgroup_rmdir)
--------------------------------  -------------------------
rdtgroup_kn_lock_live
  /*
   * kn active protection until
   * kernfs_break_active_protection(kn)
   */
  rdtgrp = kernfs_to_rdtgroup(kn)
                                  rdtgroup_kn_lock_live
                                    atomic_inc(&rdtgrp->waitcount)
                                    mutex_lock
                                  rdtgroup_rmdir_ctrl
                                    free_all_child_rdtgrp
                                      /*
                                       * sentry->waitcount should be 1
                                       * but is 0 now due to the race.
                                       */
                                      kfree(sentry)*[1]
  /*
   * Only after kernfs_remove()
   * completes, the refcount of
   * rdtgrp could be trusted.
   */
  atomic_inc(&rdtgrp->waitcount)
  /* kn->active-- */
  kernfs_break_active_protection(kn)
                                    rdtgroup_ctrl_remove
                                      rdtgrp->flags = RDT_DELETED
                                      /*
                                       * Blocking routine, wait for
                                       * all sub kernfs nodes to drop
                                       * active reference in
                                       * kernfs_break_active_protection.
                                       */
                                      kernfs_remove(rdtgrp->kn)
                                  rdtgroup_kn_unlock
                                    mutex_unlock
                                    atomic_dec_and_test(
                                                &rdtgrp->waitcount)
                                    && (flags & RDT_DELETED)
                                      kernfs_unbreak_active_protection(kn)
                                      kfree(rdtgrp)
  mutex_lock
mon_event_read
rdtgroup_kn_unlock
  mutex_unlock
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    /* kn->active++ */
    kernfs_unbreak_active_protection(kn)
    kfree(rdtgrp)

Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in
rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp.

Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:56:11 +01:00
Xiaochen Shen
b8511ccc75 x86/resctrl: Fix use-after-free when deleting resource groups
A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount)
that indicates how many waiters expect this rdtgrp to exist. Waiters
could be waiting on rdtgroup_mutex or some work sitting on a task's
workqueue for when the task returns from kernel mode or exits.

The deletion of a rdtgrp is intended to have two phases:

  (1) while holding rdtgroup_mutex the necessary cleanup is done and
  rdtgrp->flags is set to RDT_DELETED,

  (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
  only if there are no waiters and its flag is set to RDT_DELETED. Upon
  gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
  for the RDT_DELETED flag.

When unmounting the resctrl file system or deleting ctrl_mon groups,
all of the subdirectories are removed and the data structure of rdtgrp
is forcibly freed without checking rdtgrp->waitcount. If at this point
there was a waiter on rdtgrp then a use-after-free issue occurs when the
waiter starts running and accesses the rdtgrp structure it was waiting
on.

See kfree() calls in [1], [2] and [3] in these two call paths in
following scenarios:
(1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

There are several scenarios that result in use-after-free issue in
following:

Scenario 1:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdt_kill_sb() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdt_kill_sb)
-------------------------------        ----------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       mutex_lock
                                       rmdir_all_sub
                                         /*
                                          * sentry and rdtgrp are freed
                                          * without checking refcount
                                          */
                                         free_all_child_rdtgrp
                                           kfree(sentry)*[1]
                                         kfree(rdtgrp)*[2]
                                       mutex_unlock
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1] or [2].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

Scenario 2:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdtgroup_rmdir)
-------------------------------        -------------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       rdtgroup_kn_lock_live
                                         atomic_inc(&rdtgrp->waitcount)
                                         mutex_lock
                                       rdtgroup_rmdir_ctrl
                                         free_all_child_rdtgrp
                                           /*
                                            * sentry is freed without
                                            * checking refcount
                                            */
                                           kfree(sentry)*[3]
                                         rdtgroup_ctrl_remove
                                           rdtgrp->flags = RDT_DELETED
                                       rdtgroup_kn_unlock
                                         mutex_unlock
                                         atomic_dec_and_test(
                                                     &rdtgrp->waitcount)
                                         && (flags & RDT_DELETED)
                                           kfree(rdtgrp)
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [3].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed
like following. Note that "0x6b" is POISON_FREE after kfree(). The
corrupted bits "0x6a", "0x64" at offset 0x424 correspond to
waitcount member of struct rdtgroup which was freed:

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
  420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
  Single bit error detected. Probably bad RAM.
  Run memtest86+ or a similar memory test tool.
  Next obj: start=ffff9504c5b0d800, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048
  420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkdkkkkkkkkkkk
  Prev obj: start=ffff9504c58ab000, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

Fix this by taking reference count (waitcount) of rdtgrp into account in
the two call paths that currently do not do so. Instead of always
freeing the resource group it will only be freed if there are no waiters
on it. If there are waiters, the resource group will have its flags set
to RDT_DELETED.

It will be left to the waiter to free the resource group when it starts
running and finding that it was the last waiter and the resource group
has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb()
-> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() ->
rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:45:43 +01:00
Anand Jain
a69976bc69 btrfs: device stats, log when stats are zeroed
We had a report indicating that some read errors aren't reported by the
device stats in the userland. It is important to have the errors
reported in the device stat as user land scripts might depend on it to
take the reasonable corrective actions. But to debug these issue we need
to be really sure that request to reset the device stat did not come
from the userland itself. So log an info message when device error reset
happens.

For example:
 BTRFS info (device sdc): device stats zeroed by btrfs(9223)

Reported-by: philip@philip-seeger.de
Link: https://www.spinics.net/lists/linux-btrfs/msg96528.html
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:41:02 +01:00
Josef Bacik
556755a8a9 btrfs: fix improper setting of scanned for range cyclic write cache pages
We noticed that we were having regular CG OOM kills in cases where there
was still enough dirty pages to avoid OOM'ing.  It turned out there's
this corner case in btrfs's handling of range_cyclic where files that
were being redirtied were not getting fully written out because of how
we do range_cyclic writeback.

We unconditionally were setting scanned = 1; the first time we found any
pages in the inode.  This isn't actually what we want, we want it to be
set if we've scanned the entire file.  For range_cyclic we could be
starting in the middle or towards the end of the file, so we could write
one page and then not write any of the other dirty pages in the file
because we set scanned = 1.

Fix this by not setting scanned = 1 if we find pages.  The rules for
setting scanned should be

1) !range_cyclic.  In this case we have a specified range to write out.
2) range_cyclic && index == 0.  In this case we've started at the
   beginning and there is no need to loop around a second time.
3) range_cyclic && we started at index > 0 and we've reached the end of
   the file without satisfying our nr_to_write.

This patch fixes both of our writepages implementations to make sure
these rules hold true.  This fixed our over zealous CG OOMs in
production.

Fixes: d1310b2e0c ("Btrfs: Split the extent_map code into two parts")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:41:02 +01:00
David Sterba
4babad1019 btrfs: safely advance counter when looking up bio csums
Dan's smatch tool reports

  fs/btrfs/file-item.c:295 btrfs_lookup_bio_sums()
  warn: should this be 'count == -1'

which points to the while (count--) loop. With count == 0 the check
itself could decrement it to -1. There's a WARN_ON a few lines below
that has never been seen in practice though.

It turns out that the value of page_bytes_left matches the count (by
sectorsize multiples). The loop never reaches the state where count
would go to -1, because page_bytes_left == 0 is found first and this
breaks out.

For clarity, use only plain check on count (and only for positive
value), decrement safely inside the loop. Any other discrepancy after
the whole bio list processing should be reported by the exising
WARN_ON_ONCE as well.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:41:01 +01:00
David Sterba
94f8c46566 btrfs: remove unused member btrfs_device::work
This is a leftover from recently removed bio scheduling framework.

Fixes: ba8a9d0795 ("Btrfs: delete the entire async bio submission framework")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:41:01 +01:00