Commit Graph

966058 Commits

Author SHA1 Message Date
Constantine Sapuntzakis
acaa532687 ext4: fix superblock checksum calculation race
The race condition could cause the persisted superblock checksum
to not match the contents of the superblock, causing the
superblock to be considered corrupt.

An example of the race follows.  A first thread is interrupted in the
middle of a checksum calculation. Then, another thread changes the
superblock, calculates a new checksum, and sets it. Then, the first
thread resumes and sets the checksum based on the older superblock.

To fix, serialize the superblock checksum calculation using the buffer
header lock. While a spinlock is sufficient, the buffer header is
already there and there is precedent for locking it (e.g. in
ext4_commit_super).

Tested the patch by booting up a kernel with the patch, creating
a filesystem and some files (including some orphans), and then
unmounting and remounting the file system.

Cc: stable@kernel.org
Signed-off-by: Constantine Sapuntzakis <costa@purestorage.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200914161014.22275-1-costa@purestorage.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:23 -04:00
Dinghao Liu
c9e87161cc ext4: fix error handling code in add_new_gdb
When ext4_journal_get_write_access() fails, we should
terminate the execution flow and release n_group_desc,
iloc.bh, dind and gdb_bh.

Cc: stable@kernel.org
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200829025403.3139-1-dinghao.liu@zju.edu.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:14 -04:00
Xiao Yang
aa2f77920b ext4: disallow modifying DAX inode flag if inline_data has been set
inline_data is mutually exclusive to DAX so enabling both of them triggers
the following issue:
------------------------------------------
# mkfs.ext4 -F -O inline_data /dev/pmem1
...
# mount /dev/pmem1 /mnt
# echo 'test' >/mnt/file
# lsattr -l /mnt/file
/mnt/file                    Inline_Data
# xfs_io -c "chattr +x" /mnt/file
# xfs_io -c "lsattr -v" /mnt/file
[dax] /mnt/file
# umount /mnt
# mount /dev/pmem1 /mnt
# cat /mnt/file
cat: /mnt/file: Numerical result out of range
------------------------------------------

Fixes: b383a73f2b ("fs/ext4: Introduce DAX inode flag")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200828084330.15776-1-yangx.jy@cn.fujitsu.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:13 -04:00
Nikolay Borisov
15ed2851b0 ext4: remove unused argument from ext4_(inc|dec)_count
The 'handle' argument is not used for anything so simply remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200826133116.11592-1-nborisov@suse.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:13 -04:00
Petr Malat
81e8c3c503 ext4: do not interpret high bytes if 64bit feature is disabled
Fields s_free_blocks_count_hi, s_r_blocks_count_hi and s_blocks_count_hi
are not valid if EXT4_FEATURE_INCOMPAT_64BIT is not enabled and should be
treated as zeroes.

Signed-off-by: Petr Malat <oss@malat.biz>
Link: https://lore.kernel.org/r/20200825150016.3363-1-oss@malat.biz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:13 -04:00
Randy Dunlap
b483bb7719 ext4: delete duplicated words + other fixes
Delete repeated words in fs/ext4/.
{the, this, of, we, after}

Also change spelling of "xttr" in inline.c to "xattr" in 2 places.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200805024850.12129-1-rdunlap@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:13 -04:00
Jens Axboe
766ef1e101 ext4: flag as supporting buffered async reads
ext4 uses generic_file_read_iter(), which already supports this.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Link: https://lore.kernel.org/r/fb90cc2d-b12c-738f-21a4-dd7a8ae0556a@kernel.dk
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:12 -04:00
Eric Biggers
cb8d53d2c9 ext4: fix leaking sysfs kobject after failed mount
ext4_unregister_sysfs() only deletes the kobject.  The reference to it
needs to be put separately, like ext4_put_super() does.

This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).

Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com
Fixes: 72ba74508b ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:12 -04:00
Jan Kara
5b3dc19dda ext4: discard preallocations before releasing group lock
ext4_mb_discard_group_preallocations() can be releasing group lock with
preallocations accumulated on its local list. Thus although
discard_pa_seq was incremented and concurrent allocating processes will
be retrying allocations, it can happen that premature ENOSPC error is
returned because blocks used for preallocations are not available for
reuse yet. Make sure we always free locally accumulated preallocations
before releasing group lock.

Fixes: 07b5b8e1ac ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200924150959.4335-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:12 -04:00
Ye Bin
70022da804 ext4: fix dead loop in ext4_mb_new_blocks
As we test disk offline/online with running fsstress, we find fsstress
process is keeping running state.
kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
....
kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114

ext4_mb_new_blocks
repeat:
        ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
                freed = ext4_mb_discard_preallocations
                        ext4_mb_discard_group_preallocations
                                this_cpu_inc(discard_pa_seq);
                ---> freed == 0
                seq_retry = ext4_get_discard_pa_seq_sum
                        for_each_possible_cpu(__cpu)
                                __seq += per_cpu(discard_pa_seq, __cpu);
                if (seq_retry != *seq) {
                        *seq = seq_retry;
                        ret = true;
                }

As we see seq_retry is sum of discard_pa_seq every cpu, if
ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
cpu maybe increase one, so condition "seq_retry != *seq" have always
been met.
Ritesh Harjani suggest to in ext4_mb_discard_group_preallocations function we
only increase discard_pa_seq when there is some PA to free.

Fixes: 07b5b8e1ac ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200916113859.1556397-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:36:12 -04:00
Ritesh Harjani
0e6895ba00 ext4: implement swap_activate aops using iomap
After moving ext4's bmap to iomap interface, swapon functionality
on files created using fallocate (which creates unwritten extents) are
failing. This is since iomap_bmap interface returns 0 for unwritten
extents and thus generic_swapfile_activate considers this as holes
and hence bail out with below kernel msg :-

[340.915835] swapon: swapfile has holes

To fix this we need to implement ->swap_activate aops in ext4
which will use ext4_iomap_report_ops. Since we only need to return
the list of extents so ext4_iomap_report_ops should be enough.

Cc: stable@kernel.org
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Fixes: ac58e4fb03 ("ext4: move ext4 bmap to use iomap infrastructure")
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200904091653.1014334-1-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18 10:35:54 -04:00
Denis Efremov
edc05fe555 coccinelle: api: add kfree_mismatch script
Check that alloc and free types of functions match each other.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2020-10-17 23:11:06 +02:00
Jens Axboe
91989c7078 task_work: cleanup notification modes
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

Clean this up properly, and define a proper enum for the notification
mode. Now we have:

- TWA_NONE. This is 0, same as before the original change, meaning no
  notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
  that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
  notification.

Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.

Fixes: e91b481623 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:05:30 -06:00
Jens Axboe
3c532798ec tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:04:36 -06:00
Jens Axboe
324bcf54c4 mm: use limited read-ahead to satisfy read
For the case where read-ahead is disabled on the file, or if the cgroup
is congested, ensure that we can at least do 1 page of read-ahead to
make progress on the read in an async fashion. This could potentially be
larger, but it's not needed in terms of functionality, so let's error on
the side of caution as larger counts of pages may run into reclaim
issues (particularly if we're congested).

This makes sure we're not hitting the potentially sync ->readpage() path
for IO that is marked IOCB_WAITQ, which could cause us to block. It also
means we'll use the same path for IO, regardless of whether or not
read-ahead happens to be disabled on the lower level device.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao_Xu <haoxu@linux.alibaba.com>
[axboe: updated for new ractl API]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 13:49:08 -06:00
Jens Axboe
13bd691421 mm: mark async iocb read as NOWAIT once some data has been copied
Once we've copied some data for an iocb that is marked with IOCB_WAITQ,
we should no longer attempt to async lock a new page. Instead make sure
we return the copied amount, and let the caller retry, instead of
returning -EIOCBQUEUED for a new page.

This should only be possible with read-ahead disabled on the below
device, and multiple threads racing on the same file. Haven't been able
to reproduce on anything else.

Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853b9 ("mm: support async buffered reads in generic_file_buffered_read()")
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 13:49:05 -06:00
Linus Torvalds
9d9af1007b Merge tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:

 - cgroup improvements for 'perf stat', allowing for compact
   specification of events and cgroups in the command line.

 - Support per thread topdown metrics in 'perf stat'.

 - Support sample-read topdown metric group in 'perf record'

 - Show start of latency in addition to its start in 'perf sched
   latency'.

 - Add min, max to 'perf script' futex-contention output, in addition to
   avg.

 - Allow usage of 'perf_event_attr->exclusive' attribute via the new
   ':e' event modifier.

 - Add 'snapshot' command to 'perf record --control', using it with
   Intel PT.

 - Support FIFO file names as alternative options to 'perf record
   --control'.

 - Introduce branch history "streams", to compare 'perf record' runs
   with 'perf diff' based on branch records and report hot streams.

 - Support PE executable symbol tables using libbfd, to profile, for
   instance, wine binaries.

 - Add filter support for option 'perf ftrace -F/--funcs'.

 - Allow configuring the 'disassembler_style' 'perf annotate' knob via
   'perf config'

 - Update CascadelakeX and SkylakeX JSON vendor events files.

 - Add support for parsing perchip/percore JSON vendor events.

 - Add power9 hv_24x7 core level metric events.

 - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
   zen1.

 - Enable Family 19h users by matching Zen2 AMD vendor events.

 - Use debuginfod in 'perf probe' when required debug files not found
   locally.

 - Display negative tid in non-sample events in 'perf script'.

 - Make GTK2 support opt-in

 - Add build test with GTK+

 - Add missing -lzstd to the fast path feature detection

 - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
   use in 'perf trace'.

 - Show python test script in verbose mode.

 - Fix uncore metric expressions

 - Msan uninitialized use fixes.

 - Use condition variables in 'perf bench numa'

 - Autodetect python3 binary in systems without python2.

 - Support md5 build ids in addition to sha1.

 - Add build id 'perf test' regression test.

 - Fix printable strings in python3 scripts.

 - Fix off by ones in 'perf trace' in arches using libaudit.

 - Fix JSON event code for events referencing std arch events.

 - Introduce 'perf test' shell script for Arm CoreSight testing.

 - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
   event and in 'perf test tsc'.

 - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
   fixes and documentation update.

 - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
   debuginfo files.

 - Do not print 'Metric Groups:' unnecessarily in 'perf list'

 - Refcounting fixes in the event parsing code.

 - Add expand cgroup event 'perf test' entry.

 - Fix out of bounds CPU map access when handling armv8_pmu events in
   'perf stat'.

 - Add build-id injection 'perf bench' benchmark.

 - Enter namespace when reading build-id in 'perf inject'.

 - Do not load map/dso when injecting build-id speeding up the 'perf
   inject' process.

 - Add --buildid-all option to avoid processing all samples, just the
   mmap metadata events.

 - Add feature test to check if libbfd has buildid support

 - Add 'perf test' entry for PE binary format support.

 - Fix typos in power8 PMU vendor events JSON files.

 - Hide libtraceevent non API functions.

* tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
  perf c2c: Update documentation for metrics reorganization
  perf c2c: Add metrics "RMT Load Hit"
  perf c2c: Correct LLC load hit metrics
  perf c2c: Change header for LLC local hit
  perf c2c: Use more explicit headers for HITM
  perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
  perf c2c: Organize metrics based on memory hierarchy
  perf c2c: Display "Total Stores" as a standalone metrics
  perf c2c: Display the total numbers continuously
  perf bench: Use condition variables in numa.
  perf jevents: Fix event code for events referencing std arch events
  perf diff: Support hot streams comparison
  perf streams: Report hot streams
  perf streams: Calculate the sum of total streams hits
  perf streams: Link stream pair
  perf streams: Compare two streams
  perf streams: Get the evsel_streams by evsel_idx
  perf streams: Introduce branch history "streams"
  perf intel-pt: Improve PT documentation slightly
  perf tools: Add support for exclusive groups/events
  ...
2020-10-17 11:47:46 -07:00
Linus Torvalds
a1e16bc7d5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Linus Torvalds
2a934b38c0 Merge tag 'i3c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Boris Brezillon:

 - Fix DAA for the pre-reserved address case

 - Fix an error path in the cadence driver

* tag 'i3c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: master: Fix error return in cdns_i3c_master_probe()
  i3c: master: fix for SETDASA and DAA process
  i3c: master add i3c_master_attach_boardinfo to preserve boardinfo
2020-10-17 11:01:01 -07:00
Linus Torvalds
6f78b9acf0 Merge tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Richard Weinberger:
 "NAND core changes:
   - Drop useless 'depends on' in Kconfig
   - Add an extra level in the Kconfig hierarchy
   - Trivial spellings
   - Dynamic allocation of the interface configurations
   - Dropping the default ONFI timing mode
   - Various cleanup (types, structures, naming, comments)
   - Hide the chip->data_interface indirection
   - Add the generic rb-gpios property
   - Add the ->choose_interface_config() hook
   - Introduce nand_choose_best_sdr_timings()
   - Use default values for tPROG_max and tBERS_max
   - Avoid redefining tR_max and tCCS_min
   - Add a helper to find the closest ONFI mode
   - bcm63xx MTD parsers: simplify CFE detection

  Raw NAND controller drivers changes:
   - fsl-upm: Deprecation of specific DT properties
   - fsl_upm: Driver rework and cleanup in favor of ->exec_op()
   - Ingenic: Cleanup ARRAY_SIZE() vs sizeof() use
   - brcmnand: ECC error handling on EDU transfers
   - brcmnand: Don't default to EDU transfers
   - qcom: Set BAM mode only if not set already
   - qcom: Avoid write to unavailable register
   - gpio: Driver rework in favor of ->exec_op()
   - tango: ->exec_op() conversion
   - mtk: ->exec_op() conversion

  Raw NAND chip drivers changes:
   - toshiba: Implement ->choose_interface_config() for TH58NVG2S3HBAI4
   - toshiba: Implement ->choose_interface_config() for TC58NVG0S3E
   - toshiba: Implement ->choose_interface_config() for TC58TEG5DCLTA00
   - hynix: Implement ->choose_interface_config() for H27UCG8T2ATR-BC

  HyperBus changes:
   - DMA support for TI's AM654 HyperBus controller driver.
   - HyperBus frontend driver for Renesas RPC-IF driver.

  SPI NOR core changes:
   - Support for Winbond w25q64jwm flash
   - Enable 4K sector support for mx25l12805d

  SPI NOR controller drivers changes:
   - intel-spi Add Alder Lake-S PCI ID

  MTD Core changes:
   - mtdoops: Don't run panic write twice
   - mtdconcat: Correctly handle panic write
   - Use DEFINE_SHOW_ATTRIBUTE"

* tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (76 commits)
  mtd: hyperbus: Fix build failure when only RPCIF_HYPERBUS is enabled
  mtd: hyperbus: add Renesas RPC-IF driver
  Revert "mtd: spi-nor: Prefer asynchronous probe"
  mtd: parsers: bcm63xx: Do not make it modular
  mtd: spear_smi: Enable compile testing
  mtd: maps: vmu-flash: fix typos for struct memcard
  mtd: physmap: Add Baikal-T1 physically mapped ROM support
  mtd: maps: vmu-flash: simplify the return expression of probe_maple_vmu
  mtd: onenand: simplify the return expression of onenand_transfer_auto_oob
  mtd: rawnand: cadence: remove a redundant dev_err call
  mtd: rawnand: ams-delta: Fix non-OF build warning
  mtd: rawnand: Don't overwrite the error code from nand_set_ecc_soft_ops()
  mtd: rawnand: Introduce nand_set_ecc_on_host_ops()
  mtd: rawnand: atmel: Check return values for nand_read_data_op
  mtd: rawnand: vf610: Remove unused function vf610_nfc_transfer_size()
  mtd: rawnand: qcom: Simplify with dev_err_probe()
  mtd: rawnand: marvell: Fix and update kerneldoc
  mtd: rawnand: marvell: Simplify with dev_err_probe()
  mtd: rawnand: gpmi: Simplify with dev_err_probe()
  mtd: rawnand: atmel: Simplify with dev_err_probe()
  ...
2020-10-17 10:45:42 -07:00
Linus Torvalds
5a77b6a013 Merge tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:

 - Fix Kconfig typo "acces" -> "access" (Colin Ian King)

 - Use dev_error_probe() to simplify the error handling on imx and imx8
   platforms (Anson Huang)

 - Use dedicated kobj_to_dev() instead of container_of() in the sysfs
   core code (Tian Tao)

 - Fix coding style by adding braces to a one line conditional statement
   on rcar (Geert Uytterhoeven)

 - Add DT binding documentation for the r8a774e1 platform and update the
   Kconfig description supporting RZ/G2 SoCs (Lad Prabhakar)

 - Simplify the return expression of stm_thermal_prepare on the stm32
   platform (Qinglang Miao)

 - Fix the unit in the function documentation for the idle injection
   cooling device (Zhuguang Qing)

 - Remove an unecessary mutex_init() in the core code (Qinglang Miao)

 - Add support for keep alive events in the core code and the specific
   int340x (Srinivas Pandruvada)

 - Remove unused thermal zone variable in devfreq and cpufreq cooling
   devices (Zhuguang Qing)

 - Add the A100's THS controller support (Yangtao Li)

 - Add power management on the omap3's bandgap sensor (Adam Ford)

 - Fix a missing nlmsg_free in the netlink core error path (Jing
   Xiangfeng)

* tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp()
  thermal: ti-soc-thermal: Enable addition power management
  thermal: sun8i: Add A100's THS controller support
  thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate
  dt-bindings: thermal: sun8i: Add binding for A100's THS controller
  thermal: cooling: Remove unused variable *tz
  thermal: int340x: Add keep alive response method
  thermal: core: Add new event for sending keep alive notifications
  thermal: int340x: Provide notification for OEM variable change
  thermal: core: remove unnecessary mutex_init()
  thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns
  thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config
  thermal: stm32: simplify the return expression of stm_thermal_prepare()
  dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support
  thermal: rcar_thermal: Add missing braces to conditional statement
  thermal: Use kobj_to_dev() instead of container_of()
  thermal: imx8mm: Use dev_err_probe() to simplify error handling
  thermal: imx: Use dev_err_probe() to simplify error handling
  drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"
2020-10-17 10:40:22 -07:00
Pavel Begunkov
58852d4d67 io_uring: fix double poll mask init
__io_queue_proc() is used by both, poll reqs and apoll. Don't use
req->poll.events to copy poll mask because for apoll it aliases with
private data of the request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:47 -06:00
Jens Axboe
4ea33a976b io-wq: inherit audit loginuid and sessionid
Make sure the async io-wq workers inherit the loginuid and sessionid from
the original task, and restore them to unset once we're done with the
async work item.

While at it, disable the ability for kernel threads to write to their own
loginuid.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:47 -06:00
Jens Axboe
d8a6df10aa io_uring: use percpu counters to track inflight requests
Even though we place the req_issued and req_complete in separate
cachelines, there's considerable overhead in doing the atomics
particularly on the completion side.

Get rid of having the two counters, and just use a percpu_counter for
this. That's what it was made for, after all. This considerably
reduces the overhead in __io_free_req().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:47 -06:00
Jens Axboe
500a373d73 io_uring: assign new io_identity for task if members have changed
This avoids doing a copy for each new async IO, if some parts of the
io_identity has changed. We avoid reference counting for the normal
fast path of nothing ever changing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:46 -06:00
Jens Axboe
5c3462cfd1 io_uring: store io_identity in io_uring_task
This is, by definition, a per-task structure. So store it in the
task context, instead of doing carrying it in each io_kiocb. We're being
a bit inefficient if members have changed, as that requires an alloc and
copy of a new io_identity struct. The next patch will fix that up.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:46 -06:00
Jens Axboe
1e6fa5216a io_uring: COW io_identity on mismatch
If the io_identity doesn't completely match the task, then create a
copy of it and use that. The existing copy remains valid until the last
user of it has gone away.

This also changes the personality lookup to be indexed by io_identity,
instead of creds directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:46 -06:00
Jens Axboe
98447d65b4 io_uring: move io identity items into separate struct
io-wq contains a pointer to the identity, which we just hold in io_kiocb
for now. This is in preparation for putting this outside io_kiocb. The
only exception is struct files_struct, which we'll need different rules
for to avoid a circular dependency.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:45 -06:00
Jens Axboe
dfead8a8e2 io_uring: rely solely on work flags to determine personality.
We solely rely on work->work_flags now, so use that for proper checking
and clearing/dropping of various identity items.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:45 -06:00
Jens Axboe
0f20376588 io_uring: pass required context in as flags
We have a number of bits that decide what context to inherit. Set up
io-wq flags for these instead. This is in preparation for always having
the various members set, but not always needing them for all requests.

No intended functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:45 -06:00
Jens Axboe
a8b595b22d io-wq: assign NUMA node locality if appropriate
There was an assumption that kthread_create_on_node() would properly set
NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we
do this when creating an io-wq context on NUMA.

Cc: stable@vger.kernel.org
Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:44 -06:00
Jens Axboe
55cbc2564a io_uring: fix error path cleanup in io_sqe_files_register()
syzbot reports the following crash:

general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 8927 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS:  00007f4266f3c700(0000) GS:ffff8880ae500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000118c000 CR3: 000000008e57d000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45de59
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f4266f3bc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00000000000083c0 RCX: 000000000045de59
RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000005
RBP: 000000000118bf68 R08: 0000000000000000 R09: 0000000000000000
R10: 40000000000000a1 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007fff2fa4f12f R14: 00007f4266f3c9c0 R15: 000000000118bf2c
Modules linked in:
---[ end trace 2a40a195e2d5e6e6 ]---
RIP: 0010:io_file_from_index fs/io_uring.c:5963 [inline]
RIP: 0010:io_sqe_files_register fs/io_uring.c:7369 [inline]
RIP: 0010:__io_uring_register fs/io_uring.c:9463 [inline]
RIP: 0010:__do_sys_io_uring_register+0x2fd2/0x3ee0 fs/io_uring.c:9553
Code: ec 03 49 c1 ee 03 49 01 ec 49 01 ee e8 57 61 9c ff 41 80 3c 24 00 0f 85 9b 09 00 00 4d 8b af b8 01 00 00 4c 89 e8 48 c1 e8 03 <80> 3c 28 00 0f 85 76 09 00 00 49 8b 55 00 89 d8 c1 f8 09 48 98 4c
RSP: 0018:ffffc90009137d68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9000ef2a000
RDX: 0000000000040000 RSI: ffffffff81d81dd9 RDI: 0000000000000005
RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffed1012882a37
R13: 0000000000000000 R14: ffffed1012882a38 R15: ffff888094415000
FS:  00007f4266f3c700(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000074a918 CR3: 000000008e57d000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

which is a copy of fget failure condition jumping to cleanup, but the
cleanup requires ctx->file_data to be assigned. Assign it when setup,
and ensure that we clear it again for the error path exit.

Fixes: 5398ae6985 ("io_uring: clean file_data access in files_register")
Reported-by: syzbot+f4ebcc98223dafd8991e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:44 -06:00
Jens Axboe
0918682be4 Revert "io_uring: mark io_uring_fops/io_op_defs as __read_mostly"
This reverts commit 738277adc8.

This change didn't make a lot of sense, and as Linus reports, it actually
fails on clang:

   /tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
   attributes for .data..read_mostly

The arrays are already marked const so, by definition, they are not
just read-mostly, they are read-only.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:43 -06:00
Pavel Begunkov
216578e55a io_uring: fix REQ_F_COMP_LOCKED by killing it
REQ_F_COMP_LOCKED is used and implemented in a buggy way. The problem is
that the flag is set before io_put_req() but not cleared after, and if
that wasn't the final reference, the request will be freed with the flag
set from some other context, which may not hold a spinlock. That means
possible races with removing linked timeouts and unsynchronised
completion (e.g. access to CQ).

Instead of fixing REQ_F_COMP_LOCKED, kill the flag and use
task_work_add() to move such requests to a fresh context to free from
it, as was done with __io_free_req_finish().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:43 -06:00
Pavel Begunkov
4edf20f999 io_uring: dig out COMP_LOCK from deep call chain
io_req_clean_work() checks REQ_F_COMP_LOCK to pass this two layers up.
Move the check up into __io_free_req(), so at least it doesn't looks so
ugly and would facilitate further changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:43 -06:00
Pavel Begunkov
6a0af224c2 io_uring: don't put a poll req under spinlock
Move io_put_req() in io_poll_task_handler() from under spinlock. This
eliminates the need to use REQ_F_COMP_LOCKED, at the expense of
potentially having to grab the lock again. That's still a better trade
off than relying on the locked flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:42 -06:00
Pavel Begunkov
b1b74cfc19 io_uring: don't unnecessarily clear F_LINK_TIMEOUT
If a request had REQ_F_LINK_TIMEOUT it would've been cleared in
__io_kill_linked_timeout() by the time of __io_fail_links(), so no need
to care about it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:42 -06:00
Pavel Begunkov
368c5481ae io_uring: don't set COMP_LOCKED if won't put
__io_kill_linked_timeout() sets REQ_F_COMP_LOCKED for a linked timeout
even if it can't cancel it, e.g. it's already running. It not only races
with io_link_timeout_fn() for ->flags field, but also leaves the flag
set and so io_link_timeout_fn() may find it and decide that it holds the
lock. Hopefully, the second problem is potential.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:42 -06:00
Colin Ian King
035fbafc7a io_uring: Fix sizeof() mismatch
An incorrect sizeof() is being used, sizeof(file_data->table) is not
correct, it should be sizeof(*file_data->table).

Fixes: 5398ae6985 ("io_uring: clean file_data access in files_register")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 09:25:41 -06:00
Tian Tao
db07327270 skd_main: remove unused including <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 08:11:14 -06:00
Colin Ian King
9ce88a13b3 ALSA: hda/ca0132: make some const arrays static, makes object smaller
Don't populate const arrays on the stack but instead make them
static. Makes the object code smaller by 57 bytes.

Before:
   text	   data	    bss	    dec	    hex	filename
 173256	  38016	    192	 211464	  33a08	sound/pci/hda/patch_ca0132.o

After:
   text	   data	    bss	    dec	    hex	filename
 172879	  38336	    192	 211407	  339cf	sound/pci/hda/patch_ca0132.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201016224913.687724-1-colin.king@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-17 09:58:59 +02:00
Randy Dunlap
a97cbcd00f ALSA: sparc: dbri: fix repeated word 'the'
Change the duplicated word "the" to "Then the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201016174405.17745-1-rdunlap@infradead.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-17 09:58:35 +02:00
Jassi Brar
c7dacf5b0f mailbox: avoid timer start from callback
If the txdone is done by polling, it is possible for msg_submit() to start
the timer while txdone_hrtimer() callback is running. If the timer needs
recheduling, it could already be enqueued by the time hrtimer_forward_now()
is called, leading hrtimer to loudly complain.

WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
Hardware name: Libre Computer AML-S805X-AC (DT)
Workqueue: events_freezable_power_ thermal_zone_device_check
pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
pc : hrtimer_forward+0xc4/0x110
lr : txdone_hrtimer+0xf8/0x118
[...]

This can be fixed by not starting the timer from the callback path. Which
requires the timer reloading as long as any message is queued on the
channel, and not just when current tx is not done yet.

Fixes: 0cc67945ea ("mailbox: switch to hrtimer for tx_complete polling")
Reported-by: Da Xue <da@libre.computer>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2020-10-16 19:09:17 -05:00
Ioana Ciornei
f355a55f82 net: pcs-xpcs: depend on MDIO_BUS instead of selecting it
The below compile time error can be seen when PHYLIB is configured as a
module.

 ld: drivers/net/pcs/pcs-xpcs.o: in function `xpcs_read':
 pcs-xpcs.c:(.text+0x29): undefined reference to `mdiobus_read'
 ld: drivers/net/pcs/pcs-xpcs.o: in function `xpcs_soft_reset.constprop.7':
 pcs-xpcs.c:(.text+0x80): undefined reference to `mdiobus_write'
 ld: drivers/net/pcs/pcs-xpcs.o: in function `xpcs_config_aneg':
 pcs-xpcs.c:(.text+0x318): undefined reference to `mdiobus_write'
 ld: pcs-xpcs.c:(.text+0x38e): undefined reference to `mdiobus_write'
 ld: pcs-xpcs.c:(.text+0x3eb): undefined reference to `mdiobus_write'
 ld: pcs-xpcs.c:(.text+0x437): undefined reference to `mdiobus_write'
 ld: drivers/net/pcs/pcs-xpcs.o:pcs-xpcs.c:(.text+0xb1e): more undefined references to `mdiobus_write' follow

PHYLIB being a module leads to MDIO_BUS being a module as well while the
XPCS is still built-in. What should happen in this configuration is that
PCS_XPCS should be forced to build as module. However, that select only
acts in the opposite way so we should turn it into a depends.

Fix this up by explicitly depending on MDIO_BUS.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Fixes: 2fa4e4b799 ("net: pcs: Move XPCS into new PCS subdirectory")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16 16:54:11 -07:00
Eric Dumazet
b38e7819ca icmp: randomize the global rate limiter
Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.

Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.

Fixes: 4cdf507d54 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16 16:47:09 -07:00
Dylan Hung
137d23cea1 net: ftgmac100: Fix Aspeed ast2600 TX hang issue
The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
hang when handling scatter-gather DMA.  Disable the problematic feature
by setting MAC register 0x58 bit28 and bit27.

Fixes: 39bfab8844 ("net: ftgmac100: Add support for DT phy-handle property")
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16 15:36:34 -07:00
Darrick J. Wong
894645546b xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n
Pavel Machek complained that the question about supporting deprecated
XFS v4 comes up even when XFS is disabled.  This clearly makes no sense,
so fix Kconfig.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-10-16 15:34:28 -07:00
Darrick J. Wong
d88850bd55 xfs: fix high key handling in the rt allocator's query_range function
Fix some off-by-one errors in xfs_rtalloc_query_range.  The highest key
in the realtime bitmap is always one less than the number of rt extents,
which means that the key clamp at the start of the function is wrong.
The 4th argument to xfs_rtfind_forw is the highest rt extent that we
want to probe, which means that passing 1 less than the high key is
wrong.  Finally, drop the rem variable that controls the loop because we
can compare the iteration point (rtstart) against the high key directly.

The sordid history of this function is that the original commit (fb3c3)
incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter
to xfs_rtfind_forw.  This was wrong because the "high key" is supposed
to be the largest key for which the caller wants result rows, not the
key for the first row that could possibly be outside the range that the
caller wants to see.

A subsequent attempt (8ad56) to strengthen the parameter checking added
incorrect clamping of the parameters to the number of rt blocks in the
system (despite the bitmap functions all taking units of rt extents) to
avoid querying ranges past the end of rt bitmap file but failed to fix
the incorrect _rtfind_forw parameter.  The original _rtfind_forw
parameter error then survived the conversion of the startblock and
blockcount fields to rt extents (a0e5c), and the most recent off-by-one
fix (a3a37) thought it was patching a problem when the end of the rt
volume is not in use, but none of these fixes actually solved the
original problem that the author was confused about the "limit" argument
to xfs_rtfind_forw.

Sadly, all four of these patches were written by this author and even
his own usage of this function and rt testing were inadequate to get
this fixed quickly.

Original-problem: fb3c3de2f6 ("xfs: add a couple of queries to iterate free extents in the rtbitmap")
Not-fixed-by: 8ad560d256 ("xfs: strengthen rtalloc query range checks")
Not-fixed-by: a0e5c435ba ("xfs: fix xfs_rtalloc_rec units")
Fixes: a3a374bf18 ("xfs: fix off-by-one error in xfs_rtalloc_query_range")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-10-16 15:34:28 -07:00
Linus Torvalds
071a0578b0 Merge tag 'ovl-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:

 - Improve performance for certain container setups by introducing a
   "volatile" mode

 - ioctl improvements

 - continue preparation for unprivileged overlay mounts

* tag 'ovl-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: use generic vfs_ioc_setflags_prepare() helper
  ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories
  ovl: rearrange ovl_can_list()
  ovl: enumerate private xattrs
  ovl: pass ovl_fs down to functions accessing private xattrs
  ovl: drop flags argument from ovl_do_setxattr()
  ovl: adhere to the vfs_ vs. ovl_do_ conventions for xattrs
  ovl: use ovl_do_getxattr() for private xattr
  ovl: fold ovl_getxattr() into ovl_get_redirect_xattr()
  ovl: clean up ovl_getxattr() in copy_up.c
  duplicate ovl_getxattr()
  ovl: provide a mount option "volatile"
  ovl: check for incompatible features in work dir
2020-10-16 15:29:46 -07:00
Linus Torvalds
fad70111d5 Merge tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull afs updates from David Howells:
 "A collection of fixes to fix afs_cell struct refcounting, thereby
  fixing a slew of related syzbot bugs:

   - Fix the cell tree in the netns to use an rwsem rather than RCU.

     There seem to be some problems deriving from the use of RCU and a
     seqlock to walk the rbtree, but it's not entirely clear what since
     there are several different failures being seen.

     Changing things to use an rwsem instead makes it more robust. The
     extra performance derived from using RCU isn't necessary in this
     case since the only time we're looking up a cell is during mount or
     when cells are being manually added.

   - Fix the refcounting by splitting the usage counter into a memory
     refcount and an active users counter. The usage counter was doing
     double duty, keeping track of whether a cell is still in use and
     keeping track of when it needs to be destroyed - but this makes the
     clean up tricky. Separating these out simplifies the logic.

   - Fix purging a cell that has an alias. A cell alias pins the cell
     it's an alias of, but the alias is always later in the list. Trying
     to purge in a single pass causes rmmod to hang in such a case.

   - Fix cell removal. If a cell's manager is requeued whilst it's
     removing itself, the manager will run again and re-remove itself,
     causing problems in various places. Follow Hillf Danton's
     suggestion to insert a more terminal state that causes the manager
     to do nothing post-removal.

  In additional to the above, two other changes:

   - Add a tracepoint for the cell refcount and active users count. This
     helped with debugging the above and may be useful again in future.

   - Downgrade an assertion to a print when a still-active server is
     seen during purging. This was happening as a consequence of
     incomplete cell removal before the servers were cleaned up"

* tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Don't assert on unpurgeable server records
  afs: Add tracing for cell refcount and active user count
  afs: Fix cell removal
  afs: Fix cell purging with aliases
  afs: Fix cell refcounting by splitting the usage counter
  afs: Fix rapid cell addition/removal by not using RCU on cells tree
2020-10-16 15:22:41 -07:00