Commit Graph

1044901 Commits

Author SHA1 Message Date
Edwin Peer
beb55fcf95 bnxt_en: fix read of stored FW_PSID version on P5 devices
P5 devices store NVM arrays using a different internal representation.
This implementation detail permeates into the HWRM API, requiring the
caller to explicitly index the array elements in HWRM_NVM_GET_VARIABLE
on these devices. Conversely, older devices do not support the indexed
mode of operation and require reading the raw NVM content.

Fixes: db28b6c77f ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 20:43:03 +01:00
Edwin Peer
1656db6723 bnxt_en: fix stored FW_PSID version masks
The FW_PSID version components are 8 bits wide, not 4.

Fixes: db28b6c77f ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 20:43:03 +01:00
Linus Torvalds
27151f1778 Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM Coresight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double dashes uses.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disasembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler-types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch specific erno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - perf test: Do not compare overheads in the zstd comp test

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elhart Lake.

   - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...
2021-09-05 11:56:18 -07:00
Linus Torvalds
58ca241587 Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT

 - bootconfig can now start histograms

 - bootconfig supports group/all enabling

 - histograms now can put values in linear size buckets

 - execnames can be passed to synthetic events

 - introduce "event probes" that attach to other events and can retrieve
   data from pointers of fields, or record fields as different types (a
   pointer to a string as a string instead of just a hex number)

 - various fixes and clean ups

* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
  tracing/doc: Fix table format in histogram code
  selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
  selftests/ftrace: Add selftest for testing eprobe events on synthetic events
  selftests/ftrace: Add test case to test adding and removing of event probe
  selftests/ftrace: Fix requirement check of README file
  selftests/ftrace: Add clear_dynamic_events() to test cases
  tracing: Add a probe that attaches to trace events
  tracing/probes: Reject events which have the same name of existing one
  tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
  tracing/probe: Change traceprobe_set_print_fmt() to take a type
  tracing/probes: Use struct_size() instead of defining custom macros
  tracing/probes: Allow for dot delimiter as well as slash for system names
  tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
  tracing: Have dynamic events have a ref counter
  tracing: Add DYNAMIC flag for dynamic events
  tracing: Replace deprecated CPU-hotplug functions.
  MAINTAINERS: Add an entry for os noise/latency
  tracepoint: Fix kerneldoc comments
  bootconfig/tracing/ktest: Update ktest example for boot-time tracing
  tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
  ...
2021-09-05 11:50:41 -07:00
Linus Torvalds
e07af26266 Merge tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
 "Finally a big pile of changes for ARC (atomics/mm). These are from our
  internal arc64 tree, preparing mainline for eventual arc64 support.
  I'm spreading them out to avoid tsunami of patches in one release.

   - MM rework:
       - Implement up to 4 paging levels
       - Enable STRICT_MM_TYPECHECK
       - switch pgtable_t back to 'struct page *'

   - Atomics rework / implement relaxed accessors

   - Retire legacy MMUv1,v2; ARC750 cores

   - A few other build errors, typos"

* tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (33 commits)
  ARC: mm: vmalloc sync from kernel to user table to update PMD ...
  ARC: mm: support 4 levels of page tables
  ARC: mm: support 3 levels of page tables
  ARC: mm: switch to asm-generic/pgalloc.h
  ARC: mm: switch pgtable_t back to struct page *
  ARC: mm: hack to allow 2 level build with 4 level code
  ARC: mm: disintegrate pgtable.h into levels and flags
  ARC: mm: disintegrate mmu.h (arcv2 bits out)
  ARC: mm: move MMU specific bits out of entry code ...
  ARC: mm: move MMU specific bits out of ASID allocator
  ARC: mm: non-functional code movement/cleanup
  ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)
  ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag
  ARC: mm: Enable STRICT_MM_TYPECHECKS
  ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS
  ARC: mm: move mmu/cache externs out to setup.h
  ARC: mm: remove tlb paranoid code
  ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only
  ARC: retire MMUv1 and MMUv2 support
  ARC: retire ARC750 support
  ...
2021-09-05 11:43:03 -07:00
Linus Torvalds
063df71a57 Merge tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - support PC-relative instructions (auipc and branches) in kprobes

 - support for forced IRQ threading

 - support for the hlt/nohlt kernel command line options, via the
   generic idle loop

 - show the edge/level triggered behavior of interrupts
   in /proc/interrupts

 - a handful of cleanups to our address mapping mechanisms

 - support for allocating gigantic hugepages via CMA

 - support for the undefined behavior sanitizer (UBSAN)

 - a handful of cleanups to the VDSO that allow the kernel to build with
   LLD.

 - support for hugepage migration

* tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  riscv: add support for hugepage migration
  RISC-V: Fix VDSO build for !MMU
  riscv: use strscpy to replace strlcpy
  riscv: explicitly use symbol offsets for VDSO
  riscv: Enable Undefined Behavior Sanitizer UBSAN
  riscv: Keep the riscv Kconfig selects sorted
  riscv: Support allocating gigantic hugepages using CMA
  riscv: fix the global name pfn_base confliction error
  riscv: Move early fdt mapping creation in its own function
  riscv: Simplify BUILTIN_DTB device tree mapping handling
  riscv: Use __maybe_unused instead of #ifdefs around variable declarations
  riscv: Get rid of map_size parameter to create_kernel_page_table
  riscv: Introduce va_kernel_pa_offset for 32-bit kernel
  riscv: Optimize kernel virtual address conversion macro
  dt-bindings: riscv: add starfive jh7100 bindings
  riscv: Enable GENERIC_IRQ_SHOW_LEVEL
  riscv: Enable idle generic idle loop
  riscv: Allow forced irq threading
  riscv: Implement thread_struct whitelist for hardened usercopy
  riscv: kprobes: implement the branch instructions
  ...
2021-09-05 11:31:23 -07:00
Linus Torvalds
3fe617ccaf Enable '-Werror' by default for all kernel builds
... but make it a config option so that broken environments can disable
it when required.

We really should always have a clean build, and will disable specific
over-eager warnings as required, if we can't fix them.  But while I
fairly religiously enforce that in my own tree, it doesn't get enforced
by various build robots that don't necessarily report warnings.

So this just makes '-Werror' a default compiler flag, but allows people
to disable it for their configuration if they have some particular
issues.

Occasionally, new compiler versions end up enabling new warnings, and it
can take a while before we have them fixed (or the warnings disabled if
that is what it takes), so the config option allows for that situation.

Hopefully this will mean that I get fewer pull requests that have new
warnings that were not noticed by various automation we have in place.

Knock wood.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-05 11:24:05 -07:00
Linus Torvalds
fd47ff55c9 Merge tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull more USB updates from Greg KH:
 "Here are some straggler USB-serial changes for 5.15-rc1.

  These were not included in the first pull request as they came in
  "late" from Johan and I had missed them in my pull request earlier
  this week.

  Nothing big in here, just some USB to serial driver updates and fixes.
  All of these were in linux-next before I pulled them into my tree, and
  have been in linux-next all this week from my tree with no reported
  problems"

* tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: fix GL type detection
  USB: serial: replace symbolic permissions by octal permissions
  USB: serial: cp210x: determine fw version for CP2105 and CP2108
  USB: serial: cp210x: clean up type detection
  USB: serial: cp210x: clean up set-chars request
  USB: serial: cp210x: clean up control-request timeout
  USB: serial: cp210x: fix flow-control error handling
  USB: serial: cp210x: fix control-characters error handling
  USB: serial: io_edgeport: drop unused descriptor helper
2021-09-05 11:19:15 -07:00
Rafał Miłecki
63f8428b40 net: dsa: b53: Fix IMP port setup on BCM5301x
Broadcom's b53 switches have one IMP (Inband Management Port) that needs
to be programmed using its own designed register. IMP port may be
different than CPU port - especially on devices with multiple CPU ports.

For that reason it's required to explicitly note IMP port index and
check for it when choosing a register to use.

This commit fixes BCM5301x support. Those switches use CPU port 5 while
their IMP port is 8. Before this patch b53 was trying to program port 5
with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5).

It may be possible to also replace "cpu_port" usages with
dsa_is_cpu_port() but that is out of the scope of thix BCM5301x fix.

Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 19:02:38 +01:00
Willem de Bruijn
8a0ed250f9 ip_gre: validate csum_start only on pull
The GRE tunnel device can pull existing outer headers in ipge_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.

But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.

Refine to exclude this. At the same time simplify and strengthen the
test.

Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.

Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c4803 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 18:59:32 +01:00
Linus Torvalds
6b6dc4f40c Merge tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
 "MTD changes:
   - blkdevs:
       - Simplify the refcounting in blktrans_{open, release}
       - Simplify blktrans_getgeo
       - Remove blktrans_ref_mutex
       - Simplify blktrans_dev_get
       - Use lockdep_assert_held
       - Don't hold del_mtd_blktrans_dev in blktrans_{open, release}
   - ftl:
       - Don't cast away the type when calling add_mtd_blktrans_dev
       - Don't cast away the type when calling add_mtd_blktrans_dev
       - Use container_of() rather than cast
       - Fix use-after-free
       - Add discard support
       - Allow use of MTD_RAM for testing purposes
   - concat:
       - Check _read, _write callbacks existence before assignment
       - Judge callback existence based on the master
   - maps:
       - Maps: remove dead MTD map driver for PMC-Sierra MSP boards
   - mtdblock:
       - Warn if added for a NAND device
       - Add comment about UBI block devices
       - Update old JFFS2 mention in Kconfig
   - partitions:
       - Redboot: convert to YAML

  NAND core changes:
   - Repair Miquel Raynal's email address in MAINTAINERS
   - Fix a couple of spelling mistakes in Kconfig
   - bbt: Skip bad blocks when searching for the BBT in NAND
   - Remove never changed ret variable

  Raw NAND changes:
   - cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
   - intel: Fix error handling in probe
   - omap: Fix kernel doc warning on 'calcuate' typo
   - gpmc: Fix the ECC bytes vs. OOB bytes equation

  SPI-NAND core changes:
   - Properly fill the OOB area.
   - Fix comment

  SPI-NAND drivers changes:
   - macronix: Add Quad support for serial NAND flash"

* tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (30 commits)
  mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
  mtd_blkdevs: simplify the refcounting in blktrans_{open, release}
  mtd_blkdevs: simplify blktrans_getgeo
  mtd_blkdevs: remove blktrans_ref_mutex
  mtd_blkdevs: simplify blktrans_dev_get
  mtd/rfd_ftl: don't cast away the type when calling add_mtd_blktrans_dev
  mtd/ftl: don't cast away the type when calling add_mtd_blktrans_dev
  mtd_blkdevs: use lockdep_assert_held
  mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release}
  mtd: rawnand: intel: Fix error handling in probe
  mtd: mtdconcat: Check _read, _write callbacks existence before assignment
  mtd: mtdconcat: Judge callback existence based on the master
  mtd: maps: remove dead MTD map driver for PMC-Sierra MSP boards
  mtd: rfd_ftl: use container_of() rather than cast
  mtd: rfd_ftl: fix use-after-free
  mtd: rfd_ftl: add discard support
  mtd: rfd_ftl: allow use of MTD_RAM for testing purposes
  mtdblock: Warn if added for a NAND device
  mtd: spinand: macronix: Add Quad support for serial NAND flash
  mtdblock: Add comment about UBI block devices
  ...
2021-09-05 10:50:12 -07:00
Geert Uytterhoeven
0319b848b1 binfmt: a.out: Fix bogus semicolon
fs/binfmt_aout.c: In function ‘load_aout_library’:
    fs/binfmt_aout.c:311:27: error: expected ‘)’ before ‘;’ token
      311 |    MAP_FIXED | MAP_PRIVATE;
	  |                           ^
    fs/binfmt_aout.c:309:10: error: too few arguments to function ‘vm_mmap’
      309 |  error = vm_mmap(file, start_addr, ex.a_text + ex.a_data,
	  |          ^~~~~~~
    In file included from fs/binfmt_aout.c:12:
    include/linux/mm.h:2626:35: note: declared here
     2626 | extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
	  |                                   ^~~~~~~

Fix this by reverting the accidental replacement of a comma by a
semicolon.

Fixes: 42be8b4253 ("binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()")
Reported-by: noreply@ellerman.id.au
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-05 10:15:05 -07:00
David Decotigny
0a4fd8df07 bonding: complain about missing route only once for A/B ARP probes
On configs where there is no confirgured direct route to the target of
the ARP probes, these probes are still sent and may be replied to
properly, so no need to repeatedly complain about the missing route.

Signed-off-by: David Decotigny <ddecotig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 13:16:51 +01:00
Antonio Quartulli
e5dd729460 ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address
GRE interfaces are not Ether-like and therefore it is not
possible to generate the v6LL address the same way as (for example)
GRETAP devices.

With default settings, a GRE interface will attempt generating its v6LL
address using the EUI64 approach, but this will fail when the local
endpoint of the GRE tunnel is set to "any". In this case the GRE
interface will end up with no v6LL address, thus violating RFC4291.

SIT interfaces already implement a different logic to ensure that a v6LL
address is always computed.

Change the GRE v6LL generation logic to follow the same approach as SIT.
This way GRE interfaces will always have a v6LL address as well.

Behaviour of GRETAP interfaces has not been changed as they behave like
classic Ether-like interfaces.

To avoid code duplication sit_add_v4_addrs() has been renamed to
add_v4_addrs() and adapted to handle also the IP6GRE/GRE cases.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 13:14:37 +01:00
Song Yoong Siang
81d0885d68 net: stmmac: Fix overall budget calculation for rxtx_napi
tx_done is not used for napi_complete_done(). Thus, NAPI busy polling
mechanism by gro_flush_timeout and napi_defer_hard_irqs will not able
be triggered after a packet is transmitted when there is no receive
packet.

Fix this by taking the maximum value between tx_done and rx_done as
overall budget completed by the rxtx NAPI poll to ensure XDP Tx ZC
operation is continuously polling for next Tx frame. This gives
benefit of lower packet submission processing latency and jitter
under XDP Tx ZC mode.

Performance of tx-only using xdp-sock on Intel ADL-S platform is
the same with and without this patch.

root@intel-corei7-64:~# ./xdpsock -i enp0s30f4 -t -z -q 1 -n 10
 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511630         8659520

 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511625         13775808

 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511619         18892032

Fixes: 132c32ee5b ("net: stmmac: Add TX via XDP zero-copy socket")
Cc: <stable@vger.kernel.org> # 5.13.x
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 13:02:20 +01:00
Christophe JAILLET
45010c080e iwlwifi: pnvm: Fix a memory leak in 'iwl_pnvm_get_from_fs()'
A firmware is requested but never released in this function. This leads to
a memory leak in the normal execution path.

Add the missing 'release_firmware()' call.
Also introduce a temp variable (new_len) in order to keep the value of
'pnvm->size' after the firmware has been released.

Fixes: cdda18fbbe ("iwlwifi: pnvm: move file loading code to a separate function")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Luca Coelho <luca@coelho.fi>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1b5d80f54c1dbf85710fd285243932943b498fe7.1630614969.git.christophe.jaillet@wanadoo.fr
2021-09-05 09:11:10 +03:00
Zhang Yi
cc883236b7 ext4: drop unnecessary journal handle in delalloc write
After we factor out the inline data write procedure from
ext4_da_write_end(), we don't need to start journal handle for the cases
of both buffer overwrite and append-write. If we need to update
i_disksize, mark_inode_dirty() do start handle and update inode buffer.
So we could just remove all the journal handle codes in the delalloc
write procedure.

After this patch, we could get a lot of performance improvement. Below
is the Unixbench comparison data test on my machine with 'Intel Xeon
Gold 5120' CPU and nvme SSD backend.

Test cmd:

  ./Run -c 56 -i 3 fstime fsbuffer fsdisk

Before this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
  File Copy 256 bufsize 500 maxblocks         1655.0     105077.0   634.9
  File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      1186.6

After this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
  File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
  File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      2053.0

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-5-yi.zhang@huawei.com
2021-09-04 23:38:18 -04:00
Zhang Yi
6984aef598 ext4: factor out write end code of inline file
Now that the inline_data file write end procedure are falled into the
common write end functions, it is not clear. Factor them out and do
some cleanup. This patch also drop ext4_da_write_inline_data_end()
and switch to use ext4_write_inline_data_end() instead because we also
need to do the same error processing if we failed to write data into
inline entry.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-4-yi.zhang@huawei.com
2021-09-04 23:38:18 -04:00
Zhang Yi
55ce2f649b ext4: correct the error path of ext4_write_inline_data_end()
Current error path of ext4_write_inline_data_end() is not correct.

Firstly, it should pass out the error value if ext4_get_inode_loc()
return fail, or else it could trigger infinite loop if we inject error
here. And then it's better to add inode to orphan list if it return fail
in ext4_journal_stop(), otherwise we could not restore inline xattr
entry after power failure. Finally, we need to reset the 'ret' value if
ext4_write_inline_data_end() return success in ext4_write_end() and
ext4_journalled_write_end(), otherwise we could not get the error return
value of ext4_journal_stop().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-3-yi.zhang@huawei.com
2021-09-04 23:38:18 -04:00
Zhang Yi
4df031ff58 ext4: check and update i_disksize properly
After commit 3da40c7b08 ("ext4: only call ext4_truncate when size <=
isize"), i_disksize could always be updated to i_size in ext4_setattr(),
and we could sure that i_disksize <= i_size since holding inode lock and
if i_disksize < i_size there are delalloc writes pending in the range
upto i_size. If the end of the current write is <= i_size, there's no
need to touch i_disksize since writeback will push i_disksize upto
i_size eventually. So we can switch to check i_size instead of
i_disksize in ext4_da_write_end() when write to the end of the file.
we also could remove ext4_mark_inode_dirty() together because we defer
inode dirtying to generic_write_end() or ext4_da_write_inline_data_end().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-2-yi.zhang@huawei.com
2021-09-04 23:38:18 -04:00
Christian Schoenebeck
9c4d94dc9a net/9p: increase default msize to 128k
Let's raise the default msize value to 128k.

The 'msize' option defines the maximum message size allowed for any
message being transmitted (in both directions) between 9p server and 9p
client during a 9p session.

Currently the default 'msize' is just 8k, which is way too conservative.
Such a small 'msize' value has quite a negative performance impact,
because individual 9p messages have to be split up far too often into
numerous smaller messages to fit into this message size limitation.

A default value of just 8k also has a much higher probablity of hitting
short-read issues like: https://gitlab.com/qemu-project/qemu/-/issues/409

Unfortunately user feedback showed that many 9p users are not aware that
this option even exists, nor the negative impact it might have if it is
too low.

Link: http://lkml.kernel.org/r/61ea0f0faaaaf26dd3c762eabe4420306ced21b9.1630770829.git.linux_oss@crudebyte.com
Link: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-05 08:36:44 +09:00
Christian Schoenebeck
9210fc0a3b net/9p: use macro to define default msize
Use a macro to define the default value for the 'msize' option
at one place instead of using two separate integer literals.

Link: http://lkml.kernel.org/r/28bb651ae0349a7d57e8ddc92c1bd5e62924a912.1630770829.git.linux_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-05 08:36:44 +09:00
Dominique Martinet
22bb3b7929 net/9p: increase tcp max msize to 1MB
Historically TCP has been limited to 64K buffers, but increasing
msize provides huge performance benefits especially as latency
increase so allow for bigger buffers.

Ideally further improvements could change the allocation from the
current contiguous chunk in slab (kmem_cache) to some scatter-gather
compatible API...

Note this only increases the max possible setting, not the default
value.

Link: http://lkml.kernel.org/r/YTQB5jCbvhmCWzNd@codewreck.org
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-09-05 08:35:56 +09:00
Yang Li
0097ae5f7a NTB: perf: Fix an error code in perf_setup_inbuf()
When the function IS_ALIGNED() returns false, the value of ret is 0.
So, we set ret to -EINVAL to indicate this error.

Clean up smatch warning:
drivers/ntb/test/ntb_perf.c:602 perf_setup_inbuf() warn: missing error
code 'ret'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04 18:36:04 -04:00
Yang Li
319f83ac98 NTB: Fix an error code in ntb_msit_probe()
When the value of nm->isr_ctx is false, the value of ret is 0.
So, we set ret to -ENOMEM to indicate this error.

Clean up smatch warning:
drivers/ntb/test/ntb_msi_test.c:373 ntb_msit_probe() warn: missing
error code 'ret'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04 18:36:04 -04:00
Dave Jiang
f3b6b10fcc ntb: intel: remove invalid email address in header comment
Remove Jon's old email address.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2021-09-04 18:36:04 -04:00
Linus Torvalds
49624efa65 Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux
Pull MAP_DENYWRITE removal from David Hildenbrand:
 "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
  VM_DENYWRITE.

  There are some (minor) user-visible changes:

   - We no longer deny write access to shared libaries loaded via legacy
     uselib(); this behavior matches modern user space e.g. dlopen().

   - We no longer deny write access to the elf interpreter after exec
     completed, treating it just like shared libraries (which it often
     is).

   - We always deny write access to the file linked via /proc/pid/exe:
     sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
     file cannot be denied, and write access to the file will remain
     denied until the link is effectivel gone (exec, termination,
     sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.

  Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
  s390x, ...) and verified via ltp that especially the relevant tests
  (i.e., creat07 and execve04) continue working as expected"

* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
  fs: update documentation of get_write_access() and friends
  mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  mm: remove VM_DENYWRITE
  binfmt: remove in-tree usage of MAP_DENYWRITE
  kernel/fork: always deny write access to current MM exe_file
  kernel/fork: factor out replacing the current MM exe_file
  binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
2021-09-04 11:35:47 -07:00
Linus Torvalds
f7464060f7 Merge git://github.com/Paragon-Software-Group/linux-ntfs3
Merge NTFSv3 filesystem from Konstantin Komarov:
 "This patch adds NTFS Read-Write driver to fs/ntfs3.

  Having decades of expertise in commercial file systems development and
  huge test coverage, we at Paragon Software GmbH want to make our
  contribution to the Open Source Community by providing implementation
  of NTFS Read-Write driver for the Linux Kernel.

  This is fully functional NTFS Read-Write driver. Current version works
  with NTFS (including v3.1) and normal/compressed/sparse files and
  supports journal replaying.

  We plan to support this version after the codebase once merged, and
  add new features and fix bugs. For example, full journaling support
  over JBD will be added in later updates"

Link: https://lore.kernel.org/lkml/20210729134943.778917-1-almaz.alexandrovich@paragon-software.com/
Link: https://lore.kernel.org/lkml/aa4aa155-b9b2-9099-b7a2-349d8d9d8fbd@paragon-software.com/

* git://github.com/Paragon-Software-Group/linux-ntfs3: (35 commits)
  fs/ntfs3: Change how module init/info messages are displayed
  fs/ntfs3: Remove GPL boilerplates from decompress lib files
  fs/ntfs3: Remove unnecessary condition checking from ntfs_file_read_iter
  fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep()
  fs/ntfs3: Restyle comments to better align with kernel-doc
  fs/ntfs3: Rework file operations
  fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now
  fs/ntfs3: Restyle comments to better align with kernel-doc
  fs/ntfs3: Fix error handling in indx_insert_into_root()
  fs/ntfs3: Potential NULL dereference in hdr_find_split()
  fs/ntfs3: Fix error code in indx_add_allocate()
  fs/ntfs3: fix an error code in ntfs_get_acl_ex()
  fs/ntfs3: add checks for allocation failure
  fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc
  fs/ntfs3: Do not use driver own alloc wrappers
  fs/ntfs3: Use kernel ALIGN macros over driver specific
  fs/ntfs3: Restyle comment block in ni_parse_reparse()
  fs/ntfs3: Remove unused including <linux/version.h>
  fs/ntfs3: Fix fall-through warnings for Clang
  fs/ntfs3: Fix one none utf8 char in source file
  ...
2021-09-04 11:15:50 -07:00
Linus Torvalds
6abaa83c73 Merge tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've addressed some performance issues such as lock
  contention, misbehaving compress_cache, allowing extent_cache for
  compressed files, and new sysfs to adjust ra_size for fadvise.

  In order to diagnose the performance issues quickly, we also added an
  iostat which shows the IO latencies periodically.

  On the stability side, we've found two memory leakage cases in the
  error path in compression flow. And, we've also fixed various corner
  cases in fiemap, quota, checkpoint=disable, zstd, and so on.

  Enhancements:
   - avoid long checkpoint latency by releasing nat_tree_lock
   - collect and show iostats periodically
   - support extent_cache for compressed files
   - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
   - report f2fs GC status via sysfs
   - add discard_unit=%s in mount option to handle zoned device

  Bug fixes:
   - fix two memory leakages when an error happens in the compressed IO flow
   - fix commpress_cache to get the right LBA
   - fix fiemap to deal with compressed case correctly
   - fix wrong EIO returns due to SBI_NEED_FSCK
   - fix missing writes when enabling checkpoint back
   - fix quota deadlock
   - fix zstd level mount option

  In addition to the above major updates, we've cleaned up several code
  paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
  check, and typos"

* tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: should put a page beyond EOF when preparing a write
  f2fs: deallocate compressed pages when error happens
  f2fs: enable realtime discard iff device supports discard
  f2fs: guarantee to write dirty data when enabling checkpoint back
  f2fs: fix to unmap pages from userspace process in punch_hole()
  f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
  f2fs: fix to account missing .skipped_gc_rwsem
  f2fs: adjust unlock order for cleanup
  f2fs: Don't create discard thread when device doesn't support realtime discard
  f2fs: rebuild nat_bits during umount
  f2fs: introduce periodic iostat io latency traces
  f2fs: separate out iostat feature
  f2fs: compress: do sanity check on cluster
  f2fs: fix description about main_blkaddr node
  f2fs: convert S_IRUGO to 0444
  f2fs: fix to keep compatibility of fault injection interface
  f2fs: support fault injection for f2fs_kmem_cache_alloc()
  f2fs: compress: allow write compress released file after truncate to zero
  f2fs: correct comment in segment.h
  f2fs: improve sbi status info in debugfs/f2fs/status
  ...
2021-09-04 10:48:47 -07:00
Phillip Potter
55d1308bdf cdrom: update uniform CD-ROM maintainership in MAINTAINERS file
Update maintainership for the uniform CD-ROM driver from Jens Axboe to
Phillip Potter in MAINTAINERS file, to reflect the attempt to pass on
maintainership of this driver to a different individual. Also remove
URL to site which is no longer active.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20210904174030.1103-1-phil@philpotter.co.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-04 11:47:41 -06:00
Linus Torvalds
0961f0c00e Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Better client responsiveness when server isn't replying
   - Use refcount_t in sunrpc rpc_client refcount tracking
   - Add srcaddr and dst_port to the sunrpc sysfs info files
   - Add basic support for connection sharing between servers with multiple NICs`

  Bugfixes and Cleanups:
   - Sunrpc tracepoint cleanups
   - Disconnect after ib_post_send() errors to avoid deadlocks
   - Fix for tearing down rpcrdma_reps
   - Fix a potential pNFS layoutget livelock loop
   - pNFS layout barrier fixes
   - Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
   - Fix reconnection locking
   - Fix return value of get_srcport()
   - Remove rpcrdma_post_sends()
   - Remove pNFS dead code
   - Remove copy size restriction for inter-server copies
   - Overhaul the NFS callback service
   - Clean up sunrpc TCP socket shutdowns
   - Always provide aligned buffers to RPC read layers"

* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
  NFS: Always provide aligned buffers to the RPC read layers
  NFSv4.1 add network transport when session trunking is detected
  SUNRPC enforce creation of no more than max_connect xprts
  NFSv4 introduce max_connect mount options
  SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
  SUNRPC keep track of number of transports to unique addresses
  NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
  SUNRPC: Tweak TCP socket shutdown in the RPC client
  SUNRPC: Simplify socket shutdown when not reusing TCP ports
  NFSv4.2: remove restriction of copy size for inter-server copy.
  NFS: Clean up the synopsis of callback process_op()
  NFS: Extract the xdr_init_encode/decode() calls from decode_compound
  NFS: Remove unused callback void decoder
  NFS: Add a private local dispatcher for NFSv4 callback operations
  SUNRPC: Eliminate the RQ_AUTHERR flag
  SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
  SUNRPC: Add svc_rqst::rq_auth_stat
  SUNRPC: Add dst_port to the sysfs xprt info file
  SUNRPC: Add srcaddr as a file in sysfs
  sunrpc: Fix return value of get_srcport()
  ...
2021-09-04 10:25:26 -07:00
Christophe JAILLET
ecbd690b52 octeontx2-af: Fix some memory leaks in the error handling path of 'cgx_lmac_init()'
Memory allocated before 'lmac' is stored in 'cgx->lmac_idmap[]' must be
freed explicitly. Otherwise, in case of error, it will leak.

Rename the 'err_irq' label to better describe what is done at this place in
the error handling path.

Fixes: 6f14078e3e ("octeontx2-af: DMAC filter support in MAC block")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04 13:07:00 +01:00
Christophe JAILLET
d863ca67bb octeontx2-af: Add a 'rvu_free_bitmap()' function
In order to match 'rvu_alloc_bitmap()', add a 'rvu_free_bitmap()' function

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04 13:07:00 +01:00
Yang Li
7db8263a12 ethtool: Fix an error code in cxgb2.c
When adapter->registered_device_map is NULL, the value of err is
uncertain, we set err to -EINVAL to avoid ambiguity.

Clean up smatch warning:
drivers/net/ethernet/chelsio/cxgb/cxgb2.c:1114 init_one() warn: missing
error code 'err'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04 10:52:06 +01:00
Dinghao Liu
9ddbc2a00d qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
Previous commit 68233c583a removes the qlcnic_rom_lock()
in qlcnic_pinit_from_rom(), but remains its corresponding
unlock function, which is odd. I'm not very sure whether the
lock is missing, or the unlock is redundant. This bug is
suggested by a static analysis tool, please advise.

Fixes: 68233c583a ("qlcnic: updated reset sequence")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04 10:50:59 +01:00
Eric Dumazet
c7c5e6ff53 fq_codel: reject silly quantum parameters
syzbot found that forcing a big quantum attribute would crash hosts fast,
essentially using this:

tc qd replace dev eth0 root fq_codel quantum 4294967295

This is because fq_codel_dequeue() would have to loop
~2^31 times in :

	if (flow->deficit <= 0) {
		flow->deficit += q->quantum;
		list_move_tail(&flow->flowchain, &q->old_flows);
		goto begin;
	}

SFQ max quantum is 2^19 (half a megabyte)
Lets adopt a max quantum of one megabyte for FQ_CODEL.

Fixes: 4b549a2ef4 ("fq_codel: Fair Queue Codel AQM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-04 10:49:46 +01:00
Vlastimil Babka
bd0e7491a9 mm, slub: convert kmem_cpu_slab protection to local_lock
Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of
local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
equivalent, with better lockdep visibility. On PREEMPT_RT that means better
preemption.

However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
work with cpu freelist. Those are designed to detect and recover from being
preempted by other conflicting operations (both fast or slow path), but the
slow path operations assume they cannot be preempted by a fast path operation,
which is guaranteed naturally with disabled irqs. With local locks on
PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.

In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
__slab_alloc() which also works with cpu freelist, but under the local lock.
In the free fastpath do_slab_free() we have to add a new local lock protected
version of freeing to the cpu freelist, as the existing slowpath only works
with the page freelist.

Also update the comment about locking scheme in SLUB to reflect changes done
by this series.

[ Mike Galbraith <efault@gmx.de>: use local_lock() without irq in PREEMPT_RT
  scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04 10:22:01 +02:00
Vlastimil Babka
25c00c506e mm, slub: use migrate_disable() on PREEMPT_RT
We currently use preempt_disable() (directly or via get_cpu_ptr()) to stabilize
the pointer to kmem_cache_cpu. On PREEMPT_RT this would be incompatible with
the list_lock spinlock. We can use migrate_disable() instead, but that
increases overhead on !PREEMPT_RT as it's an unconditional function call.

In order to get the best available mechanism on both PREEMPT_RT and
!PREEMPT_RT, introduce private slub_get_cpu_ptr() and slub_put_cpu_ptr()
wrappers and use them.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04 10:21:32 +02:00
Vlastimil Babka
e0a043aa41 mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
Jann Horn reported [1] the following theoretically possible race:

  task A: put_cpu_partial() calls preempt_disable()
  task A: oldpage = this_cpu_read(s->cpu_slab->partial)
  interrupt: kfree() reaches unfreeze_partials() and discards the page
  task B (on another CPU): reallocates page as page cache
  task A: reads page->pages and page->pobjects, which are actually
  halves of the pointer page->lru.prev
  task B (on another CPU): frees page
  interrupt: allocates page as SLUB page and places it on the percpu partial list
  task A: this_cpu_cmpxchg() succeeds

  which would cause page->pages and page->pobjects to end up containing
  halves of pointers that would then influence when put_cpu_partial()
  happens and show up in root-only sysfs files. Maybe that's acceptable,
  I don't know. But there should probably at least be a comment for now
  to point out that we're reading union fields of a page that might be
  in a completely different state.

Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only safe
against s->cpu_slab->partial manipulation in ___slab_alloc() if the latter
disables irqs, otherwise a __slab_free() in an irq handler could call
put_cpu_partial() in the middle of ___slab_alloc() manipulating ->partial
and corrupt it. This becomes an issue on RT after a local_lock is introduced
in later patch. The fix means taking the local_lock also in put_cpu_partial()
on RT.

After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect put_cpu_partial()
with disabled irqs (to be converted to local_lock_irqsave() later) everywhere.
This should be acceptable as it's not a fast path, and moving the actual
partial unfreezing outside of the irq disabled section makes it short, and with
the retry loop gone the code can be also simplified. In addition, the race
reported by Jann should no longer be possible.

[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/e3470ab357b48bccfbd1f5133b982178a7d2befb.camel@gmx.de/

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04 10:20:10 +02:00
Vlastimil Babka
a2b4ae8bfd mm, slub: make slab_lock() disable irqs with PREEMPT_RT
We need to disable irqs around slab_lock() (a bit spinlock) to make it
irq-safe. Most calls to slab_lock() are nested under spin_lock_irqsave() which
doesn't disable irqs on PREEMPT_RT, so add explicit disabling with PREEMPT_RT.
The exception is cmpxchg_double_slab() which already disables irqs, so use a
__slab_[un]lock() variant without irq disable there.

slab_[un]lock() thus needs a flags pointer parameter, which is unused on !RT.
free_debug_processing() now has two flags variables, which looks odd, but only
one is actually used - the one used in spin_lock_irqsave() on !RT and the one
used in slab_lock() on RT.

As a result, __cmpxchg_double_slab() and cmpxchg_double_slab() become
effectively identical on RT, as both will disable irqs, which is necessary on
RT as most callers of this function also rely on irqsaving lock operations.
Thus, assert that irqs are already disabled in __cmpxchg_double_slab() only on
!RT and also change the VM_BUG_ON assertion to the more standard lockdep_assert
one.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04 10:17:33 +02:00
Sebastian Andrzej Siewior
94ef0304e2 mm: slub: make object_map_lock a raw_spinlock_t
The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within already atomic context

Make object_map_lock a raw_spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2021-09-04 10:16:45 +02:00
Geert Uytterhoeven
3e204d6b76 Input: adc-keys - drop bogus __refdata annotation
As the ADC ladder input driver does not have any code or data located in
initmem, there is no need to annotate the adc_keys_driver structure with
__refdata.  Drop the annotation, to avoid suppressing future section
warnings.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/7091e8213602be64826fd689a7337246d218f3b1.1626255421.git.geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03 23:49:58 -07:00
Colin Ian King
7ec7c72fbf Input: Fix spelling mistake in Kconfig "useable" -> "usable"
There is a spelling mistake in the Kconfig text. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210705100230.7583-1-colin.king@canonical.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03 23:49:57 -07:00
Colin Ian King
ca595ac271 Input: Fix spelling mistake in Kconfig "Modul" -> "Module"
There is a spelling mistake in the Kconfig text. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210704095702.37567-1-colin.king@canonical.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-09-03 23:49:55 -07:00
Namjae Jeon
303fff2b8c ksmbd: add validation for ndr read/write functions
If ndr->length is smaller than expected size, ksmbd can access invalid
access in ndr->data. This patch add validation to check ndr->offset is
over ndr->length. and added exception handling to check return value of
ndr read/write function.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00
Namjae Jeon
687c59e702 ksmbd: remove unused ksmbd_file_table_flush function
ksmbd_file_table_flush is a leftover from SMB1. This function is no longer
needed as SMB1 has been removed from ksmbd.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00
Hyunchul Lee
72d6cbb533 ksmbd: smbd: fix dma mapping error in smb_direct_post_send_data
Becase smb direct header is mapped and msg->num_sge
already is incremented, the decrement should be
removed from the condition.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00
Per Forlin
d475866eee ksmbd: Reduce error log 'speed is unknown' to debug
This log happens on servers with a network bridge since
the bridge does not have a specified link speed.
This is not a real error so change the error log to debug instead.

Signed-off-by: Per Forlin <perfn@axis.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00
Christian Brauner
28a5d3de9d ksmbd: defer notify_change() call
When ownership is changed we might in certain scenarios loose the
ability to alter the inode after we changed ownership. This can e.g.
happen when we are on an idmapped mount where uid 0 is mapped to uid
1000 and uid 1000 is mapped to uid 0.
A caller with fs*id 1000 will be able to create files as *id 1000 on
disk. They will also be able to change ownership of files owned by *id 0
to *id 1000 but they won't be able to change ownership in the other
direction. This means acl operations following notify_change() would
fail. Move the notify_change() call after the acls have been updated.
This guarantees that we don't end up with spurious "hash value diff"
warnings later on because we managed to change ownership but didn't
manage to alter acls.

Cc: Steve French <stfrench@microsoft.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00
Christian Brauner
db7fb6fe3d ksmbd: remove setattr preparations in set_file_basic_info()
Permission checking and copying over ownership information is the task
of the underlying filesystem not ksmbd. The order is also wrong here.
This modifies the inode before notify_change(). If notify_change() fails
this will have changed ownership nonetheless. All of this is unnecessary
though since the underlying filesystem's ->setattr handler will do all
this (if required) by itself.

Cc: Steve French <stfrench@microsoft.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:45 -05:00