xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
Both xprt_unlock_connect() and xprt_release() call
->release_xprt()
which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
which calls mod_timer().
This may result in mod_timer() being called *after* del_timer_sync().
When this happens, the timer may fire long after the xprt has been freed,
and run_timer_softirq() will probably crash.
The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
always called under ->transport_lock. So if we take ->transport_lock to
call del_timer_sync(), we can be sure that mod_timer() will run first
(if it runs at all).
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull tracing updates from Steven Rostedt:
- New user_events interface. User space can register an event with the
kernel describing the format of the event. Then it will receive a
byte in a page mapping that it can check against. A privileged task
can then enable that event like any other event, which will change
the mapped byte to true, telling the user space application to start
writing the event to the tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When
set, the tracing buffer will be saved in the snapshot buffer at boot
up when the kernel hands things over to user space. This will keep
the traces that happened at boot up available even if user space boot
up has tracing as well.
- Have TRACE_EVENT_ENUM() also update trace event field type
descriptions. Thus if a static array defines its size with an enum,
the user space trace event parsers can still know how to parse that
array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This
will make one tracepoint be able to trace different content and not
be stuck at only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events
for mapping).
* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing: Have type enum modifications copy the strings
user_events: Add trace event call as root for low permission cases
tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
tracing: Add snapshot at end of kernel boot up
tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
tracing: Fix strncpy warning in trace_events_synth.c
user_events: Prevent dyn_event delete racing with ioctl add/delete
tracing: Add TRACE_CUSTOM_EVENT() macro
tracing: Move the defines to create TRACE_EVENTS into their own files
tracing: Add sample code for custom trace events
tracing: Allow custom events to be added to the tracefs directory
tracing: Fix last_cmd_set() string management in histogram code
user_events: Fix potential uninitialized pointer while parsing field
tracing: Fix allocation of last_cmd in last_cmd_set()
user_events: Add documentation file
user_events: Add sample code for typical usage
user_events: Add self-test for validator boundaries
user_events: Add self-test for perf_event integration
user_events: Add self-test for dynamic_events integration
user_events: Add self-test for ftrace integration
...
Pull RTLA tracing tool updates from Steven Rostedt:
"Real Time Analysis Tool updatesfor 5.18:
- Support for adjusting tracing_threashold
- Add -a (auto) option to make it easier for users to debug in the field
- Add -e option to add more events to the trace
- Add --trigger option to add triggers to events
- Add --filter option to filter events
- Add support to save histograms to the file
- Add --dma-latency to set /dev/cpu_dma_latency
- Other fixes and cleanups"
* tag 'trace-rtla-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rtla: Tools main loop cleanup
rtla/timerlat: Add --dma-latency option
rtla/osnoise: Fix osnoise hist stop tracing message
rtla: Check for trace off also in the trace instance
rtla/trace: Save event histogram output to a file
rtla: Add --filter support
rtla/trace: Add trace event filter helpers
rtla: Add --trigger support
rtla/trace: Add trace event trigger helpers
rtla: Add -e/--event support
rtla/trace: Add trace events helpers
rtla/timerlat: Add the automatic trace option
rtla/osnoise: Add the automatic trace option
rtla/osnoise: Add an option to set the threshold
rtla/osnoise: Add support to adjust the tracing_thresh
Pull printk updates from Petr Mladek:
- Make %pK behave the same as %p for kptr_restrict == 0 also with
no_hash_pointers parameter
- Ignore the default console in the device tree also when console=null
or console="" is used on the command line
- Document console=null and console="" behavior
- Prevent a deadlock and a livelock caused by console_lock in panic()
- Make console_lock available for panicking CPU
- Fast query for the next to-be-used sequence number
- Use the expected return values in printk.devkmsg __setup handler
- Use the correct atomic operations in wake_up_klogd() irq_work handler
- Avoid possible unaligned access when handling %4cc printing format
* tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix return value of printk.devkmsg __setup handler
vsprintf: Fix %pK with kptr_restrict == 0
printk: make suppress_panic_printk static
printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true
Docs: printk: add 'console=null|""' to admin/kernel-parameters
printk: use atomic updates for klogd work
printk: Drop console_sem during panic
printk: Avoid livelock with heavy printk during panic
printk: disable optimistic spin during panic
printk: Add panic_in_progress helper
vsprintf: Move space out of string literals in fourcc_string()
vsprintf: Fix potential unaligned access
printk: ringbuffer: Improve prb_next_seq() performance
cpsw_ethtool_begin directly returns the result of pm_runtime_get_sync
when successful.
pm_runtime_get_sync returns -error code on failure and 0 on successful
resume but also 1 when the device is already active. So the common case
for cpsw_ethtool_begin is to return 1. That leads to inconsistent calls
to pm_runtime_put in the call-chain so that pm_runtime_put is called
one too many times and as result leaving the cpsw dev behind suspended.
The suspended cpsw dev leads to an access violation later on by
different parts of the cpsw driver.
Fix this by calling the return-friendly pm_runtime_resume_and_get
function.
Fixes: d43c65b05b ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
Signed-off-by: Jan Sondhauss <jan.sondhauss@wago.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Link: https://lore.kernel.org/r/20220323084725.65864-1-jan.sondhauss@wago.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Lobakin says:
====================
ice: avoid sleeping/scheduling in atomic contexts
The `ice_misc_intr() + ice_send_event_to_aux()` infamous pair failed
once again.
Fix yet another (hopefully last one) 'scheduling while atomic' splat
and finally plug the hole to gracefully return prematurely when
invoked in wrong context instead of panicking.
====================
Link: https://lore.kernel.org/r/20220323124353.2762181-1-alexandr.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ice_send_event_to_aux() eventually descends to mutex_lock()
(-> might_sched()), so it must not be called under non-task
context. However, at least two fixes have happened already for the
bug splats occurred due to this function being called from atomic
context.
To make the emergency landings softer, bail out early when executed
in non-task context emitting a warn splat only once. This way we
trade some events being potentially lost for system stability and
avoid any related hangs and crashes.
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():
[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229] <IRQ>
[ 2101.065252] dump_stack+0x41/0x60
[ 2101.068587] __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060] __schedule+0x6a4/0x830
[ 2101.076570] schedule+0x35/0xa0
[ 2101.079727] schedule_preempt_disabled+0xa/0x10
[ 2101.084284] __mutex_lock.isra.7+0x310/0x420
[ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232] __handle_irq_event_percpu+0x40/0x180
[ 2101.106965] handle_irq_event_percpu+0x30/0x80
[ 2101.111434] handle_irq_event+0x36/0x53
[ 2101.115292] handle_edge_irq+0x82/0x190
[ 2101.119148] handle_irq+0x1c/0x30
[ 2101.122480] do_IRQ+0x49/0xd0
[ 2101.125465] common_interrupt+0xf/0xf
[ 2101.129146] </IRQ>
...
As Andrew correctly mentioned previously[0], the following call
ladder happens:
ice_misc_intr() <- hardirq
ice_send_event_to_aux()
device_lock()
mutex_lock()
might_sleep()
might_resched() <- oops
Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).
[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All packets on ingress (except for jumbo) are terminated with a 4-bytes
CRC checksum. It's the responsability of the driver to strip those 4
bytes. Unfortunately a change dating back to March 2017 re-shuffled some
code and made the CRC stripping code effectively dead.
This change re-orders that part a bit such that the datalen is
immediately altered if needed.
Fixes: 4902a92270 ("drivers: net: xgene: Add workaround for errata 10GE_8/ENET_11")
Cc: stable@vger.kernel.org
Signed-off-by: Stephane Graber <stgraber@ubuntu.com>
Tested-by: Stephane Graber <stgraber@ubuntu.com>
Link: https://lore.kernel.org/r/20220322224205.752795-1-stgraber@ubuntu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The struct folio is not declared in cacheflush.h so we need to provide
a forward declaration as otherwise users of this header file may get
warnings.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 522a0032af ("Add linux/cacheflush.h")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Raw NAND core changes:
* Rework of_get_nand_bus_width()
* Remove of_get_nand_on_flash_bbt() wrapper
* Protect access to rawnand devices while in suspend
* bindings: Document the wp-gpios property
Rax NAND controller driver changes:
* atmel: Fix refcount issue in atmel_nand_controller_init
* nandsim:
- Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern
- Merge repeat codes in ns_switch_state
- Replace overflow check with kzalloc to single kcalloc
* rockchip: Fix platform_get_irq.cocci warning
* stm32_fmc2: Add NAND Write Protect support
* pl353: Set the nand chip node as the flash node
* brcmnand: Fix sparse warnings in bcma_nand
* omap_elm: Remove redundant variable 'errors'
* gpmi:
- Support fast edo timings for mx28
- Validate controller clock rate
- Fix controller timings setting
* brcmnand:
- Add BCMA shim
- BCMA controller uses command shift of 0
- Allow platform data instantation
- Add platform data structure for BCMA
- Allow working without interrupts
- Move OF operations out of brcmnand_init_cs()
- Avoid pdev in brcmnand_init_cs()
- Allow SoC to provide I/O operations
- Assign soc as early as possible
Onenand changes:
* Check for error irq
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
During lockless buffered reads, filemap_read() holds page cache page
references while trying to copy data to the user-space buffer. The
calling process isn't holding the inode glock, but the page references
it holds prevent those pages from being removed from the page cache, and
that prevents the underlying inode glock from being moved to another
node. Thus, we can end up in the same kinds of distributed deadlock
situations as with normal (non-lockless) buffered reads.
Fix that by disabling page faults during lockless reads as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix the fault-in window size logic:
* Use a maximum window size of 1 MiB instead of BIO_MAX_VECS * PAGE_SIZE.
The previous window size was always one page because the pages variable
was accidentally being defined and then redefined in
should_fault_in_pages().
* The nr_dirtied heuristic for guessing when there might be memory
pressure often results in very small window sizes. Don't let
nr_dirtied drop below 8 pages (as btrfs does).
* Compute the window size in units of bytes, not pages.
* Account for page overlap (unaligned iterators).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
If a trace event has in its TP_printk():
"%*.s", len, len ? __get_str(string) : NULL
It is perfectly valid if len is zero and passing in the NULL.
Unfortunately, the runtime string check at time of reading the trace sees
the NULL and flags it as a bad string and produces a WARN_ON().
Handle this case by passing into the test function if the format has an
asterisk (star) and if so, if the length is zero, then mark it as safe.
Link: https://lore.kernel.org/all/YjsWzuw5FbWPrdqq@bfoster/
Cc: stable@vger.kernel.org
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Fixes: 9a6944fee6 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
It is not recommened to use platform_get_resource(pdev, IORESOURCE_IRQ)
for requesting IRQ's resources any more, as they can be not ready yet in
case of DT-booting.
platform_get_irq() instead is a recommended way for getting IRQ even if
it was not retrieved earlier.
It also makes code simpler because we're getting "int" value right away
and no conversion from resource to int is required.
The print function dev_err() is redundant because platform_get_irq()
already prints an error.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220309035607.2080884-1-chi.minghao@zte.com.cn
x86/ACPI boards with an arizona WM5102 codec ship with either Windows or
Android as factory installed OS.
The ACPI fwnode for the codec on Android boards misses 2 things compared
to the Windows boards (this is hardcoded in the Android board kernels):
1. There is no CLKE ACPI method to enabe the 32 KHz clock the codec needs
for jack-detection.
2. The GPIOs used by the codec are not listed in the fwnode for the codec.
The ACPI tables on x86/ACPI boards shipped with Android being incomplete
happens a lot. The special drivers/platform/x86/x86-android-tablets.c
module contains DMI based per model handling to compensate for this.
This module will enable the 32KHz clock through the pinctrl framework
to fix 1. and it will also register a gpio-lookup table for all GPIOs
needed by the codec + machine driver, including the GPIOs coming from
the codec itself.
Add an arizona_spi_acpi_android_probe() function which waits for the
x86-android-tablets to have set things up before continue with probing
the arizona WM5102 codec.
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220307173844.199135-2-hdegoede@redhat.com
x86/ACPI boards with an arizona WM5102 codec ship with either Windows or
Android as factory installed OS.
The ACPI fwnode describing the codec differs depending on the factory OS,
and the current arizona_spi_acpi_probe() function is tailored for use
with the Windows board ACPI tables.
Split out the Windows board ACPI tables specific bits into a new
arizona_spi_acpi_windows_probe() function in preparation for also
adding support for the Android board ACPI tables.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220307173844.199135-1-hdegoede@redhat.com
This matches pinctrl.yaml requirement and fixes:
Documentation/devicetree/bindings/mfd/brcm,cru.example.dt.yaml: pin-controller@1c0: $nodename:0: 'pin-controller@1c0' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
From schema: Documentation/devicetree/bindings/pinctrl/brcm,ns-pinmux.yaml
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220216112928.5330-1-zajec5@gmail.com
Clang static analysis reports this issue
livepatch-shadow-fix1.c:113:2: warning: Use of
memory after it is freed
pr_info("%s: dummy @ %p, prevented leak @ %p\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The pointer is freed in the previous statement.
Reorder the pr_info to report before the free.
Similar issue in livepatch-shadow-fix2.c
Note that it is a false positive. pr_info() just prints the address.
The freed memory is not accessed. Well, the static analyzer could not
know this easily.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
[pmladek@suse.com: Note about that it was false positive.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220320015143.2208591-1-trix@redhat.com
We only really need to recycle the buffer when going async for a file
type that has an indefinite reponse time (eg non-file/bdev). And for
files that to arm poll, the async worker will arm poll anyway and the
buffer will get recycled there.
In that latter case, we're not holding ctx->uring_lock. Ensure we take
the issue_flags into account and acquire it if we need to.
Fixes: b1c6264575 ("io_uring: recycle provided buffers if request goes async")
Reported-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
syzbot reports a recent regression:
BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650 kernel/sched/wait.c:101
Read of size 8 at addr ffff888011e8a130 by task syz-executor413/3618
CPU: 0 PID: 3618 Comm: syz-executor413 Tainted: G W 5.17.0-syzkaller-01402-g8565d64430f8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x303 mm/kasan/report.c:255
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
__wake_up_common+0x637/0x650 kernel/sched/wait.c:101
__wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
tty_release+0x657/0x1200 drivers/tty/tty_io.c:1781
__fput+0x286/0x9f0 fs/file_table.c:317
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xaff/0x29d0 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:936
__do_sys_exit_group kernel/exit.c:947 [inline]
__se_sys_exit_group kernel/exit.c:945 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:945
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f439a1fac69
which is due to leaving the request on the waitqueue mistakenly. The
reproducer is using a tty device, which means we end up arming the same
poll queue twice (it uses the same poll waitqueue for both), but in
io_poll_wake() we always just clear REQ_F_SINGLE_POLL regardless of which
entry triggered. This leaves one waitqueue potentially armed after we're
done, which then blows up in tty when the waitqueue is attempted removed.
We have no room to store this information, so simply encode it in the
wait_queue_entry->private where we store the io_kiocb request pointer.
Fixes: 91eac1c69c ("io_uring: cache poll/double-poll state with a request flag")
Reported-by: syzbot+09ad4050dd3a120bfccd@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The previous commit:
1bc84c40088 ("io_uring: remove poll entry from list when canceling all")
removed a potential overflow condition for the poll references. They
are currently limited to 20-bits, even if we have 31-bits available. The
upper bit is used to mark for cancelation.
Bump the poll ref space to 31-bits, making that kind of situation much
harder to trigger in general. We'll separately add overflow checking
and handling.
Fixes: aa43477b04 ("io_uring: poll rework")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The Kconfig symbols required for the Allwinner F1C100 (MACH_SUNIV) are
currently not selected by any defconfig. sunxi_defconfig only covers the
v7 SoCs, but the F1C100s is ARMv5, so we cannot share a single image.
Add the required symbols to multi_v5_defconfig, to give people some sane
default config when playing around with this chip. This is probably more
important as there are surely not many distros out there supporting
ARMv5 out of the box.
This allows my LicheePi Nano board to boot to a busybox prompt.
The zImage size grows by about 50 KB, in detail:
text data bss dec hex filename
10510000 4400700 687740 15598440 ee0368 vmlinux-old
10588592 4469096 686812 15744500 f03df4 vmlinux-new
14922908 arch/arm/boot/Image-old
15067388 arch/arm/boot/Image-new
6388016 arch/arm/boot/zImage-old
6440064 arch/arm/boot/zImage-new
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20220317183043.948432-6-andre.przywara@arm.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add quirks to not fail the initialization and to have quick resume
latency after cold/warm reboot.
Signed-off-by: Monish Kumar R <monish.kumar.r@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>