- Use PCI_INTERRUPT_* definitions from PCI core instead of custom ones
(Pali Rohár)
- Derive MSI number from bit(s) set in PCIE_MSI_STATUS_REG, not from
PCIE_MSI_PAYLOAD_REG (Pali Rohár)
- Align multi-MSI vectors to power of two (Pali Rohár)
- Rewrite IRQ code to use chained IRQ handler (Pali Rohár)
- Check return value of generic_handle_domain_irq() and warn about spurious
interrupts (Pali Rohár)
- Make MSI irq_chip structures static to driver (Marek Behún)
- Make msi_domain_info structure static to driver (Marek Behún)
- Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) (Marek Behún)
- Refactor unmasking of summary MSI interrupt (Pali Rohár)
- Add support for masking MSI interrupts and leave them masked at setup
(Pali Rohár)
- Set MSI doorbell address to address of struct advk_pcie (Pali Rohár)
- Enable MSI-X support (Pali Rohár)
- Add support for ERR interrupt on emulated bridge (Pali Rohár)
- Fix read of PCI_EXP_RTSTA_PME bit on emulated bridge (Pali Rohár)
- Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated
bridge (Pali Rohár)
- Add support for PME interrupts (Pali Rohár)
- Fix support for PME requester on emulated bridge (Pali Rohár)
- Use separate INTA interrupt for emulated Root Port so PME and AER
interrupts are not shared with downstream devices (Pali Rohár)
- Remove irq_mask_ack() callback for INTx interrupts (Pali Rohár)
- Don't mask legacy INTx interrupts when mapping (Pali Rohár)
- Drop unnecessary "__maybe_unused" from advk_pcie_disable_phy() (Marek
Behún)
- Update comment about why we check for link being up before issuing a
config request (Marek Behún)
* remotes/lorenzo/pci/aardvark:
PCI: aardvark: Update comment about link going down after link-up
PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy()
PCI: aardvark: Don't mask irq when mapping
PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts
PCI: aardvark: Use separate INTA interrupt for emulated root bridge
PCI: aardvark: Fix support for PME requester on emulated bridge
PCI: aardvark: Add support for PME interrupts
PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge
PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge
PCI: aardvark: Add support for ERR interrupt on emulated bridge
PCI: aardvark: Enable MSI-X support
PCI: aardvark: Fix setting MSI address
PCI: aardvark: Add support for masking MSI interrupts
PCI: aardvark: Refactor unmasking summary MSI interrupt
PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node)
PCI: aardvark: Make msi_domain_info structure a static driver structure
PCI: aardvark: Make MSI irq_chip structures static driver structures
PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ
PCI: aardvark: Rewrite IRQ code to chained IRQ handler
PCI: aardvark: Fix support for MSI interrupts
PCI: aardvark: Fix reading MSI interrupt number
PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_*
- Move vgaarb.c from drivers/gpu/vga to drivers/pci (Bjorn Helgaas)
- Factor out default VGA device selection (Huacai Chen)
- Move firmware default device detection to ADD_DEVICE path so we can
select a default device regardless of whether it is enumerated before or
after vga_arb_device_init() (Huacai Chen)
- Move non-legacy VGA detection to ADD_DEVICE path (Huacai Chen)
- Move disabled VGA device detection to ADD_DEVICE path (Huacai Chen)
* pci/vga:
PCI/VGA: Replace full MIT license text with SPDX identifier
PCI/VGA: Use unsigned format string to print lock counts
PCI/VGA: Log bridge control messages when adding devices
PCI/VGA: Remove empty vga_arb_device_card_gone()
PCI/VGA: Move disabled VGA device detection to ADD_DEVICE path
PCI/VGA: Move non-legacy VGA detection to ADD_DEVICE path
PCI/VGA: Move firmware default device detection to ADD_DEVICE path
PCI/VGA: Factor out default VGA device selection
PCI/VGA: Factor out vga_select_framebuffer_device()
PCI/VGA: Move vga_arb_integrated_gpu() earlier in file
PCI/VGA: Move vgaarb to drivers/pci
- Clear pciehp cmd_busy bit when command completes in polling mode to avoid
spurious timeouts (Liguang Zhang)
- Add quirk to work around Qualcomm hardware defect in Command Completed
signaling (Manivannan Sadhasivam)
* pci/hotplug:
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
PCI: pciehp: Clear cmd_busy bit in polling mode
- Support BAR sizes up to 8TB (Dongdong Liu)
- Reduce warnings on hardware that doesn't support 8- or 16-bit PCI writes
and hence may corrupt RW1C bits (Mark Tomlinson)
* pci/enumeration:
PCI: Reduce warnings on possible RW1C corruption
PCI: Support BAR sizes up to 8TB
- Add and use #defines for normal and subtractive PCI bridges (Pali Rohár)
- Set all 24 bits of PCI class code for iproc (Pali Rohár)
* pci/bridge-class-codes:
PCI: iproc: Set all 24 bits of PCI class code
PCI: Add defines for normal and subtractive PCI bridges
- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() so we can drop
acpi_bus_get_device() (Rafael J. Wysocki)
* pci/acpi:
PCI/ACPI: Replace acpi_bus_get_device() with acpi_fetch_acpi_dev()
Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ
support.
1. Call the rseq_signal_deliver() function to fixup on the pre-signal
frame when a signal is delivered on top of a restartable sequence
critical section.
2. Check that system calls are not invoked from within rseq critical
sections by invoking rseq_syscall() from ret_from_syscall(). With
CONFIG_DEBUG_RSEQ, such behavior results in termination of the
process with SIGSEGV.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preparatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
When analyzing with 'perf script', it's useful to understand the
captured instruction and the next sequential instruction.
To calculate the address of the next sequential instruction, the length
of the captured instruction is required.
For example, you can’t know the next sequential instruction after an
unconditional branch unless you calculate that based on its length.
For branch stacks, 'perf script' only prints the instruction bytes with
'brstackinsn', but lacks the instruction length.
Add 'brstackinsnlen' to print the instruction length.
$ perf script -F ip,brstackinsn,brstackinsnlen --xed
7fa555be8f75
_start:
00007fa555be8090 mov %rsp, %rdi ilen: 3
00007fa555be8093 callq 0x7fa555be8ea0 ilen: 5 # PRED 102 cycles [102] 0.02 IPC
_dl_start+38:
00007fa555be8ec6 movq %rdx,0x227853(%rip) ilen: 7
00007fa555be8ecd leaq 0x227f94(%rip),%rdx ilen: 7
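As a minimal sketch of the arithmetic (the helper name next_seq_insn()
is hypothetical and not part of perf), the next sequential address is
the captured address plus the printed instruction length:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of perf: address of the instruction
 * that sequentially follows the one at 'ip' whose length is 'ilen'.
 */
static uint64_t next_seq_insn(uint64_t ip, unsigned int ilen)
{
	return ip + ilen;
}
```

With the output above, next_seq_insn(0x7fa555be8090, 3) gives
0x7fa555be8093, the address of the callq, and
next_seq_insn(0x7fa555be8ec6, 7) gives 0x7fa555be8ecd.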
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/1647871212-184070-1-git-send-email-kan.liang@linux.intel.com
[ Added the new field to tools/perf/Documentation/perf-script.txt ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This fixes the near-silence of the headphone jack on the ALC256-based
Samsung Galaxy Book Flex Alpha (NP730QCJ). The magic verbs were found
through trial and error, using known ALC298 hacks as inspiration. The
fixup is auto-enabled only when the NP730QCJ is detected. It can be
manually enabled using model=alc256-samsung-headphone.
Signed-off-by: Matt Kramer <mccleetus@gmail.com>
Link: https://lore.kernel.org/r/3168355.aeNJFYEL58@linus
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull locking updates from Ingo Molnar:
"Changes in this cycle were:
Bitops & cpumask:
- Always inline various generic helpers, to improve code generation,
but also for instrumentation, found by noinstr validation.
- Add a x86-specific cpumask_clear_cpu() helper to improve code
generation
Atomics:
- Fix atomic64_{read_acquire,set_release} fallbacks
Lockdep:
- Fix /proc/lockdep output loop iteration for classes
- Fix /proc/lockdep potential access to invalid memory
- Add Mark Rutland as reviewer for atomic primitives
- Minor cleanups
Jump labels:
- Clean up the code a bit
Misc:
- Add __sched annotations to percpu rwsem primitives
- Enable RT_MUTEXES on PREEMPT_RT by default
- Stray v8086_mode() inlining fix, result of noinstr objtool
validation"
* tag 'locking-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Refactor #ifdef of struct static_key
jump_label: Avoid unneeded casts in STATIC_KEY_INIT_{TRUE,FALSE}
locking/lockdep: Iterate lock_classes directly when reading lockdep files
x86/ptrace: Always inline v8086_mode() for instrumentation
cpumask: Add a x86-specific cpumask_clear_cpu() helper
locking: Enable RT_MUTEXES by default on PREEMPT_RT.
locking/local_lock: Make the empty local_lock_*() function a macro.
atomics: Fix atomic64_{read_acquire,set_release} fallbacks
locking: Add missing __sched attributes
cpumask: Always inline helpers which use bit manipulation functions
asm-generic/bitops: Always inline all bit manipulation helpers
locking/lockdep: Avoid potential access of invalid memory in lock_class
lockdep: Use memset_startat() helper in reinit_class()
MAINTAINERS: add myself as reviewer for atomics
Wait for the page to be written to the cache before we allow it
to be modified
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Tests 72 and 78 for ALSA in kselftest fail due to reading
inconsistent values from some devices on a VirtualBox
Virtual Machine using the snd_intel8x0 driver for the AC'97
Audio Controller device.
Taking for example test number 72, this is what the test reports:
"Surround Playback Volume.0 expected 1 but read 0, is_volatile 0"
"Surround Playback Volume.1 expected 0 but read 1, is_volatile 0"
These errors repeat for each value from 0 to 31.
These error messages show that the written values are read back
swapped.
When the write is performed, these values are initially stored in
an array used to sanity-check them and write them in the pcmreg
array. To write them, the two one-byte values are packed together
in a two-byte variable through bitwise operations: the first
value is shifted left by one byte and the second value is stored in the
right byte through a bitwise OR. When reading the values back,
right shifts are performed to retrieve the previously stored
bytes. These shifts are executed in the wrong order, thus
reporting the values swapped as shown above.
This patch fixes this mistake by reversing the read
operations' order.
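The packing and the corrected read-back order can be sketched as
follows. This is an illustration only, with hypothetical helper names;
it is not the driver code:

```c
#include <stdint.h>

/* Pack two one-byte values: 'first' in the high byte, 'second' in the low byte. */
static uint16_t pack_pair(uint8_t first, uint8_t second)
{
	return (uint16_t)((first << 8) | second);
}

/*
 * Unpack in the same order as packing: out[0] comes from the high
 * byte and out[1] from the low byte. Swapping the two shifts here
 * would return the values reversed.
 */
static void unpack_pair(uint16_t reg, uint8_t out[2])
{
	out[0] = (reg >> 8) & 0xff;
	out[1] = reg & 0xff;
}
```

Packing (1, 0) and unpacking the result yields (1, 0) again, which is
what the failing test expected.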
Signed-off-by: Giacomo Guiduzzi <guiduzzi.giacomo@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220322200653.15862-1-guiduzzi.giacomo@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Generating the kernel version tag relies on the "git describe" command
to get the latest Linus kernel tag.
However, when working from clones of Linus' git we may not have the latest
tag. For example, when working on Arnaldo's acme.git, we can have this:
$ git branch
perf/core
$ head -n 5 ../../Makefile | tail -n 4
VERSION = 5
PATCHLEVEL = 17
SUBLEVEL = 0
EXTRAVERSION = -rc3
$ git describe --abbrev=0 --match "v[0-9].[0-9]*"
v4.13-rc5
Indeed using tags is a problem as it relies on tags being pulled from
Linus' git (and pushed to the clone).
In commit a4147f0f91 ("perf tools: Fix perf version generation")
Robert introduced a change to use the kernelversion rule to generate the
kernel tag when no git tags are available.
However, as mentioned above, the tag we generate may be incorrect, so
just always use kernelversion to get the tag (apart from building perf
out of tree).
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Link: https://lore.kernel.org/r/1645449409-158238-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With CONFIG_X86_KERNEL_IBT=y and a version of ld.lld prior to 14.0.0,
there are numerous objtool warnings along the lines of:
warning: objtool: .plt+0x6: indirect jump found in RETPOLINE build
This is a known issue that has been resolved in ld.lld 14.0.0. Prevent
CONFIG_X86_KERNEL_IBT from being selectable when using one of these
problematic ld.lld versions.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220318230747.3900772-3-nathan@kernel.org
Commit 156ff4a544 ("x86/ibt: Base IBT bits") added a check for a crash
with 'clang -fcf-protection=branch -mfentry -pg', which intended to
exclude Clang versions older than 14.0.0 from selecting
CONFIG_X86_KERNEL_IBT.
clang-11 does not have the issue that the check is testing for, so
CONFIG_X86_KERNEL_IBT is selectable. Unfortunately, there is a different
crash in clang-11 that was fixed in clang-12. To make matters worse,
that crash does not appear to be entirely deterministic, as the same
input to the compiler will sometimes crash and other times not, which
makes dynamically checking for the crash like the '-pg' one unreliable.
To make everything work properly for all common versions of clang, use a
hard version check of 14.0.0, as that will be the first release upstream
that has both bugs properly fixed.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220318230747.3900772-2-nathan@kernel.org
Pull x86 perf event updates from Ingo Molnar:
- Fix address filtering for Intel/PT,ARM/CoreSight
- Enable Intel/PEBS format 5
- Allow more fixed-function counters for x86
- Intel/PT: Enable not recording Taken-Not-Taken packets
- Add a few branch-types
* tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT
perf: Add irq and exception return branch types
perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses
perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
perf/x86/intel/pt: Add a capability and config bit for event tracing
perf/x86/intel: Increase max number of the fixed counters
KVM: x86: use the KVM side max supported fixed counter
perf/x86/intel: Enable PEBS format 5
perf/core: Allow kernel address filter when not filtering the kernel
perf/x86/intel/pt: Fix address filter config for 32-bit kernel
perf/core: Fix address filter parser for multiple filters
x86: Share definition of __is_canonical_address()
perf/x86/intel/pt: Relax address filter validation
snd_pcm_reset() is a non-atomic operation, and it is allowed to run
while the PCM stream is running. This implies that the manipulation of
hw_ptr and other parameters might be racy.
This patch adds the PCM stream lock at appropriate places in
snd_pcm_*_reset() actions to cover that.
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322171325.4355-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We have no protection against concurrent PCM buffer preallocation
changes via proc files, which may lead to a UAF or other problems.
This patch applies the PCM open_mutex to the proc write operation to
avoid races between proc writes and the PCM stream open (and further
operations).
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322170720.3529-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Like the previous fixes to hw_params and hw_free ioctl races, we need
to paper over the concurrent prepare ioctl calls against hw_params and
hw_free, too.
This patch implements the locking with the existing
runtime->buffer_mutex for prepare ioctls. Unlike the previous case
for snd_pcm_hw_params() and snd_pcm_hw_free(), snd_pcm_prepare() is
performed on the linked streams, hence the lock can't simply be applied
at the top level. For tracking the lock in each linked substream, we
modify snd_pcm_action_group() slightly and apply the buffer_mutex for
the case stream_lock=false (formerly there was no lock applied)
there.
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322170720.3529-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In the current PCM design, the read/write syscalls (as well as the
equivalent ioctls) are allowed before the PCM stream is running, that
is, at PCM PREPARED state. Meanwhile, we also allow re-issuing
hw_params and hw_free ioctl calls at the PREPARED state, which may
change or free the buffers, too. The problem is that there is no
protection against those mix-ups.
This patch applies the previously introduced runtime->buffer_mutex to
the read/write operations so that a concurrent hw_params or hw_free
call can no longer interfere during the operation. The mutex is
unlocked before scheduling, so we don't hold it for too long.
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322170720.3529-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Currently we have neither a proper check nor protection against
concurrent calls of the PCM hw_params and hw_free ioctls, which may
result in a UAF. Since the existing PCM stream lock can't be used for
protecting the whole ioctl operations, we need a new mutex to protect
those racy calls.
This patch introduces a new mutex, runtime->buffer_mutex, and applies
it to both hw_params and hw_free ioctl code paths. Along with it,
both functions are slightly modified (the mmap_count check is moved
into the state-check block) for code simplicity.
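The idea of serializing the two paths with one mutex can be sketched in
userspace as below. This is only an analogy using pthreads; struct
sketch_runtime and the sketch_*() functions are hypothetical stand-ins
and do not mirror the actual ALSA code:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PCM runtime; only what the sketch needs. */
struct sketch_runtime {
	pthread_mutex_t buffer_mutex;
	void *buffer;
	size_t size;
};

static void sketch_runtime_init(struct sketch_runtime *rt)
{
	pthread_mutex_init(&rt->buffer_mutex, NULL);
	rt->buffer = NULL;
	rt->size = 0;
}

/* Replace the buffer under the mutex. Returns 0 on success, -1 on failure. */
static int sketch_hw_params(struct sketch_runtime *rt, size_t size)
{
	int ret = 0;

	pthread_mutex_lock(&rt->buffer_mutex);
	free(rt->buffer);
	rt->buffer = malloc(size);
	rt->size = rt->buffer ? size : 0;
	if (!rt->buffer)
		ret = -1;
	pthread_mutex_unlock(&rt->buffer_mutex);
	return ret;
}

/* Release the buffer under the same mutex, so it cannot race with the above. */
static void sketch_hw_free(struct sketch_runtime *rt)
{
	pthread_mutex_lock(&rt->buffer_mutex);
	free(rt->buffer);
	rt->buffer = NULL;
	rt->size = 0;
	pthread_mutex_unlock(&rt->buffer_mutex);
}
```

Because both functions take the same mutex, one thread can never free
the buffer while another is in the middle of replacing it.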
Reported-by: Hu Jiahui <kirin.say@gmail.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322170720.3529-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Ensure that pNFS file commit allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that pNFS flexfile allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that pNFS allocations that can be called from rpciod/nfsiod
callback can fail in low memory mode, so that the threads don't block
and loop forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.
Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.
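A userspace analogy of this "try the normal allocator once, then fall
back to a non-blocking reserve" pattern is sketched below. All names
here (struct reserve_pool, pool_try_alloc(), failing_alloc(),
alloc_with_fallback()) are hypothetical, and the fixed array merely
stands in for a memory pool:

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4
#define SLOT_SIZE  64

/* Hypothetical fixed reserve standing in for a memory pool. */
struct reserve_pool {
	unsigned char slots[POOL_SLOTS][SLOT_SIZE];
	int in_use[POOL_SLOTS];
};

/* Non-blocking: return a free slot, or NULL at once if none is left. */
static void *pool_try_alloc(struct reserve_pool *pool)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = 1;
			return pool->slots[i];
		}
	}
	return NULL;
}

/* Simulates memory pressure: a first-choice allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Try the first-choice allocator once, then fall back to the reserve.
 * May return NULL; the caller must handle failure instead of looping.
 */
static void *alloc_with_fallback(struct reserve_pool *pool,
				 void *(*primary)(size_t))
{
	void *p = primary(SLOT_SIZE);

	return p ? p : pool_try_alloc(pool);
}
```

With failing_alloc() as the first-choice allocator, the first
POOL_SLOTS calls are served from the reserve and the next call returns
NULL immediately rather than waiting for a slot to be freed.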
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
As for rpc_malloc(), we first try allocating from the slab, then fall
back to a non-waiting allocation from the mempool.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When in a low memory situation, we do want rpciod to kick off direct
reclaim in the case where that helps; however, we don't want it looping
forever in mempool_alloc().
So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY,
and then fall back to a GFP_NOWAIT allocation from the mempool.
Ditto for rpc_alloc_task().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The current code checks for whether or not the socket is in a writeable
state after we get an EAGAIN. That is racy, since we've dropped the
socket lock, so the amount of free buffer may have changed.
Instead, let's check whether the socket is writeable before we try to
write to it. If that was the case, we do expect the message to be at
least partially sent unless we're in a low memory situation.
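The "check first, then write" ordering can be illustrated in userspace
with poll() and a zero timeout. This is only an analogy with
hypothetical helper names. It does not reproduce the locking in the
SUNRPC transport code, and the state can still change between the
poll() and the write():

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if 'fd' can accept data right now, 0 if not, -1 on error. */
static int fd_writable_now(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int ret = poll(&pfd, 1, 0);	/* zero timeout: never blocks */

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLOUT)) ? 1 : 0;
}

/*
 * Write only if the descriptor reports writable beforehand.
 * Returns the number of bytes written, 0 if the caller would have to
 * wait for space, or -1 on error.
 */
static ssize_t write_if_writable(int fd, const void *buf, size_t len)
{
	int writable = fd_writable_now(fd);

	if (writable <= 0)
		return writable;
	return write(fd, buf, len);
}
```

On the write end of a freshly created pipe, fd_writable_now() reports
writable and write_if_writable() sends the data.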
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in
the socket layer, so replace it with our own flag in the transport
sock_state field.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The socket layer requires that we use the socket lock to protect changes
to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Since the RPC client uses a non-blocking connect(), we do not expect to
see it return '0' under normal circumstances.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Fixes: 3be232f11a ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>