commit 39d3454c35 upstream.
Building with gcc 4.9.2 reveals a latent bug in the PCI accessors
for Footbridge platforms, which causes a fatal alignment fault
while accessing IO memory. Fix this by making the assembly volatile.
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21b200d091 upstream.
Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b
On a slow connection, running 'df' and killing it while it's
processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS.
This triggers the following chain of events:
=> the dentry revalidation fail
=> dentry is put and released
=> superblock associated with the dentry is put
=> /mnt/b is unmounted
This patch makes cifs_d_revalidate() return the error instead of 0
(invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file
deleted) and ESTALE (file recreated).
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Suggested-by: Shyam Prasad N <nspmangalore@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4a6106354 upstream.
xhci driver may in some special cases need to copy small amounts
of payload data to a bounce buffer in order to meet the boundary
and alignment restrictions set by the xHCI specification.
In the majority of these cases the data is in a sg list, and
driver incorrectly assumed data is always in urb->sg when using
the bounce buffer.
If data instead is contiguous, and in urb->transfer_buffer, we may still
need to bounce buffer a small part if data starts very close (less than
packet size) to a 64k boundary.
Check if sg list is used before copying data to/from it.
Fixes: f9c589e142 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer")
Cc: stable@vger.kernel.org
Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210203113702.436762-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0188b87899 upstream.
Our system encountered a re-init error when re-registering same kretprobe,
where the kretprobe_instance in rp->free_instances is illegally accessed
after re-init.
Implementation to avoid re-registration has been introduced for kprobe
before, but lags for register_kretprobe(). We must check if kprobe has
been re-registered before re-initializing kretprobe, otherwise it will
destroy the data struct of kretprobe registered, which can lead to memory
leak, system crash, also some unexpected behaviors.
We use check_kprobe_rereg() to check if kprobe has been re-registered
before running register_kretprobe()'s body, for giving a warning message
and terminate registration process.
Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 1f0ab40976 ("kprobes: Prevent re-registration of the same kprobe")
[ The above commit should have been done for kretprobes too ]
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f670e9f9c8 upstream.
dwc2_hsotg_process_req_status uses ep_from_windex() to retrieve
the endpoint for the index provided in the wIndex request param.
In a test-case with a rndis gadget running and sending a malformed
packet to it like:
dev.ctrl_transfer(
0x82, # bmRequestType
0x00, # bRequest
0x0000, # wValue
0x0001, # wIndex
0x00 # wLength
)
it is possible to cause a crash:
[ 217.533022] dwc2 ff300000.usb: dwc2_hsotg_process_req_status: USB_REQ_GET_STATUS
[ 217.559003] Unable to handle kernel read from unreadable memory at virtual address 0000000000000088
...
[ 218.313189] Call trace:
[ 218.330217] ep_from_windex+0x3c/0x54
[ 218.348565] usb_gadget_giveback_request+0x10/0x20
[ 218.368056] dwc2_hsotg_complete_request+0x144/0x184
This happens because ep_from_windex wants to compare the endpoint
direction even if index_to_ep() didn't return an endpoint due to
the direction not matching.
The fix is easy insofar that the actual direction check is already
happening when calling index_to_ep() which will return NULL if there
is no endpoint for the targeted direction, so the offending check
can go away completely.
Fixes: c6f5c050e2 ("usb: dwc2: gadget: add bi-directional endpoint support")
Cc: stable@vger.kernel.org
Reported-by: Gerhard Klostermeier <gerhard.klostermeier@syss.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Link: https://lore.kernel.org/r/20210127103919.58215-1-heiko@sntech.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 88c7a9fd9b ]
When sending a packet, we will prepend it with an LAPB header.
This modifies the shared parts of a cloned skb, so we should copy the
skb rather than just clone it, before we prepend the header.
In "Documentation/networking/driver.rst" (the 2nd point), it states
that drivers shouldn't modify the shared parts of a cloned skb when
transmitting.
The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called
when an skb is being sent, clones the skb and sents the clone to
AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte
pseudo-header before handing over the skb to us, if we don't copy the
skb before prepending the LAPB header, the first byte of the packets
received on AF_PACKET sockets can be corrupted.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Right now SUBLEVEL is overflowing, and some userspace may start treating
4.9.256 as 4.10. While out of tree modules have different ways of
extracting the version number (and we're generally ok with breaking
them), we do care about breaking userspace and it would appear that this
overflow might do just that.
Our rules around userspace ABI in the stable kernel are pretty simple:
we don't break it. Thus, while userspace may be checking major/minor, it
shouldn't be doing anything with sublevel.
This patch applies a big band-aid to the 4.9 and 4.4 kernels in the form
of clamping their sublevel to 255.
The clamp is done for the purpose of LINUX_VERSION_CODE only, and
extracting the version number from the Makefile or "make kernelversion"
will continue to work as intended.
We might need to do it later in newer trees, but maybe we'll have a
better solution by then, so I'm ignoring that problem for now.
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d489151e9 ]
Thanks to a recent binutils change which doesn't generate unused
symbols, it's now possible for thunk_64.o be completely empty without
CONFIG_PREEMPTION: no text, no data, no symbols.
We could edit the Makefile to only build that file when
CONFIG_PREEMPTION is enabled, but that will likely create confusion
if/when the thunks end up getting used by some other code again.
Just ignore it and move on.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 764907293e ]
While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:
CPU 0 CPU 1
------------------------------- ----------------------------------
scsi_queue_rq migration_store
-> blk_mq_start_request -> rtas_ibm_suspend_me
-> blk_add_timer -> on_each_cpu(rtas_percpu_suspend_me
_______________________________________V
|
V
-> IPI from CPU 1
-> rtas_percpu_suspend_me
-> __rtas_suspend_last_cpu
-- Linux partition suspended for > 30 seconds --
-> for_each_online_cpu(cpu)
plpar_hcall_norets(H_PROD
-> scsi_dispatch_cmd
-> scsi_times_out
-> scsi_abort_command
-> queue_delayed_work
-> ibmvfc_queuecommand_lck
-> ibmvfc_send_event
-> ibmvfc_send_crq
- returns H_CLOSED
<- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request
-> scmd_eh_abort_handler
-> scsi_try_to_abort_cmd
- returns SUCCESS
-> scsi_queue_insert
Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.
In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.
The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.
Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2b0f16fa6 ]
A race condition exists between the response handler getting called because
of exchange_mgr_reset() (which clears out all the active XIDs) and the
response we get via an interrupt.
Sequence of events:
rport ba0200: Port timeout, state PLOGI
rport ba0200: Port entered PLOGI state from PLOGI state
xid 1052: Exchange timer armed : 20000 msecs xid timer armed here
rport ba0200: Received LOGO request while in state PLOGI
rport ba0200: Delete port
rport ba0200: work event 3
rport ba0200: lld callback ev 3
bnx2fc: rport_event_hdlr: event = 3, port_id = 0xba0200
bnx2fc: ba0200 - rport not created Yet!!
/* Here we reset any outstanding exchanges before
freeing rport using the exch_mgr_reset() */
xid 1052: Exchange timer canceled
/* Here we got two responses for one xid */
xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
Skip the response if the exchange is already completed.
Link: https://lore.kernel.org/r/20201215194731.2326-1-jhasan@marvell.com
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. In case that the user space value update
faults and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable() the function returns with -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.
A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use after free.
It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.
As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.
As a consequence this removes the inept attempts of 'fixing' the situation
in case that the current task owns the rtmutex when returning with an
unresolvable fault by unlocking the rtmutex which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.
Fixes: 1b7558e457 ("futexes: fix fault handling in futex_lock_pi")
Reported-by: gzobqq@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5cade200a ]
Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.
This is also a preparation for a bug fix to avoid yet another copy of the
same code or alternatively introducing a completely unpenetratable mess of
gotos.
Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 04b79c5520 ]
If that unexpected case of inconsistent arguments ever happens then the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1e2f0eaf0 upstream.
Julia reported futex state corruption in the following scenario:
waiter waker stealer (prio > waiter)
futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
timeout=[N ms])
futex_wait_requeue_pi()
futex_wait_queue_me()
freezable_schedule()
<scheduled out>
futex(LOCK_PI, uaddr2)
futex(CMP_REQUEUE_PI, uaddr,
uaddr2, 1, 0)
/* requeues waiter to uaddr2 */
futex(UNLOCK_PI, uaddr2)
wake_futex_pi()
cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
<woken by waker>
<hrtimer_wakeup() fires,
clears sleeper->task>
futex(LOCK_PI, uaddr2)
__rt_mutex_start_proxy_lock()
try_to_take_rt_mutex() /* steals lock */
rt_mutex_set_owner(lock, stealer)
<preempted>
<scheduled in>
rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
try_to_take_rt_mutex() /* fails, lock held by stealer */
if (timeout && !timeout->task)
return -ETIMEDOUT;
fixup_owner()
/* lock wasn't acquired, so,
fixup_pi_state_owner skipped */
return -ETIMEDOUT;
/* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */
futex_lock(LOCK_PI, uaddr2)
-> bails with EDEADLK, futex word says we're owner.
And suggested that what commit:
73d786bd04 ("futex: Rework inconsistent rt_mutex/futex_q state")
removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:
16ffa12d74 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
changed all the locking rules. Even without that, the sequence:
- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }
- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);
already suggests there were races; otherwise we'd never have to look
at next_owner.
So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.
Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.
Fixes: 73d786bd04 ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <julia@ni.com>
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julia Cartwright <julia@ni.com>
Tested-by: Gratian Crisan <gratian.crisan@ni.com>
Cc: Darren Hart <dvhart@infradead.org>
Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Upstream commit 73d786bd04 ]
There is a weird state in the futex_unlock_pi() path when it interleaves
with a concurrent futex_lock_pi() at the point where it drops hb->lock.
In this case, it can happen that the rt_mutex wait_list and the futex_q
disagree on pending waiters, in particular rt_mutex will find no pending
waiters where futex_q thinks there are. In this case the rt_mutex unlock
code cannot assign an owner.
The futex side fixup code has to cleanup the inconsistencies with quite a
bunch of interesting corner cases.
Simplify all this by changing wake_futex_pi() to return -EAGAIN when this
situation occurs. This then gives the futex_lock_pi() code the opportunity
to continue and the retried futex_unlock_pi() will now observe a coherent
state.
The only problem is that this breaks RT timeliness guarantees. That
is, consider the following scenario:
T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
CPU0
T1
lock_pi()
queue_me() <- Waiter is visible
preemption
T2
unlock_pi()
loops with -EAGAIN forever
Which is undesirable for PI primitives. Future patches will rectify
this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9def3b1a07 upstream.
Since commit c40aaaac10 ("iommu/vt-d: Gracefully handle DMAR units
with no supported address widths") dmar.c needs struct iommu_device to
be selected. We can drop this dependency by not dereferencing struct
iommu_device if IOMMU_API is not selected and by reusing the information
stored in iommu->drhd->ignored instead.
This fixes the following build error when IOMMU_API is not selected:
drivers/iommu/dmar.c: In function ‘free_iommu’:
drivers/iommu/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’
1139 | if (intel_iommu_enabled && iommu->iommu.ops) {
^
Fixes: c40aaaac10 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20201013073055.11262-1-brgl@bgdev.pl
Signed-off-by: Joerg Roedel <jroedel@suse.de>
[ - context change due to moving drivers/iommu/dmar.c to
drivers/iommu/intel/dmar.c
- set the drhr in the iommu like in upstream commit b1012ca8dc
("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") ]
Signed-off-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3d372c4edf ]
If we spin for a long time in memory reads that (for some reason in
hardware) take a long time, then we'll eventually get messages such
as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and
we don't schedule, so we're hogging the CPU with this task, at least
if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was
only going to happen if we had interrupts disabled, and then fixed
this (which is good anyway, however), but that didn't always help;
looking at it again now I realized that the spin unlock will only
reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if
we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 04516706bb ("iwlwifi: pcie: limit memory read spin time")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56ce7c25ae ]
When setting xfrm replay_window to values higher than 32, a rare
page-fault occurs in xfrm_replay_advance_bmp:
BUG: unable to handle page fault for address: ffff8af350ad7920
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD ad001067 P4D ad001067 PUD 0
Oops: 0002 [#1] SMP PTI
CPU: 3 PID: 30 Comm: ksoftirqd/3 Kdump: loaded Not tainted 5.4.52-050452-generic #202007160732
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:xfrm_replay_advance_bmp+0xbb/0x130
RSP: 0018:ffffa1304013ba40 EFLAGS: 00010206
RAX: 000000000000010d RBX: 0000000000000002 RCX: 00000000ffffff4b
RDX: 0000000000000018 RSI: 00000000004c234c RDI: 00000000ffb3dbff
RBP: ffffa1304013ba50 R08: ffff8af330ad7920 R09: 0000000007fffffa
R10: 0000000000000800 R11: 0000000000000010 R12: ffff8af29d6258c0
R13: ffff8af28b95c700 R14: 0000000000000000 R15: ffff8af29d6258fc
FS: 0000000000000000(0000) GS:ffff8af339ac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8af350ad7920 CR3: 0000000015ee4000 CR4: 00000000001406e0
Call Trace:
xfrm_input+0x4e5/0xa10
xfrm4_rcv_encap+0xb5/0xe0
xfrm4_udp_encap_rcv+0x140/0x1c0
Analysis revealed offending code is when accessing:
replay_esn->bmp[nr] |= (1U << bitnr);
with 'nr' being 0x07fffffa.
This happened in an SMP system when reordering of packets was present;
A packet arrived with a "too old" sequence number (outside the window,
i.e 'diff > replay_window'), and therefore the following calculation:
bitnr = replay_esn->replay_window - (diff - pos);
yields a negative result, but since bitnr is u32 we get a large unsigned
quantity (in crash dump above: 0xffffff4b seen in ecx).
This was supposed to be protected by xfrm_input()'s former call to:
if (x->repl->check(x, skb, seq)) {
However, the state's spinlock x->lock is *released* after '->check()'
is performed, and gets re-acquired before '->advance()' - which gives a
chance for a different core to update the xfrm state, e.g. by advancing
'replay_esn->seq' when it encounters more packets - leading to a
'diff > replay_window' situation when original core continues to
xfrm_replay_advance_bmp().
An attempt to fix this issue was suggested in commit bcf66bf54a
("xfrm: Perform a replay check after return from async codepaths"),
by calling 'x->repl->recheck()' after lock is re-acquired, but fix
applied only to asyncronous crypto algorithms.
Augment the fix, by *always* calling 'recheck()' - irrespective if we're
using async crypto.
Fixes: 0ebea8ef35 ("[IPSEC]: Move state lock into x->type->input")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0c5b7a501e upstream.
Otherwise, the newly create element shows no timeout when listing the
ruleset. If the set definition does not specify a default timeout, then
the set element only shows the expiration time, but not the timeout.
This is a problem when restoring a stateful ruleset listing since it
skips the timeout policy entirely.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a88afa46b8 upstream.
When the kernel is configured to use the Thumb-2 instruction set
"suspend-to-memory" fails to resume. Observed on a Colibri iMX6ULL
(i.MX 6ULL) and Apalis iMX6 (i.MX 6Q).
It looks like the CPU resumes unconditionally in ARM instruction mode
and then chokes on the presented Thumb-2 code it should execute.
Fix this by using the arm instruction set for all code in
suspend-imx6.S.
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Fixes: df595746fa ("ARM: imx: add suspend in ocram support for i.mx6q")
Acked-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27af8e2c90 upstream.
We have the following potential deadlock condition:
========================================================
WARNING: possible irq lock inversion dependency detected
5.10.0-rc2+ #25 Not tainted
--------------------------------------------------------
swapper/3/0 just changed the state of lock:
ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200
but this lock took another, HARDIRQ-READ-unsafe lock in the past:
(&trig->leddev_list_lock){.+.?}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&trig->leddev_list_lock);
local_irq_disable();
lock(&host->lock);
lock(&trig->leddev_list_lock);
<Interrupt>
lock(&host->lock);
*** DEADLOCK ***
no locks held by swapper/3/0.
the shortest dependencies between 2nd lock and 1st lock:
-> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 {
HARDIRQ-ON-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
IN-SOFTIRQ-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
kbd_bh+0x9e/0xc0
tasklet_action_common.constprop.0+0xe9/0x100
tasklet_action+0x22/0x30
__do_softirq+0xcc/0x46d
run_ksoftirqd+0x3f/0x70
smpboot_thread_fn+0x116/0x1f0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
SOFTIRQ-ON-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
INITIAL READ USE at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
}
... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10
... acquired at:
_raw_read_lock+0x42/0x90
led_trigger_blink_oneshot+0x3b/0x90
ledtrig_disk_activity+0x3c/0xa0
ata_qc_complete+0x26/0x450
ata_do_link_abort+0xa3/0xe0
ata_port_freeze+0x2e/0x40
ata_hsm_qc_complete+0x94/0xa0
ata_sff_hsm_move+0x177/0x7a0
ata_sff_pio_task+0xc7/0x1b0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
-> (&host->lock){-...}-{2:2} ops: 69 {
IN-HARDIRQ-W at:
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_bmdma_interrupt+0x27/0x200
__handle_irq_event_percpu+0xd5/0x2b0
handle_irq_event+0x57/0xb0
handle_edge_irq+0x8c/0x230
asm_call_irq_on_stack+0xf/0x20
common_interrupt+0x100/0x1c0
asm_common_interrupt+0x1e/0x40
native_safe_halt+0xe/0x10
arch_cpu_idle+0x15/0x20
default_idle_call+0x59/0x1c0
do_idle+0x22c/0x2c0
cpu_startup_entry+0x20/0x30
start_secondary+0x11d/0x150
secondary_startup_64_no_verify+0xa6/0xab
INITIAL USE at:
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_dev_init+0x54/0xe0
ata_link_init+0x8b/0xd0
ata_port_alloc+0x1f1/0x210
ata_host_alloc+0xf1/0x130
ata_host_alloc_pinfo+0x14/0xb0
ata_pci_sff_prepare_host+0x41/0xa0
ata_pci_bmdma_prepare_host+0x14/0x30
piix_init_one+0x21f/0x600
local_pci_probe+0x48/0x80
pci_device_probe+0x105/0x1c0
really_probe+0x221/0x490
driver_probe_device+0xe9/0x160
device_driver_attach+0xb2/0xc0
__driver_attach+0x91/0x150
bus_for_each_dev+0x81/0xc0
driver_attach+0x1e/0x20
bus_add_driver+0x138/0x1f0
driver_register+0x91/0xf0
__pci_register_driver+0x73/0x80
piix_init+0x1e/0x2e
do_one_initcall+0x5f/0x2d0
kernel_init_freeable+0x26f/0x2cf
kernel_init+0xe/0x113
ret_from_fork+0x1f/0x30
}
... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10
... acquired at:
__lock_acquire+0x9da/0x2370
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_bmdma_interrupt+0x27/0x200
__handle_irq_event_percpu+0xd5/0x2b0
handle_irq_event+0x57/0xb0
handle_edge_irq+0x8c/0x230
asm_call_irq_on_stack+0xf/0x20
common_interrupt+0x100/0x1c0
asm_common_interrupt+0x1e/0x40
native_safe_halt+0xe/0x10
arch_cpu_idle+0x15/0x20
default_idle_call+0x59/0x1c0
do_idle+0x22c/0x2c0
cpu_startup_entry+0x20/0x30
start_secondary+0x11d/0x150
secondary_startup_64_no_verify+0xa6/0xab
This lockdep splat is reported after:
commit e918188611 ("locking: More accurate annotations for read_lock()")
To clarify:
- read-locks are recursive only in interrupt context (when
in_interrupt() returns true)
- after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call
write_lock(&trig->leddev_list_lock) that would be blocked by CPU0
that holds trig->leddev_list_lock in read-mode
- when CPU1 (ata_ac_complete()) tries to read-lock
trig->leddev_list_lock, it would be blocked by the write-lock waiter
on CPU2 (because we are not in interrupt context, so the read-lock is
not recursive)
- at this point if an interrupt happens on CPU0 and
ata_bmdma_interrupt() is executed it will try to acquire host->lock,
that is held by CPU1, that is currently blocked by CPU2, so:
* CPU0 blocked by CPU1
* CPU1 blocked by CPU2
* CPU2 blocked by CPU0
*** DEADLOCK ***
The deadlock scenario is better represented by the following schema
(thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the
detailed explanation of the deadlock condition):
CPU 0: CPU 1: CPU 2:
----- ----- -----
led_trigger_event():
read_lock(&trig->leddev_list_lock);
<workqueue>
ata_hsm_qc_complete():
spin_lock_irqsave(&host->lock);
write_lock(&trig->leddev_list_lock);
ata_port_freeze():
ata_do_link_abort():
ata_qc_complete():
ledtrig_disk_activity():
led_trigger_blink_oneshot():
read_lock(&trig->leddev_list_lock);
// ^ not in in_interrupt() context, so could get blocked by CPU 2
<interrupt>
ata_bmdma_interrupt():
spin_lock_irqsave(&host->lock);
Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so
that no interrupt can happen in between, preventing the deadlock
condition.
Apply the same change to led_trigger_blink_setup() as well, since the
same deadlock scenario can also happen in power_supply_update_bat_leds()
-> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context),
and potentially prevent other similar usages.
Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/
Fixes: eb25cb9956 ("leds: convert IDE trigger to common disk trigger")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>