The blk_mq_delay_kick_requeue_list() function is used by the device
mapper and only by the device mapper to rerun the queue and requeue
list after a delay. This function is called once per request that
gets requeued. Modify this function such that the queue is run once
per path change event instead of once per request that is requeued.
Fixes: commit 2849450ad3 ("blk-mq: introduce blk_mq_delay_kick_requeue_list()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This gets us back to the behavior in 4.12 and earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In dm-integrity target we register integrity profile that have
both generate_fn and verify_fn callbacks set to NULL.
This is used if dm-integrity is stacked under a dm-crypt device
for authenticated encryption (integrity payload contains authentication
tag and IV seed).
In this case the verification is done through own crypto API
processing inside dm-crypt; integrity profile is only holder
of these data. (And memory is owned by dm-crypt as well.)
After the commit (and previous changes)
Commit 7c20f11680
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Jul 3 16:58:43 2017 -0600
bio-integrity: stop abusing bi_end_io
we get this crash:
: BUG: unable to handle kernel NULL pointer dereference at (null)
: IP: (null)
: *pde = 00000000
...
:
: Workqueue: kintegrityd bio_integrity_verify_fn
: task: f48ae180 task.stack: f4b5c000
: EIP: (null)
: EFLAGS: 00210286 CPU: 0
: EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
: ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
: CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
: Call Trace:
: ? bio_integrity_process+0xe3/0x1e0
: bio_integrity_verify_fn+0xea/0x150
: process_one_work+0x1c7/0x5c0
: worker_thread+0x39/0x380
: kthread+0xd6/0x110
: ? process_one_work+0x5c0/0x5c0
: ? kthread_worker_fn+0x100/0x100
: ? kthread_worker_fn+0x100/0x100
: ret_from_fork+0x19/0x24
: Code: Bad EIP value.
: EIP: (null) SS:ESP: 0068:f4b5dea4
: CR2: 0000000000000000
Patch just skip the whole verify workqueue if verify_fn is set to NULL.
Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
Signed-off-by: Milan Broz <gmazyland@gmail.com>
[hch: trivial whitespace fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fixes hitting WARN_ON() during initialisation of pre-NV50 GPUs, caused
by the recent changes to support pad macro routing on GM20x.
We currently don't use them here for older GPUs anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
drm/i915 fixes for v4.13-rc5
* tag 'drm-intel-fixes-2017-08-09-1' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: fix backlight invert for non-zero minimum brightness
drm/i915/shrinker: Wrap need_resched() inside preempt-disable
drm/i915/perf: fix flex eu registers programming
drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8
drm/i915/gvt: Initialize MMIO Block with HW state
drm/i915/gvt: clean workload queue if error happened
drm/i915/gvt: change resetting to resetting_eng
Core Changes:
- dma-buf: Allow multiple sync_files to wrap a single dma-fence (Chris)
Driver Changes:
- rockchip: misc fixes to vop driver from the downstream rockchip tree (Mark)
- Error path cleanups to tc358767 & host1x (Lucas & Paul, respectively)
* tag 'drm-misc-fixes-2017-08-08' of git://anongit.freedesktop.org/git/drm-misc:
drm/rockchip: vop: report error when check resource error
drm/rockchip: vop: round_up pitches to word align
drm/rockchip: vop: fix NV12 video display error
drm/rockchip: vop: fix iommu page fault when resume
dma-buf/sync_file: Allow multiple sync_files to wrap a single dma-fence
drm/bridge: tc358767: fix probe without attached output node
Fix a issue to display system memory region outside a gem buffer.
* tag 'exynos-drm-fixes-for-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: forbid creating framebuffers from too small GEM buffers
Bunch of msm fixes for 4.13
* 'msm-fixes-4.13-rc3' of git://people.freedesktop.org/~robclark/linux:
drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations
drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM
drm/msm/adreno: Prevent unclocked access when retrieving timestamps
drm/msm: Remove __user from __u64 data types
drm/msm: args->fence should be args->flags
drm/msm: Turn off hardware clock gating before reading A5XX registers
drm/msm: Allow hardware clock gating to be toggled
drm/msm: Remove some potentially blocked register ranges
drm/msm/mdp5: Drop clock names with "_clk" suffix
drm/msm/mdp5: Fix typo in encoder_enable path
drm/msm: NULL pointer dereference in drivers/gpu/drm/msm/msm_gem_vma.c
drm/msm: fix WARN_ON in add_vma() with no iommu
drm/msm/dsi: Calculate link clock rates with updated dsi->lanes
drm/msm/mdp5: fix unclocked register access in _cursor_set()
drm/msm: unlock on error in msm_gem_get_iova()
drm/msm: fix an integer overflow test
drm/msm/mdp5: Fix compilation warnings
Florian Westphal says:
====================
rtnetlink: allow selected handlers to run without rtnl
Changes since v1:
In patch 6, don't make ipv6 route handlers lockless, they all have
assumptions on rtnl being held. Other patches are unchanged.
The RTNL mutex is used to serialize both rtnetlink calls and
dump requests.
Its also used to protect other things such as the list of current
net namespaces.
Unfortunately RTNL mutex is a performance issue, e.g. a cpu adding an
ip address prevents other cpus from seemingly unrelated tasks such as
dumping tc classifiers or doing rtnetlink route lookups.
This patch set adds basic infrastructure to start pushing the rtnl lock
down to those places that need it, or even elide it entirely in some cases.
Subsystems can now indicate that their doit() callback can run without
RTNL mutex, such callbacks can then run in parallel.
This will obviously need a lot of followup work; all current
users need to be audited/changed to benefit from this.
Initial no-rtnl spot is netns new/getid.
We have various 'get' handlers that are also a tempting target,
however, several of these depend on rtnl mutex to prevent information
from changing while objects are being read by rtnl handlers; however,
it doesn't appear impossible to change this.
Dumps are another problem entirely, see
commit 2907c35ff6 ("net: hold rtnl again in dump callbacks"),
this patchset doesn't touch dump requests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow callers to tell rtnetlink core that its doit callback
should be invoked without holding rtnl mutex.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Note that netlink dumps still acquire rtnl mutex via the netlink
dump infrastructure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
instead of rtnl lock/unload at the top level, push it down
to the called function.
This is just an intermediate step, next commit switches protection
of the rtnl_link ops table to rcu, in which case (for dumps) the
rtnl lock is acquired only by the netlink dumper infrastructure
(current lock/unlock/dump/lock/unlock rtnl sequence becomes
rcu lock/rcu unlock/dump).
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't see what prevents rmmod (unregister_all is called) while a dump
is active.
Even if we'd add rtnl lock/unlock pair to unregister_all (as done here),
thats not enough either as rtnl_lock is released right before the dump
process starts.
So this adds a refcount:
* acquire rtnl mutex
* bump refcount
* release mutex
* start the dump
... and make unregister_all remove the callbacks (no new dumps possible)
and then wait until refcount is 0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows us to later indicate to rtnetlink core that certain
doit functions should be called without acquiring rtnl_mutex.
This change should have no effect, we simply replace the last (now
unused) calcit argument with the new flag.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only a single place in the kernel that regisers the "calcit"
callback (to determine min allocation for dumps).
This is in rtnetlink.c for PF_UNSPEC RTM_GETLINK.
The function that checks for calcit presence at run time will first check
the requested family (which will always fail for !PF_UNSPEC as no callsite
registers this), then falls back to checking PF_UNSPEC.
Therefore we can just check if type is RTM_GETLINK and then do a direct
call. Because of fallback to PF_UNSPEC all RTM_GETLINK types used this
regardless of family.
This has the advantage that we don't need to allocate space for
the function pointer for all the other families.
A followup patch will drop the calcit function pointer from the
rtnl_link callback structure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
bpf: Add BPF_J{LT,LE,SLT,SLE} instructions
This set adds BPF_J{LT,LE,SLT,SLE} instructions to the BPF
insn set, interpreter, JIT hardening code and all JITs are
also updated to support the new instructions. Basic idea is
to reduce register pressure by avoiding BPF_J{GT,GE,SGT,SGE}
rewrites. Removing the workaround for the rewrites in LLVM,
this can result in shorter BPF programs, less stack usage
and less verification complexity. First patch provides some
more details on rationale and integration.
Thanks a lot!
v1 -> v2:
- Reworded commit msg in patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test cases to the verifier selftest suite in order to verify that
i) direct packet access, and ii) dynamic map value access is working
with the changes related to the new instructions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable the newly added jump opcodes, main parts are in two
different areas, namely direct packet access and dynamic map
value access. For the direct packet access, we now allow for
the following two new patterns to match in order to trigger
markings with find_good_pkt_pointers():
Variant 1 (access ok when taking the branch):
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (bf) r0 = r2
3: (07) r0 += 8
4: (ad) if r0 < r3 goto pc+2
R0=pkt(id=0,off=8,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R10=fp
5: (b7) r0 = 0
6: (95) exit
from 4 to 7: R0=pkt(id=0,off=8,r=8) R1=ctx
R2=pkt(id=0,off=0,r=8) R3=pkt_end R10=fp
7: (71) r0 = *(u8 *)(r2 +0)
8: (05) goto pc-4
5: (b7) r0 = 0
6: (95) exit
processed 11 insns, stack depth 0
Variant 2 (access ok on fall-through):
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (bf) r0 = r2
3: (07) r0 += 8
4: (bd) if r3 <= r0 goto pc+1
R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8)
R3=pkt_end R10=fp
5: (71) r0 = *(u8 *)(r2 +0)
6: (b7) r0 = 1
7: (95) exit
from 4 to 6: R0=pkt(id=0,off=8,r=0) R1=ctx
R2=pkt(id=0,off=0,r=0) R3=pkt_end R10=fp
6: (b7) r0 = 1
7: (95) exit
processed 10 insns, stack depth 0
The above two basically just swap the branches where we need
to handle an exception and allow packet access compared to the
two already existing variants for find_good_pkt_pointers().
For the dynamic map value access, we add the new instructions
to reg_set_min_max() and reg_set_min_max_inv() in order to
learn bounds. Verifier test cases for both are added in a
follow-up patch.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE} instructions with
BPF_X/BPF_K variants for the nfp eBPF JIT. The two BPF_J{SLT,SLE}
instructions have not been added yet given BPF_J{SGT,SGE} are
not supported yet either.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the s390x eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the sparc64 eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the arm64 eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the x86_64 eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
net: zerocopy fixes
Fix two issues introduced in the msg_zerocopy patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not use uarg->zerocopy outside msg_zerocopy. In other paths the
field is not explicitly initialized and aliases another field.
Those paths have only one reference so do not need this intermediate
variable. Call uarg->callback directly.
Fixes: 1f8b977ab3 ("sock: enable MSG_ZEROCOPY")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only call mm_unaccount_pinned_pages when releasing a struct ubuf_info
that has initialized its field uarg->mmp.
Before this patch, a vhost-net with experimental_zcopytx can crash in
mm_unaccount_pinned_pages
sock_zerocopy_put
skb_zcopy_clear
skb_release_data
Only sock_zerocopy_alloc initializes this field. Move the unaccount
call from generic sock_zerocopy_put to its specific callback
sock_zerocopy_callback.
Fixes: a91dbff551 ("sock: ulimit on MSG_ZEROCOPY pages")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2017-08-08
This series contains updates to e1000e and igb/igbvf.
Gangfeng Huang fixes an issue with receive network flow classification,
where igb_nfc_filter_exit() was not being called in igb_down() which
would cause the filter tables to "fill up" if a user where to change
the adapter settings (such as speed) which requires a reset of the
adapter.
Cliff Spradlin fixes a timestamping issue, where igb was allowing requests
for hardware timestamping even if it was not configured for hardware
transmit timestamping.
Corinna Vinschen removes the error message that there was an "unexpected
SYS WRAP", when it is actually expected. So remove the message to not
confuse users.
Greg Edwards provides several patches for the mailbox interface between
the PF and VF drivers. Added a mailbox unlock method to be used to unlock
the PF/VF mailbox by the PF. Added a lock around the VF mailbox ops to
prevent the VF from sending another message while the PF is still
processing the previous message. Fixed a "scheduling while atomic" issue
by changing msleep() to mdelay().
Sasha adds support for the next LOM generations i219 (v8 & v9) which
will be available in the next Intel client platform IceLake.
John Linville adds support for a Broadcom PHY to the igb driver, since
there are designs out in the world which use the igb MAC and a third
party PHY. This allows the driver to load and function as expected on
these designs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull pin control fixes from Linus Walleij:
"These are the pin control fixes I have gathered since the return from
my vacation. They boiled in -next a while so let's get them in.
Apart from the documentation build it is purely driver fixes. Which is
nice. The Intel fixes seem kind of important.
- Fix the documentation build as the docs were moved
- Correct the UART pin list on the Intel Merrifield
- Fix pin assignment and number of pins on the Marvell Armada 37xx
pin controller
- Cover the Setzer models in the Chromebook DMI quirk in the Intel
cheryview driver so they start working
- Add the missing "sim" function to the sunxi driver
- Fix USB pin definitions on Uniphier Pro4
- Smatch fix for invalid reference in the zx pin control driver"
* tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: generic: update references to Documentation/pinctrl.txt
pinctrl: intel: merrifield: Correct UART pin lists
pinctrl: armada-37xx: Fix number of pin in south bridge
pinctrl: armada-37xx: Fix the pin 23 on south bridge
pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
pinctrl: uniphier: fix USB3 pin assignment for Pro4
pinctrl: zte: fix dereference of 'data' in zx_set_mux()
Commit 65d8fc777f ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.
Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key. Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.
kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task: ffff80001e271780 task.stack: ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.
This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning. The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];
void *mem;
#define MEM_PROT (PROT_READ | PROT_WRITE)
#define MEM_SIZE 65536
static int futex_wrapper(int *uaddr, int op, int val,
const struct timespec *timeout,
int *uaddr2, int val3)
{
syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}
void *poll_futex(void *unused)
{
for (;;) {
futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
}
}
int main(int argc, char *argv[])
{
int i;
mem = mmap(NULL, MEM_SIZE, MEM_PROT,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
printf("Mapping @ %p\n", mem);
printf("Creating futex threads...\n");
for (i = 0; i < NR_FUTEX_THREADS; i++)
pthread_create(&threads[i], NULL, poll_futex, NULL);
printf("Flipping mapping...\n");
for (;;) {
mmap(mem, MEM_SIZE, MEM_PROT,
MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
return 0;
}
Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c fixes from Wolfram Sang:
"The main thing is to allow empty id_tables for ACPI to make some
drivers get probed again. It looks a bit bigger than usual because it
needs some internal renaming, too.
Other than that, there is a fix for broken DSTDs, a super simple
enablement for ARM MPS, and two documentation fixes which I'd like to
see in v4.13 already"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: rephrase explanation of I2C_CLASS_DEPRECATED
i2c: allow i2c-versatile for ARM MPS platforms
i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
i2c: designware: Print clock freq on invalid clock freq error
i2c: core: Allow empty id_table in ACPI case as well
i2c: mux: pinctrl: mention correct module name in Kconfig help text
Some more fixes for 4.13
* Fix a memory leak in the SAR code;
* Fix a stuck queue case in AP mode;
* Convert a WARN to a simple debug in a legitimate race case (from
which we can recover);
* Fix a severe throughput aggregation on 9000-family devices due to
aggregation issues.
Pull block fixes from Jens Axboe:
"Three patches that should go into this release.
Two of them are from Paolo and fix up some corner cases with BFQ, and
the last patch is from Ming and fixes up a potential usage count
imbalance regression due to the recent NOWAIT work"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
block, bfq: consider also in_service_entity to state whether an entity is active
block, bfq: reset in_service_entity if it becomes idle
If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just
means we raced with other calls to OPEN.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull crypto fixes from Herbert Xu:
"Fix two regressions in the inside-secure driver with respect to
hmac(sha1)"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
Pull networking fixes from David Miller:
"The pull requests are getting smaller, that's progress I suppose :-)
1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.
2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
from Koichiro Den.
3) Missing u64_stats_init() calls in several drivers, from Florian
Fainelli.
4) TCP can set the congestion window to an invalid ssthresh value
after congestion window reductions, from Yuchung Cheng.
5) Fix BPF jit branch generation on s390, from Daniel Borkmann.
6) Correct MIPS ebpf JIT merge, from David Daney.
7) Correct byte order test in BPF test_verifier.c, from Daniel
Borkmann.
8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.
9) Handle SCTP checksums properly in mlx4 driver, from Davide
Caratti.
10) We can potentially enter tcp_connect() with a cached route
already, due to fastopen, so we have to explicitly invalidate it.
11) skb_warn_bad_offload() can bark in legitimate situations, fix from
Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
net: avoid skb_warn_bad_offload false positives on UFO
qmi_wwan: fix NULL deref on disconnect
ppp: fix xmit recursion detection on ppp channels
rds: Reintroduce statistics counting
tcp: fastopen: tcp_connect() must refresh the route
net: sched: set xt_tgchk_param par.net properly in ipt_init_target
net: dsa: mediatek: add adjust link support for user ports
net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
hysdn: fix to a race condition in put_log_buffer
s390/qeth: fix L3 next-hop in xmit qeth hdr
asix: Fix small memory leak in ax88772_unbind()
asix: Ensure asix_rx_fixup_info members are all reset
asix: Add rx->ax_skb = NULL after usbnet_skb_return()
bpf: fix selftest/bpf/test_pkt_md_access on s390x
netvsc: fix race on sub channel creation
bpf: fix byte order test in test_verifier
xgene: Always get clk source, but ignore if it's missing for SGMII ports
MIPS: Add missing file for eBPF JIT.
bpf, s390: fix build for libbpf and selftest suite
...
make -C tools/testing/selftests/futex/ run_tests doesn't run the futex
tests.
Running the tests when `dirname $(OUTPUT)` == $(PWD) doesn't work when
the $(OUTPUT) is $(PWD) which is the case when the test is run using
make -C tools/testing/selftests/futex/ run_tests.
Fixes: a8ba798bc8 ("selftests: enable O and KBUILD_OUTPUT")
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
When CPUs start and stop the watchdog, they manipulate shared data
that is normally protected by the lock. Other CPUs can be running
concurrently at this time, so it's a good idea to use locking here
to be on the safe side.
Remove the barrier which is undocumented and didn't do anything.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the SMP detector finds other CPUs stuck, it iterates over
them and marks them as stuck. This pulls them out of the pending
mask and allows the detector to continue with remaining good
CPUs (if nmi_watchdog=panic is not enabled).
The code to dothat was buggy because when setting a CPU stuck,
if the pending mask became empty, it resets it to keep the
watchdog running. However the iterator will continue to run
over the new pending mask and mark remaining good CPUs sas stuck.
Fix this by doing it with cpumask bitwise operations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the watchdog decides to panic, it takes the lock and double
checks everything (to avoid races with the CPU being unstuck or
panic()ed by something else).
The exit label was misplaced and would result in all-CPUs backtrace
and watchdog panic even in the case that the condition was found to be
resolved.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some code can go into a tight loop calling touch_nmi_watchdog (e.g.,
stop_machine CPU hotplug code). This can cause contention on watchdog
locks particularly if all CPUs with watchdog enabled are spinning in
the loops.
Avoid this storm of activity by running the watchdog timer callback
from this path if we have exceeded the timer period since it was last
run.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Hard-disable interrupts before taking the lock, which prevents
soft-NMI re-entrancy and therefore can prevent deadlocks.
- Use raw_ variants of local_irq_disable to avoid irq debugging.
- When the lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible).
Some stalls have been noticed at high loads that go away with improved
locking. There should not be so much locking contention in the first
place (which is addressed in a subsequent patch), but locking should
still be improved.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the NMI IPI lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible). This
improves behaviour under high contention (e.g., a system crash when
a number of CPUs are trying to enter the debugger).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit 05a4a95279 ("kernel/watchdog: split up config options"),
CONFIG_LOCKUP_DETECTOR was split into two separate config options,
CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR.
Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer
user selectable, and we don't mention the new options, so we end up with
none of them enabled.
So update the defconfigs to turn on the new SOFT and HARD options, the
end result being the same as what we had previously.
Fixes: 05a4a95279 ("kernel/watchdog: split up config options")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>