To make IPv6 more in line with IPv4 we need to be able to respond
differently to different netdev events. For example, when a netdev is
unregistered all the routes using it as their nexthop device should be
flushed, whereas when the netdev's carrier changes only the 'linkdown'
flag should be toggled.
Currently, this is not possible, as the function that traverses the
routing tables is not aware of the triggering event.
Propagate the triggering event down, so that it could be used in later
patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patch marked nexthops with the 'dead' and 'linkdown' flags.
Clear these flags when the netdev comes back up.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a netdev is put administratively down or unregistered all the
nexthops using it as their nexthop device should be marked with the
'dead' and 'linkdown' flags.
Currently, when a route is dumped its nexthop device is tested and the
flags are set accordingly. A similar check is performed during route
lookup.
Instead, we can simply mark the nexthops based on netdev events and
avoid checking the netdev's state during route dump and lookup.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By the time fib6_net_exit() is executed all the netdevs in the namespace
have been either unregistered or pushed back to the default namespace.
That is because pernet subsys operations are always ordered before
pernet device operations and therefore invoked after them during
namespace dismantle.
Thus, all the routing tables in the namespace are empty by the time
fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed.
This allows us to simplify the condition in fib6_ifdown() as it's only
ever called with an actual netdev.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a start of a framework for extending struct xdp_buff without
having the overhead of populating every data at runtime. Idea
is to have a new per-queue struct xdp_rxq_info that holds read
mostly data (currently that is, queue number and a pointer to
the corresponding netdev) which is set up during rxqueue config
time. When a XDP program is invoked, struct xdp_buff holds a
pointer to struct xdp_rxq_info that the BPF program can then
walk. The user facing BPF program that uses struct xdp_md for
context can use these members directly, and the verifier rewrites
context access transparently by walking the xdp_rxq_info and
net_device pointers to load the data, from Jesper.
2) Redo the reporting of offload device information to user space
such that it works in combination with network namespaces. The
latter is reported through a device/inode tuple as similarly
done in other subsystems as well (e.g. perf) in order to identify
the namespace. For this to work, ns_get_path() has been generalized
such that the namespace can be retrieved not only from a specific
task (perf case), but also from a callback where we deduce the
netns (ns_common) from a netdevice. bpftool support using the new
uapi info and extensive test cases for test_offload.py in BPF
selftests have been added as well, from Jakub.
3) Add two bpftool improvements: i) properly report the bpftool
version such that it corresponds to the version from the kernel
source tree. So pick the right linux/version.h from the source
tree instead of the installed one. ii) fix bpftool and also
bpf_jit_disasm build with bintutils >= 2.9. The reason for the
build breakage is that binutils library changed the function
signature to select the disassembler. Given this is needed in
multiple tools, add a proper feature detection to the
tools/build/features infrastructure, from Roman.
4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
stacktrace map. It is currently unimplemented, but there are
use cases where user space needs to walk all stacktrace map
entries e.g. for dumping or deleting map entries w/o having to
close and recreate the map. Add BPF selftests along with it,
from Yonghong.
5) Few follow-up cleanups for the bpftool cgroup code: i) rename
the cgroup 'list' command into 'show' as we have it for other
subcommands as well, ii) then alias the 'show' command such that
'list' is accepted which is also common practice in iproute2,
and iii) remove couple of newlines from error messages using
p_err(), from Jakub.
6) Two follow-up cleanups to sockmap code: i) remove the unused
bpf_compute_data_end_sk_skb() function and ii) only build the
sockmap infrastructure when CONFIG_INET is enabled since it's
only aware of TCP sockets at this time, from John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The RISC-V port doesn't suport a nommu mode, so there is no reason
to provide some code only under a CONFIG_MMU ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
We were hoping to avoid making this visible to userspace, but it looks
like we're going to have to because QEMU's user-mode emulation doesn't
want to emulate a vDSO. Having vDSO-only system calls was a bit
unothodox anyway, so I think in this case it's OK to just make the
actual system call number public.
This patch simply moves the definition of __NR_riscv_flush_icache
availiable to userspace, which results in the deletion of the now empty
vdso-syscalls.h.
Changes since v1:
* I've moved the definition into uapi/asm/syscalls.h rathen than
uapi/asm/unistd.h. This allows me to keep asm/unistd.h, so we can
keep the syscall table macros sane.
* As a side effect of the above, this no longer disables all system
calls on RISC-V. Whoops!
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This patch provides a basic defconfig for the RISC-V
architecture that enables enough kernel features to run a
basic Linux distribution on qemu's "virt" board for native
software development. Features include:
- serial console
- virtio block and network device support
- VFAT and ext2/3/4 filesystem support
- NFS client and NFS rootfs support
- an assortment of other kernel features required for
running systemd
It also enables a number of drivers for physical hardware
that target the "SiFive U500" SoC and the corresponding
development platform. These include:
- PCIe host controller support for the FPGA-based U500
development platform (PCIE_XILINX)
- USB host controller support (OHCI/EHCI/XHCI)
- USB HID (keyboard/mouse) support
- USB mass storage support (bulk and UAS)
- SATA support (AHCI)
- ethernet drivers (MACB for a SoC-internal MAC block, microsemi
ethernet phy, E1000E and R8169 for PCIe-connected external devices)
- DRM and framebuffer console support for PCIe-connected
Radeon graphics chips
Signed-off-by: Karsten Merker <merker@debian.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
When allocation of underlying block for a page fault fails, we fail the
fault with SIGBUS. However we may well hit ENOSPC just due to lots of
free blocks being held by the running / committing transaction. So
propagate the error from ext4_iomap_begin() and implement do standard
allocation retry loop in ext4_dax_huge_fault().
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ext4 needs to pass through error from its iomap handler to the page
fault handler so that it can properly detect ENOSPC and force
transaction commit and retry the fault (and block allocation). Add
argument to dax_iomap_fault() for passing such error.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This reverts commit d5dabd6339.
This patch did absolutely nothing, because ->c_entry_count is unsigned.
In addition if there is a bug in how mbcache maintains its entry count,
it needs to be fixed, not just hacked around. (There is no obvious bug,
though.)
Cc: Jan Kara <jack@suse.cz>
Cc: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull parisc fixes from Helge Deller:
- Many small fixes to show the real physical addresses of devices
instead of hashed addresses.
- One important fix to unbreak 32-bit SMP support: We forgot to 16-byte
align the spinlocks in the assembler code.
- Qemu support: The host will get a chance to sleep when the parisc
guest is idle. We use the same mechanism as the power architecture by
overlaying the "or %r10,%r10,%r10" instruction which is simply a nop
on real hardware.
* 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: qemu idle sleep support
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
parisc: Show unhashed EISA EEPROM address
parisc: Show unhashed HPA of Dino chip
parisc: Show initial kernel memory layout unhashed
parisc: Show unhashed hardware inventory
Pull apparmor fix from John Johansen:
"This fixes a regression when the kernel feature set is reported as
supporting mount and policy is pinned to a feature set that does not
support mount mediation"
* tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix regression in mount mediation when feature set is pinned
Pull LED fix from Jacek Anaszewski:
"The commit 2b83ff96f5 for 4.15-rc6, which was fixing LED brightness
setting after clearing delay_off broke the behavior on any alteration
of delay_on{off} properties, due to use of a LED core helper that does
too much for this particular case"
* tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: core: Fix regression caused by commit 2b83ff96f5
Pull MTD bugfix from Richard Weinberger:
"A single fix for the pxa3xx NAND driver"
* tag 'for-linus-20180107' of git://git.infradead.org/linux-mtd:
mtd: nand: pxa3xx: Fix READOOB implementation
similar to memdup_user(), but does *not* guarantee that result will
be physically contiguous; use only in cases where that's not a requirement
and free it with kvfree().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This driver creates a number of const structures that it stores in the
data field of an of_device_id array.
The data field of an of_device_id structure has type const void *, so
there is no need for a const-discarding cast when putting const values
into such a structure.
Done using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cadence QSPI controller provides direct access mode through which flash
can be accessed in a memory-mapped IO mode. This enables read/write to
flash using memcpy*() functions. This mode provides higher throughput
for both read/write operations when compared to current indirect mode of
operation.
This patch therefore adds support to use QSPI in direct mode. If the
window reserved in SoC's memory map for MMIO access is less that of
flash size(like on most SoCFPGA variants), then the driver falls back
to indirect mode of operation.
On TI's 66AK2G SoC, with ARM running at 600MHz and QSPI at 96MHz
switching to direct mode improves read throughput from 3MB/s to 8MB/s.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Move configuring of indirect read/write start address to
cqspi_indirect_*_execute() function and rename cqspi_indirect_*_setup()
function. This will help to reuse cqspi_indirect_*_setup() function for
supporting direct access mode.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Commit 2b83ff96f5 ("led: core: Fix brightness setting when setting delay_off=0")
replaced del_timer_sync(&led_cdev->blink_timer) with led_stop_software_blink()
in led_blink_set(), which additionally clears LED_BLINK_SW flag as well as
zeroes blink_delay_on and blink_delay_off properties of the struct led_classdev.
Cleansing of the latter ones wasn't required to fix the original issue but
wasn't considered harmful. It nonetheless turned out to be so in case when
pointer to one or both props is passed to led_blink_set() like in the
ledtrig-timer.c. In such cases zeroes are passed later in delay_on and/or
delay_off arguments to led_blink_setup(), which results either in stopping
the software blinking or setting blinking frequency always to 1Hz.
Avoid using led_stop_software_blink() and add a single call required
to clear LED_BLINK_SW flag, which was the only needed modification to
fix the original issue.
Fixes 2b83ff96f5 ("led: core: Fix brightness setting when setting delay_off=0")
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
According to the comment added to exynos_dt_pmu_match[] in commit
8b283c0254 ("ARM: exynos4/5: convert pmu wakeup to stacked domains"),
the RTC is not able to wake up the system through the PMU on Exynos5410,
unlike Exynos5420.
However, when the RTC DT node got added, it was a straight copy of
the Exynos5420 node, which now causes a warning from dtc.
This removes the incorrect interrupt-parent, which should get the
interrupt working and avoid the warning.
Fixes: e1e146b1b0 ("ARM: dts: exynos: Add RTC and I2C to Exynos5410")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
This reverts commit 6737b08140.
Unlike on Exynos5420-family, on Exynos5410 the PMU is not an interrupt
controller so it should not handle interrupts of RTC. The DTC warning
(addressed by mentioned commit) should be fixed by not routing RTC
interrupts to PMU.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Pull vfs fixes from Al Viro:
- untangle sys_close() abuses in xt_bpf
- deal with register_shrinker() failures in sget()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
sget(): handle failures of register_shrinker()
mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
Pull KVM fixes from Radim Krčmář:
"s390:
- Two fixes for potential bitmap overruns in the cmma migration code
x86:
- Clear guest provided GPRs to defeat the Project Zero PoC for CVE
2017-5715"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vmx: Scrub hardware GPRs at VM-exit
KVM: s390: prevent buffer overrun on memory hotplug during migration
KVM: s390: fix cmma migration for multiple memory slots
syzkaller triggered OOM kills by passing ipt_replace.size = -1
to IPT_SO_SET_REPLACE. The root cause is that SMP_ALIGN() in
xt_alloc_table_info() causes int overflow and the size check passes
when it should not. SMP_ALIGN() is no longer needed leftover.
Remove SMP_ALIGN() call in xt_alloc_table_info().
Reported-by: syzbot+4396883fa8c4f64e0175@syzkaller.appspotmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
since commit 82abbf8d2f the verifier rejects the bit-wise
arithmetic on pointers earlier.
The test 'dubious pointer arithmetic' now has less output to match on.
Adjust it.
Fixes: 82abbf8d2f ("bpf: do not allow root to mangle valid pointers")
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add psock NULL check to handle a racing sock event that can get the
sk_callback_lock before this case but after xchg happens causing the
refcnt to hit zero and sock user data (psock) to be null and queued
for garbage collection.
Also add a comment in the code because this is a bit subtle and
not obvious in my opinion.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song says:
====================
The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY
for stacktrace map. Patch #1 is the core implementation
and Patch #2 implements a bpf test at tools/testing/selftests/bpf
directory. Please see individual patch comments for details.
Changelog:
v1 -> v2:
- For invalid key (key pointer is non-NULL), sets next_key to be the first valid key.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Added a bpf selftest in test_progs at tools directory for stacktrace.
The test will populate a hashtable map and a stacktrace map
at the same time with the same key, stackid.
The user space will compare both maps, using BPF_MAP_LOOKUP_ELEM
command and BPF_MAP_GET_NEXT_KEY command, to ensure that both have
the same set of keys.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
supported for stacktrace map. However, there are use cases where
user space wants to enumerate all stacktrace map entries where
BPF_MAP_GET_NEXT_KEY command will be really helpful.
In addition, if user space wants to delete all map entries
in order to save memory and does not want to close the
map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
performance if map entries are sparsely populated.
The implementation has similar behavior for
BPF_MAP_GET_NEXT_KEY implementation in hashtab. If user provides
a NULL key pointer or an invalid key, the first key is returned.
Otherwise, the first valid key after the input parameter "key"
is returned, or -ENOENT if no valid key can be found.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb2 ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Cc: stable@vger.kernel.org
Reported-by: Sean Nyekjær <sean.nyekjaer@prevas.dk>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Richard Weinberger <richard@nod.at>
eventfd_ctx_get() is not used outside of eventfd.c, so unexport it and
fold it into eventfd_ctx_fileget().
(eventfd_ctx_get() was apparently added years ago for KVM irqfd's, but
was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
eventfd_ctx_read() is not used outside of eventfd.c, so unexport it and
fold it into eventfd_read(). This slightly simplifies the code and
makes it more analogous to eventfd_write().
(eventfd_ctx_read() was apparently added years ago for KVM irqfd's, but
was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nothing actually calls eventfd_file_create() besides the eventfd2()
system call itself. So simplify things by folding it into the system
call and using anon_inode_getfd() instead of anon_inode_getfile(). This
removes over 40 lines with no change in functionality.
(eventfd_file_create() was apparently added years ago for KVM irqfd's,
but was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>