[ Upstream commit a1fd1ad255 ]
ip_route_input_rcu expects the original ingress device (e.g., for
proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv,
so dev needs to be saved prior to calling it. This was the behavior prior
to the listify changes.
Fixes: 5fa12739a5 ("net: ipv4: listify ip_rcv_finish")
Cc: Edward Cree <ecree@solarflare.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit be48220edd ]
MPLS does not support nexthops with an MPLS address family.
Specifically, it does not handle RTA_GATEWAY attribute. Make it
clear by returning an error.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e3818541b4 ]
IPv6 currently does not support nexthops outside of the AF_INET6 family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
$ ip ro ls
...
2001:db8:2::/64 dev eth0 metric 1024 pref medium
Catch this and fail the route add:
$ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0
Error: IPv6 does not support RTA_VIA attribute.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6e9e5df4e ]
IPv4 currently does not support nexthops outside of the AF_INET family.
Specifically, it does not handle RTA_VIA attribute. If it is passed
in a route add request, the actual route added only uses the device
which is clearly not what the user intended:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
$ ip ro ls
...
172.16.1.0/24 dev eth0
Catch this and fail the route add:
$ ip ro add 172.16.1.0/24 via inet6 2001:db8:1::1 dev eth0
Error: IPv4 does not support RTA_VIA attribute.
Fixes: 03c0566542 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99e87f56b4 ]
Zero-copy callback flag is not yet set on frag list skb at the moment
xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
leaking grant ref mappings since xenvif_zerocopy_callback() is never
called for these fragments. Those eventually build up and cause Xen
to kill Dom0 as the slots get reused for new mappings:
"d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"
That behavior is observed under certain workloads where sudden spikes
of page cache writes coexist with active atomic skb allocations from
network traffic. Additionally, rework the logic to deal with frag_list
deallocation in a single place.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a2288d4e35 ]
Occasionally, during the disconnection procedure on XenBus which
includes hash cache deinitialization there might be some packets
still in-flight on other processors. Handling of these packets includes
hashing and hash cache population that finally results in hash cache
data structure corruption.
In order to avoid this we prevent hashing of those packets if there
are no queues initialized. In that case RCU protection of queues guards
the hash cache as well.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 71828b2240 ]
This patch moves setting of the current state into the loop. Otherwise
the task may end up in a busy wait loop if none of the break conditions
are met.
Signed-off-by: Timur Celik <mail@timurcelik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bfd07f3dd4 ]
When sending multicast messages via blocking socket,
if sending link is congested (tsk->cong_link_cnt is set to 1),
the sending thread will be put into sleeping state. However,
tipc_sk_filter_rcv() is called under socket spin lock but
tipc_wait_for_cond() is not. So, there is no guarantee that
the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in
CPU-1 will be perceived by CPU-0. If that is the case, the sending
thread in CPU-0 after being waken up, will continue to see
tsk->cong_link_cnt as 1 and put the sending thread into sleeping
state again. The sending thread will sleep forever.
CPU-0 | CPU-1
tipc_wait_for_cond() |
{ |
// condition_ = !tsk->cong_link_cnt |
while ((rc_ = !(condition_))) { |
... |
release_sock(sk_); |
wait_woken(); |
| if (!sock_owned_by_user(sk))
| tipc_sk_filter_rcv()
| {
| ...
| tipc_sk_proto_rcv()
| {
| ...
| tsk->cong_link_cnt--;
| ...
| sk->sk_write_space(sk);
| ...
| }
| ...
| }
sched_annotate_sleep(); |
lock_sock(sk_); |
remove_wait_queue(); |
} |
} |
This commit fixes it by adding memory barrier to tipc_sk_proto_rcv()
and tipc_wait_for_cond().
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ff7b11aa48 ]
Commit 9060cb719e ("net: crypto set sk to NULL when af_alg_release.")
fixed a use-after-free in sockfs_setattr() when an AF_ALG socket is
closed concurrently with fchownat(). However, it ignored that many
other proto_ops::release() methods don't set sock->sk to NULL and
therefore allow the same use-after-free:
- base_sock_release
- bnep_sock_release
- cmtp_sock_release
- data_sock_release
- dn_release
- hci_sock_release
- hidp_sock_release
- iucv_sock_release
- l2cap_sock_release
- llcp_sock_release
- llc_ui_release
- rawsock_release
- rfcomm_sock_release
- sco_sock_release
- svc_release
- vcc_release
- x25_release
Rather than fixing all these and relying on every socket type to get
this right forever, just make __sock_release() set sock->sk to NULL
itself after calling proto_ops::release().
Reproducer that produces the KASAN splat when any of these socket types
are configured into the kernel:
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>
pthread_t t;
volatile int fd;
void *close_thread(void *arg)
{
for (;;) {
usleep(rand() % 100);
close(fd);
}
}
int main()
{
pthread_create(&t, NULL, close_thread, NULL);
for (;;) {
fd = socket(rand() % 50, rand() % 11, 0);
fchownat(fd, "", 1000, 1000, 0x1000);
close(fd);
}
}
Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d25ed413d5 ]
When debugging an issue I found implausible values in state->pause.
Reason in that state->pause isn't initialized and later only single
bits are changed. Also the struct itself isn't initialized in
phylink_resolve(). So better initialize state->pause and other
not yet initialized fields.
v2:
- use right function name in subject
v3:
- initialize additional fields
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 232ba3a51c ]
With Micrel KSZ8061 PHY, the link may occasionally not come up after
Ethernet cable connect. The vendor's (Microchip, former Micrel) errata
sheet 80000688A.pdf descripes the problem and possible workarounds in
detail, see below.
The batch implements workaround 1, which permanently fixes the issue.
DESCRIPTION
Link-up may not occur properly when the Ethernet cable is initially
connected. This issue occurs more commonly when the cable is connected
slowly, but it may occur any time a cable is connected. This issue occurs
in the auto-negotiation circuit, and will not occur if auto-negotiation
is disabled (which requires that the two link partners be set to the
same speed and duplex).
END USER IMPLICATIONS
When this issue occurs, link is not established. Subsequent cable
plug/unplaug cycle will not correct the issue.
WORk AROUND
There are four approaches to work around this issue:
1. This issue can be prevented by setting bit 15 in MMD device address 1,
register 2, prior to connecting the cable or prior to setting the
Restart Auto-negotiation bit in register 0h. The MMD registers are
accessed via the indirect access registers Dh and Eh, or via the Micrel
EthUtil utility as shown here:
. if using the EthUtil utility (usually with a Micrel KSZ8061
Evaluation Board), type the following commands:
> address 1
> mmd 1
> iw 2 b61a
. Alternatively, write the following registers to write to the
indirect MMD register:
Write register Dh, data 0001h
Write register Eh, data 0002h
Write register Dh, data 4001h
Write register Eh, data B61Ah
2. The issue can be avoided by disabling auto-negotiation in the KSZ8061,
either by the strapping option, or by clearing bit 12 in register 0h.
Care must be taken to ensure that the KSZ8061 and the link partner
will link with the same speed and duplex. Note that the KSZ8061
defaults to full-duplex when auto-negotiation is off, but other
devices may default to half-duplex in the event of failed
auto-negotiation.
3. The issue can be avoided by connecting the cable prior to powering-up
or resetting the KSZ8061, and leaving it plugged in thereafter.
4. If the above measures are not taken and the problem occurs, link can
be recovered by setting the Restart Auto-Negotiation bit in
register 0h, or by resetting or power cycling the device. Reset may
be either hardware reset or software reset (register 0h, bit 15).
PLAN
This errata will not be corrected in the future revision.
Fixes: 7ab59dc15e ("drivers/net/phy/micrel_phy: Add support for new PHYs")
Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com>
Signed-off-by: Rajasingh Thavamani <T.Rajasingh@landisgyr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5845f70638 ]
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly
[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094] <IRQ>
[10258690.372094] skb_to_sgvec+0x11/0x40
[10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094] dev_hard_start_xmit+0x9b/0x200
[10258690.372094] sch_direct_xmit+0xff/0x260
[10258690.372094] __qdisc_run+0x15e/0x4e0
[10258690.372094] net_tx_action+0x137/0x210
[10258690.372094] __do_softirq+0xd6/0x2a9
[10258690.372094] irq_exit+0xde/0xf0
[10258690.372094] smp_apic_timer_interrupt+0x74/0x140
[10258690.372094] apic_timer_interrupt+0xf/0x20
[10258690.372094] </IRQ>
In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.
Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.
To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.
Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <lansheng@huawei.com>
Reported-by: Qin Ji <jiqin.ji@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5578de4834 ]
There are two array out-of-bounds memory accesses, one in
cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both
errors are embarassingly simple, and the fixes are straightforward.
As a FYI for anyone backporting this patch to kernels prior to v4.8,
you'll want to apply the netlbl_bitmap_walk() patch to
cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before
Linux v4.8.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Fixes: 3faa8f982f ("netlabel: Move bitmap manipulation functions to the NetLabel core.")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e46e2d821 ]
The switch maintains u64 counters for the number of octets sent and
received. These are kept as two u32's which need to be combined. Fix
the combing, which wrongly worked on u16's.
Fixes: 80c4627b27 ("dsa: mv88x6xxx: Refactor getting a single statistic")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 90490ef726 ]
It has been observed that tx queue stalls while downloading
from certain web sites (example www.speedtest.net)
The cause has been tracked down to a corner case where
dma descriptors where not setup properly. And there for a tx
completion interrupt was not signaled.
This fix corrects the problem by properly marking the end of
a multi descriptor transmission.
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e1a99eae8 ]
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.
Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.
v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: eacb9384a3 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf48648d65 ]
Incoming packets may have IP header checksum verified by the host.
They may not have IP header checksum computed after coalescing.
This patch re-compute the checksum when necessary, otherwise the
packets may be dropped, because Linux network stack always checks it.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf1c9ccba7 ]
When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.
Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.
This is the same fix as what commit d074bf9600 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.
Note there's also commit c0a47e44c0 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b3c688538 ]
There have been reports of oversize UDP packets being sent to the
driver to be transmitted, causing error conditions. The issue is
likely caused by the dst of the SKB switching between 'lo' with
64K MTU and the hardware device with a smaller MTU. Patches are
being proposed by Mahesh Bandewar <maheshb@google.com> to fix the
issue.
In the meantime, add a quick length check in the driver to prevent
the error. The driver uses the TX packet size as index to look up an
array to setup the TX BD. The array is large enough to support all MTU
sizes supported by the driver. The oversize TX packet causes the
driver to index beyond the array and put garbage values into the
TX BD. Add a simple check to prevent this.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e63208915 ]
Fix regression bug introduced in
commit 365ad353c2 ("tipc: reduce risk of user starvation during link
congestion")
Only signal -EDESTADDRREQ for RDM/DGRAM if we don't have a cached
sockaddr.
Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b33b7cd6fd ]
Some sky2 chips fire IRQ after S3, before the driver is fully resumed:
[ 686.804877] do_IRQ: 1.37 No irq handler for vector
This is likely a platform bug that device isn't fully quiesced during
S3. Use MSI-X, maskable MSI or INTx can prevent this issue from
happening.
Since MSI-X and maskable MSI are not supported by this device, fallback
to use INTx on affected platforms.
BugLink: https://bugs.launchpad.net/bugs/1807259
BugLink: https://bugs.launchpad.net/bugs/1809843
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 901efe1231 ]
The user msg is also copied to the abort packet when doing SCTP_ABORT in
sctp_sendmsg_check_sflags(). When SCTP_SENDALL is set, iov_iter_revert()
should have been called for sending abort on the next asoc with copying
this msg. Otherwise, memcpy_from_msg() in sctp_make_abort_user() will
fail and return error.
Fixes: 4910280503 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 822e44b45e ]
Quectel EG12 (module)/EM12 (M.2 card) is a Cat. 12 LTE modem. The modem
behaves in the same way as the EP06, so the "set DTR"-quirk must be
applied and the diagnostic-interface check performed. Since the
diagnostic-check now applies to more modems, I have renamed the function
from quectel_ep06_diag_detected() to quectel_diag_detected().
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 46b1c18f9d ]
In the series fc8b81a598 ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).
It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.
But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.
HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
htb_activate() -> WARN_ON()
htb_dequeue_tree() to decide if a class can be htb_deactivated
when it has no more packets.
HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.
Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.
It seems we have to put back qlen in a central location,
at least for stable kernels.
For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.
For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())
This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.
- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).
Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.
Fixes: 7e66016f2c ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 95150f29ae ]
Ports 9 and 10 don't have internal PHY's but are (dependent on the
version) SERDES/SGMII/XAUI/RXAUI ports.
v2:
- fix it for all 88E6x90 family members
Fixes: bc3931557d ("net: dsa: mv88e6xxx: Add number of internal PHYs")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c6195a8bdf ]
When testing another issue I faced the problem that
mv88e6xxx_port_setup_mac() failed due to DUPLEX_UNKNOWN being passed
as argument to mv88e6xxx_port_set_duplex(). We should handle this case
gracefully and return -EOPNOTSUPP, like e.g. mv88e6xxx_port_set_speed()
is doing it.
Fixes: 7f1ae07b51 ("net: dsa: mv88e6xxx: add port duplex setter")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 87c11f1ddb ]
Similar to commit 44f49dd8b5 ("ipmr: fix possible race resulting from
improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
assume preemption is disabled when incrementing the counter and
accessing a per-CPU variable.
Preemption can be enabled when we add a route in process context that
corresponds to packets stored in the unresolved queue, which are then
forwarded using this route [1].
Fix this by using IP6_INC_STATS() which takes care of disabling
preemption on architectures where it is needed.
[1]
[ 157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
[ 157.460409] caller is ip6mr_forward2+0x73e/0x10e0
[ 157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
[ 157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 157.460461] Call Trace:
[ 157.460486] dump_stack+0xf9/0x1be
[ 157.460553] check_preemption_disabled+0x1d6/0x200
[ 157.460576] ip6mr_forward2+0x73e/0x10e0
[ 157.460705] ip6_mr_forward+0x9a0/0x1510
[ 157.460771] ip6mr_mfc_add+0x16b3/0x1e00
[ 157.461155] ip6_mroute_setsockopt+0x3cb/0x13c0
[ 157.461384] do_ipv6_setsockopt.isra.8+0x348/0x4060
[ 157.462013] ipv6_setsockopt+0x90/0x110
[ 157.462036] rawv6_setsockopt+0x4a/0x120
[ 157.462058] __sys_setsockopt+0x16b/0x340
[ 157.462198] __x64_sys_setsockopt+0xbf/0x160
[ 157.462220] do_syscall_64+0x14d/0x610
[ 157.462349] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 0912ea38de ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ecd182cbf4 upstream.
ashmem_pin() is calling range_shrink() without checking whether
range_alloc() succeeded. Also, doing memory allocation with ashmem_mutex
held should be avoided because ashmem_shrink_scan() tries to hold it.
Therefore, move memory allocation for range_alloc() to ashmem_pin_unpin()
and make range_alloc() not to fail.
This patch is mostly meant for backporting purpose for fuzz testing on
stable/distributor kernels, for there is a plan to remove this code in
near future.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable@vger.kernel.org
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 479826cc86 upstream.
Add missing break statement in order to prevent the code from falling
through to the default case and return -EINVAL every time.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: aa94f28888 ("staging: comedi: ni_660x: tidy up ni_660x_set_pfi_routing()")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af692e117c upstream.
This patch resolves the following page use-after-free issue,
z_erofs_vle_unzip:
...
for (i = 0; i < nr_pages; ++i) {
...
z_erofs_onlinepage_endio(page); (1)
}
for (i = 0; i < clusterpages; ++i) {
page = compressed_pages[i];
if (page->mapping == mngda) (2)
continue;
/* recycle all individual staging pages */
(void)z_erofs_gather_if_stagingpage(page_pool, page); (3)
WRITE_ONCE(compressed_pages[i], NULL);
}
...
After (1) is executed, page is freed and could be then reused, if
compressed_pages is scanned after that, it could fall info (2) or
(3) by mistake and that could finally be in a mess.
This patch aims to solve the above issue only with little changes
as much as possible in order to make the fix backport easier.
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e5ceeab69 upstream.
Considering a read request with two decompressed file pages,
If a decompression work cannot be started on the previous page
due to memory pressure but in-memory LTP map lookup is done,
builder->work should be still NULL.
Moreover, if the current page also belongs to the same map,
it won't try to start the decompression work again and then
run into trouble.
This patch aims to solve the above issue only with little changes
as much as possible in order to make the fix backport easier.
kernel message is:
<4>[1051408.015930s]SLUB: Unable to allocate memory on node -1, gfp=0x2408040(GFP_NOFS|__GFP_ZERO)
<4>[1051408.015930s] cache: erofs_compress, object size: 144, buffer size: 144, default order: 0, min order: 0
<4>[1051408.015930s] node 0: slabs: 98, objs: 2744, free: 0
* Cannot allocate the decompression work
<3>[1051408.015960s]erofs: z_erofs_vle_normalaccess_readpages, readahead error at page 1008 of nid 5391488
* Note that the previous page was failed to read
<0>[1051408.015960s]Internal error: Accessing user space memory outside uaccess.h routines: 96000005 [#1] PREEMPT SMP
...
<4>[1051408.015991s]Hardware name: kirin710 (DT)
...
<4>[1051408.016021s]PC is at z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.016021s]LR is at z_erofs_do_read_page+0x12c/0xcf0
...
<4>[1051408.018096s][<ffffff80c6fb0fd4>] z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.018096s][<ffffff80c6fb3814>] z_erofs_vle_normalaccess_readpages+0x1a0/0x37c
<4>[1051408.018096s][<ffffff80c6d670b8>] read_pages+0x70/0x190
<4>[1051408.018127s][<ffffff80c6d6736c>] __do_page_cache_readahead+0x194/0x1a8
<4>[1051408.018127s][<ffffff80c6d59318>] filemap_fault+0x398/0x684
<4>[1051408.018127s][<ffffff80c6d8a9e0>] __do_fault+0x8c/0x138
<4>[1051408.018127s][<ffffff80c6d8f90c>] handle_pte_fault+0x730/0xb7c
<4>[1051408.018127s][<ffffff80c6d8fe04>] __handle_mm_fault+0xac/0xf4
<4>[1051408.018157s][<ffffff80c6d8fec8>] handle_mm_fault+0x7c/0x118
<4>[1051408.018157s][<ffffff80c8c52998>] do_page_fault+0x354/0x474
<4>[1051408.018157s][<ffffff80c8c52af8>] do_translation_fault+0x40/0x48
<4>[1051408.018157s][<ffffff80c6c002f4>] do_mem_abort+0x80/0x100
<4>[1051408.018310s]---[ end trace 9f4009a3283bd78b ]---
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd9d3d86b0 upstream.
Here is how this device appears in kernel log:
usb 3-1: new full-speed USB device number 18 using xhci_hcd
usb 3-1: New USB device found, idVendor=0b00, idProduct=3070
usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-1: Product: Ingenico 3070
usb 3-1: Manufacturer: Silicon Labs
usb 3-1: SerialNumber: 0001
Apparently this is a POS terminal with embedded USB-to-Serial converter.
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a112152f6f upstream.
EROFS has an optimized path called TAIL merging, which is designed
to merge multiple reads and the corresponding decompressions into
one if these requests read continuous pages almost at the same time.
In general, it behaves as follows:
________________________________________________________________
... | TAIL . HEAD | PAGE | PAGE | TAIL . HEAD | ...
_____|_combined page A_|________|________|_combined page B_|____
1 ] -> [ 2 ] -> [ 3
If the above three reads are requested in the order 1-2-3, it will
generate a large work chain rather than 3 individual work chains
to reduce scheduling overhead and boost up sequential read.
However, if Read 2 is processed slightly earlier than Read 1,
currently it still generates 2 individual work chains (chain 1, 2)
but it does in-place decompression for combined page A, moreover,
if chain 2 decompresses ahead of chain 1, it will be a race and
lead to corrupted decompressed page. This patch fixes it.
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 625c85a62c upstream.
The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().
These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.
But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.
This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.
Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a418cf3f5 upstream.
When calling __put_user(foo(), ptr), the __put_user() macro would call
foo() in between __uaccess_begin() and __uaccess_end(). If that code
were buggy, then those bugs would be run without SMAP protection.
Fortunately, there seem to be few instances of the problem in the
kernel. Nevertheless, __put_user() should be fixed to avoid doing this.
Therefore, evaluate __put_user()'s argument before setting AC.
This issue was noticed when an objtool hack by Peter Zijlstra complained
about genregs_get() and I compared the assembly output to the C source.
[ bp: Massage commit message and fixed up whitespace. ]
Fixes: 11f1a4b975 ("x86: reorganize SMAP handling in user space accesses")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1a2930d8a upstream.
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.
The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.
Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9c4 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>