Commit Graph

28869 Commits

Author SHA1 Message Date
Wengang Wang
6c3498f9ee rds: rds_ib_device.refcount overflow
commit 4fabb59449 upstream.

Fixes: 3e0249f9c0 ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10 12:20:31 -07:00
Tom Hughes
b07016348b mac80211: clear subdir_stations when removing debugfs
commit 4479004e64 upstream.

If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:

BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]

Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10 12:20:30 -07:00
Al Viro
693f66d621 9p: forgetting to cancel request on interrupted zero-copy RPC
commit a84b69cb6e upstream.

If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:47 -07:00
Trond Myklebust
13d0b6aae6 SUNRPC: Fix a memory leak in the backchannel code
commit 88de6af24f upstream.

req->rq_private_buf isn't initialised when xprt_setup_backchannel calls
xprt_free_allocation.

Fixes: fb7a0b9add ("nfs41: New backchannel helper routines")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:47 -07:00
Michal Kazior
5c817fb317 mac80211: prevent possible crypto tx tailroom corruption
commit ab499db80f upstream.

There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.

This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.

I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.

Fixes: 8d1f7ecd2a ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:46 -07:00
Ilya Dryomov
7546f8cb2d crush: fix a bug in tree bucket decode
commit 82cd003a77 upstream.

struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used.  -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews.  The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.

Fixes: http://tracker.ceph.com/issues/2759

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:46 -07:00
Alexander Sverdlin
59a460c394 sctp: Fix race between OOTB responce and route removal
[ Upstream commit 29c4afc4e9 ]

There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).

But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.

To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
     ip route del [interface-specific route]
     ip route add [interface-specific route]
   done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
   responce

On x86_64 the crash looks like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
 ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
 ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
 0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
 [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
 [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
 [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
 [<ffffffff810e0329>] ? update_process_times+0x59/0x60
 [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
 [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
 [<ffffffff8101f599>] ? read_tsc+0x9/0x10
 [<ffffffff8116d4b5>] ? put_page+0x55/0x60
 [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
 [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
 [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
 [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
 [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
 [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
 [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
 [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
 [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
 [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
 [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
 [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
 [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
 [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
 [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
 [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
 [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
 [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
 [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
 [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
 [<ffffffff8147896a>] net_rx_action+0x21a/0x360
 [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
 [<ffffffff8107912d>] irq_exit+0xad/0xb0
 [<ffffffff8157d158>] do_IRQ+0x58/0xf0
 [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
 <EOI>
 [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
 [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
 [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
 [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
 [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
 [<ffffffff8156b365>] rest_init+0x85/0x90
 [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
 [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
 RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10 10:40:21 -07:00
Willem de Bruijn
4552ddd07a packet: avoid out of bounds read in round robin fanout
[ Upstream commit 468479e604 ]

PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.

When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.

Fixes: dc99f60069 ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10 10:40:20 -07:00
Eric Dumazet
7eee92bf2e packet: read num_members once in packet_rcv_fanout()
[ Upstream commit f98f4514d0 ]

We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()

Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417 ("packet: Add 'cpu' fanout policy.")

Fixes: dc99f60069 ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10 10:40:20 -07:00
Nikolay Aleksandrov
019b1332af bridge: fix br_stp_set_bridge_priority race conditions
[ Upstream commit 2dab80a8b4 ]

After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e9 ("bridge: add ageing_time, stp_state, priority over netlink")

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10 10:40:20 -07:00
Nikolay Aleksandrov
80cbc4a31b bridge: fix multicast router rlist endless loop
[ Upstream commit 1a040eaca1 ]

Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758 ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-10 10:40:20 -07:00
Michal Kubeček
c1e1eb159e ipv6: update ip6_rt_last_gc every time GC is run
commit 49a18d86f6 upstream.

As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
hold the last time garbage collector was run so that we should
update it whenever fib6_run_gc() calls fib6_clean_all(), not only
if we got there from ip6_dst_gc().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:09 -07:00
Michal Kubeček
ba310bc86f ipv6: prevent fib6_run_gc() contention
commit 2ac3ac8f86 upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:09 -07:00
Steffen Klassert
528555d05b xfrm: Increase the garbage collector threshold
commit eeb1b73378 upstream.

With the removal of the routing cache, we lost the
option to tweak the garbage collector threshold
along with the maximum routing cache size. So git
commit 703fb94ec ("xfrm: Fix the gc threshold value
for ipv4") moved back to a static threshold.

It turned out that the current threshold before we
start garbage collecting is much to small for some
workloads, so increase it from 1024 to 32768. This
means that we start the garbage collector if we have
more than 32768 dst entries in the system and refuse
new allocations if we are above 65536.

Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Hemminger <shemming@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:09 -07:00
Ian Wilson
7c6300bb49 netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
commit 78146572b9 upstream.

nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(),
nfnl_cthelper_get() and nfnl_cthelper_del().  In each case they pass
a pointer to an nf_conntrack_tuple data structure local variable:

    struct nf_conntrack_tuple tuple;
    ...
    ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);

The problem is that this local variable is not initialized, and
nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and
dst.protonum.  This leaves all other fields with undefined values
based on whatever is on the stack:

    tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM]));
    tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);

The symptom observed was that when the rpc and tns helpers were added
then traffic to port 1536 was being sent to user-space.

Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:08 -07:00
Chen Gang
59bc21951c netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings
commit b18c5d15e8 upstream.

The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):

    CC [M]  net/netfilter/nfnetlink_cthelper.o
  net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
  net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
    memcpy(&help->data, nla_data(attr), help->helper->data_len);
           ^
  In file included from include/linux/string.h:17:0,
                   from include/uapi/linux/uuid.h:25,
                   from include/linux/uuid.h:23,
                   from include/linux/mod_devicetable.h:12,
                   from ./arch/parisc/include/asm/hardware.h:4,
                   from ./arch/parisc/include/asm/processor.h:15,
                   from ./arch/parisc/include/asm/spinlock.h:6,
                   from ./arch/parisc/include/asm/atomic.h:21,
                   from include/linux/atomic.h:4,
                   from ./arch/parisc/include/asm/bitops.h:12,
                   from include/linux/bitops.h:36,
                   from include/linux/kernel.h:10,
                   from include/linux/list.h:8,
                   from include/linux/module.h:9,
                   from net/netfilter/nfnetlink_cthelper.c:11:
  ./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
   void * memcpy(void * dest,const void *src,size_t count);
          ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:08 -07:00
Johannes Berg
98d94f20a3 cfg80211: wext: clear sinfo struct before calling driver
commit 9c5a18a31b upstream.

Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.

Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.

Fix this by initializing the structure in question before the
driver method call.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691

Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:54 -07:00
Eric Dumazet
a3cfde2a31 udp: fix behavior of wrong checksums
[ Upstream commit beb39db59d ]

We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
   This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:51 -07:00
WANG Cong
09982800fd net_sched: invoke ->attach() after setting dev->qdisc
[ Upstream commit 86e363dc3b ]

For mq qdisc, we add per tx queue qdisc to root qdisc
for display purpose, however, that happens too early,
before the new dev->qdisc is finally set, this causes
q->list points to an old root qdisc which is going to be
freed right before assigning with a new one.

Fix this by moving ->attach() after setting dev->qdisc.

For the record, this fixes the following crash:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
 list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
 CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
  ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
  ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
 Call Trace:
  [<ffffffff81a44e7f>] dump_stack+0x4c/0x65
  [<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
  [<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
  [<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
  [<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
  [<ffffffff814e725b>] __list_del_entry+0x5a/0x98
  [<ffffffff814e72a7>] list_del+0xe/0x2d
  [<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
  [<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
  [<ffffffff81822676>] qdisc_graft+0x11d/0x243
  [<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
  [<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
  [<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
  [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
  [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
  [<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
  [<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
  [<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
  [<ffffffff818544b2>] netlink_unicast+0xcb/0x150
  [<ffffffff81161db9>] ? might_fault+0x59/0xa9
  [<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
  [<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
  [<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
  [<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
  [<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
  [<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
  [<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
  [<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
  [<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
  [<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
  [<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
  [<ffffffff811b14d5>] ? __fget_light+0x50/0x78
  [<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
  [<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
  [<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
 ---[ end trace ef29d3fb28e97ae7 ]---

For long term, we probably need to clean up the qdisc_graft() code
in case it hides other bugs like this.

Fixes: 95dc19299f ("pkt_sched: give visibility to mq slave qdiscs")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:51 -07:00
Mark Salyzyn
7659c93447 unix/caif: sk_socket can disappear when state is unlocked
[ Upstream commit b48732e4a4 ]

got a rare NULL pointer dereference in clear_bit

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:51 -07:00
Thadeu Lima de Souza Cascardo
2f5f714f6d bridge: fix parsing of MLDv2 reports
[ Upstream commit 47cc84ce0c ]

When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.

This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.

This only happens when there is a MLDv2 querier in the network.

The fix will only break out of the loop when there is a failure adding a
multicast address.

The mdb before the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp

After the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp

Fixes: 08b202b672 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:51 -07:00
Eric W. Biederman
8dd4573191 ipv4: Avoid crashing in ip_error
[ Upstream commit 381c759d99 ]

ip_error does not check if in_dev is NULL before dereferencing it.

IThe following sequence of calls is possible:
CPU A                          CPU B
ip_rcv_finish
    ip_route_input_noref()
        ip_route_input_slow()
                               inetdev_destroy()
    dst_input()

With the result that a network device can be destroyed while processing
an input packet.

A crash was triggered with only unicast packets in flight, and
forwarding enabled on the only network device.   The error condition
was created by the removal of the network device.

As such it is likely the that error code was -EHOSTUNREACH, and the
action taken by ip_error (if in_dev had been accessible) would have
been to not increment any counters and to have tried and likely failed
to send an icmp error as the network device is going away.

Therefore handle this weird case by just dropping the packet if
!in_dev.  It will result in dropping the packet sooner, and will not
result in an actual change of behavior.

Fixes: 251da41301 ("ipv4: Cache ip_error() routes even when not forwarding.")
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Tested-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:51 -07:00
Scott Mayhew
c709ca10c5 svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures
commit 9507271d96 upstream.

In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary.  Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-05 23:19:59 -07:00
Ilya Dryomov
d2e3dbf9c3 libceph: request a new osdmap if lingering request maps to no osd
commit b049453221 upstream.

This commit does two things.  First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch.  Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

    # cat linger-needmap.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    rbd map dne/dne # obtain a new osdmap as a side effect (!)
    sleep 1
    ceph osd in 0
    rbd resize --size 2 test
    # rbd info test | grep size -> 2M
    # blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed.  This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
which commit ba9d114ec5 ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-05 23:19:54 -07:00
Junling Zheng
521c4dd441 net: socket: Fix the wrong returns for recvmsg and sendmsg
Based on 08adb7dabd upstream.

We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
was expected.

We tested it through the recvmsg01 testcase come from LTP testsuit. It set
msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
unexpected (errno 22 is expected):

recvmsg01    4  TFAIL  :  invalid socket length ; returned -1 (expected -1),
errno 14 (expected 22)

Linux mainline has no this bug for commit 08adb7dab fixes it accidentally.
However, it is too large and complex to be backported to LTS 3.10.

Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
error if msg_sys->msg_namelen was negative, which changed the behaviors
of recvmsg and sendmsg syscall in a lib32 system:

Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
was invalid and then syscall returned -EINVAL, which is correct.

And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
fail and wants to return -EINVAL, however, the outer syscall will return
-EFAULT directly, which is unexpected.

This patch gets the return value of get_compat_msghdr() as well as
copy_msghdr_from_user(), then returns this expected value if
get_compat_msghdr() fails.

Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Hanbing Xu <xuhanbing@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-05 23:19:53 -07:00
David S. Miller
cf30cd6bbc ipv4: Missing sk_nulls_node_init() in ping_unhash().
[ Upstream commit a134f083e7 ]

If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13 05:15:41 -07:00
Eric Dumazet
bc9f0ea1c7 tcp: avoid looping in tcp_send_fin()
[ Upstream commit 845704a535 ]

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:20 +02:00
Eric Dumazet
aac9fda375 tcp: fix possible deadlock in tcp_send_fin()
[ Upstream commit d83769a580 ]

Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.

This patch reverts d22e153718 ("tcp: fix tcp fin memory accounting")

Fixes: d22e153718 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:20 +02:00
Sebastian Pöhn
fb138c4699 ip_forward: Drop frames with attached skb->sk
[ Upstream commit 2ab957492d ]

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:20 +02:00
Florian Westphal
752b388c92 netfilter: conntrack: disable generic tracking for known protocols
commit db29a9508a upstream.

Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Fixes CVE-2014-8160.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:59 +02:00
Eric Dumazet
ef15025a4e tcp: tcp_make_synack() should clear skb->tstamp
[ Upstream commit b50edd7812 ]

I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.

11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>

20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>

This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.

Fixes: 7faee5c0d5 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:55 +02:00
Neal Cardwell
c31d60c297 tcp: fix FRTO undo on cumulative ACK of SACKed range
[ Upstream commit 666b805150 ]

On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.

The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.

The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:55 +02:00
D.S. Ljungmark
5a2267373e ipv6: Don't reduce hop limit for an interface
[ Upstream commit 6fd99094de ]

A local route may have a lower hop_limit set than global routes do.

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:55 +02:00
Michal Kubeček
1b946e381d tcp: prevent fetching dst twice in early demux code
[ Upstream commit d0c294c53a ]

On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()

        struct dst_entry *dst = sk->sk_rx_dst;

        if (dst)
                dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);

to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.

Protect sk->sk_rx_dst access by ACCESS_ONCE() both in IPv4 and IPv6
TCP early demux code.

Fixes: 41063e9dd1 ("ipv4: Early TCP socket demux.")
Fixes: c7109986db ("ipv6: Early TCP socket demux")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:54 +02:00
Alex Elder
1554b19c40 remove extra definitions of U32_MAX
commit 04f9b74e4d upstream.

Now that the definition is centralized in <linux/kernel.h>, the
definitions of U32_MAX (and related) elsewhere in the kernel can be
removed.

Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Sage Weil <sage@inktank.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:54 +02:00
Alex Elder
b81036aa35 conditionally define U32_MAX
commit 77719536dc upstream.

The symbol U32_MAX is defined in several spots.  Change these
definitions to be conditional.  This is in preparation for the next
patch, which centralizes the definition in <linux/kernel.h>.

Signed-off-by: Alex Elder <elder@linaro.org>
Cc: Sage Weil <sage@inktank.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:33:54 +02:00
Sasha Levin
85ec36aada net: llc: use correct size for sysctl timeout entries
commit 6b8d9117cc upstream.

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:50 +02:00
Sasha Levin
8e519c3eb9 net: rds: use correct size for max unacked packets and bytes
commit db27ebb111 upstream.

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:50 +02:00
Jiri Slaby
d212dc60a1 core, nfqueue, openvswitch: fix compilation warning
Stable commit "core, nfqueue, openvswitch: Orphan frags in
skb_zerocopy and handle errors", upstream commit
36d5fe6a00, was not correctly backported
and missed to change a const 'from' parameter to non-const.  This
results in a new batch of warnings:

net/netfilter/nfnetlink_queue_core.c: In function ‘nfqnl_zcopy’:
net/netfilter/nfnetlink_queue_core.c:272:2: warning: passing argument 1 of ‘skb_orphan_frags’ discards ‘const’ qualifier from pointer target type [enabled by default]
  if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
  ^
In file included from net/netfilter/nfnetlink_queue_core.c:18:0:
include/linux/skbuff.h:1822:19: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
 static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
                   ^
net/netfilter/nfnetlink_queue_core.c:273:3: warning: passing argument 1 of ‘skb_tx_error’ discards ‘const’ qualifier from pointer target type [enabled by default]
   skb_tx_error(from);
   ^
In file included from net/netfilter/nfnetlink_queue_core.c:18:0:
include/linux/skbuff.h:630:13: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
 extern void skb_tx_error(struct sk_buff *skb);

Remove const from the 'from' parameter, the same as in the upstream
commit.

As far as I can see, this leaked into 3.10, 3.12, and 3.13 already.

Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kamal Mostafa <kamal.mostafa@canonical.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:50 +02:00
Ben Hutchings
75a518b09f tcp: Fix crash in TCP Fast Open
Commit 355a901e6c ("tcp: make connect() mem charging friendly")
changed tcp_send_syn_data() to perform an open-coded copy of the 'syn'
skb rather than using skb_copy_expand().

The open-coded copy does not cover the skb_shared_info::gso_segs
field, so in the new skb it is left set to 0.  When this commit was
backported into stable branches between 3.10.y and 3.16.7-ckty
inclusive, it triggered the BUG() in tcp_transmit_skb().

Since Linux 3.18 the GSO segment count is kept in the
tcp_skb_cb::tcp_gso_segs field and tcp_send_syn_data() does copy the
tcp_skb_cb structure to the new skb, so mainline and newer stable
branches are not affected.

Set skb_shared_info::gso_segs to the correct value of 1.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:47 +02:00
Bob Copeland
58aef0a81c mac80211: drop unencrypted frames in mesh fwding
commit d0c22119f5 upstream.

The mesh forwarding path was not checking that data
frames were protected when running an encrypted network;
add the necessary check.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-13 14:02:11 +02:00
Michal Kazior
1feca9723e mac80211: disable u-APSD queues by default
commit aa75ebc275 upstream.

Some APs experience problems when working with
U-APSD. Decreasing the probability of that
happening by using legacy mode for all ACs but VO
isn't enough.

Cisco 4410N originally forced us to enable VO by
default only because it treated non-VO ACs as
legacy.

However some APs (notably Netgear R7000) silently
reclassify packets to different ACs. Since u-APSD
ACs require trigger frames for frame retrieval
clients would never see some frames (e.g. ARP
responses) or would fetch them accidentally after
a long time.

It makes little sense to enable u-APSD queues by
default because it needs userspace applications to
be aware of it to actually take advantage of the
possible additional powersavings. Implicitly
depending on driver autotrigger frame support
doesn't make much sense.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-13 14:02:11 +02:00
Johannes Berg
9de9cea310 nl80211: ignore HT/VHT capabilities without QoS/WMM
commit 496fcc294d upstream.

As HT/VHT depend heavily on QoS/WMM, it's not a good idea to
let userspace add clients that have HT/VHT but not QoS/WMM.
Since it does so in certain cases we've observed (client is
using HT IEs but not QoS/WMM) just ignore the HT/VHT info at
this point and don't pass it down to the drivers which might
unconditionally use it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-13 14:02:11 +02:00
Julian Anastasov
96b077b4a5 ipvs: rerouting to local clients is not needed anymore
commit 579eb62ac3 upstream.

commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:01:01 +01:00
Julian Anastasov
2fd8357353 ipvs: add missing ip_vs_pe_put in sync code
commit 528c943f3b upstream.

ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.

Fixes: fe5e7a1efb ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:01:01 +01:00
Oliver Hartkopp
82fb39a22b can: add missing initialisations in CAN related skbuffs
commit 969439016d upstream.

When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
this can lead to a skb_under_panic due to missing skb initialisations.

Add the missing initialisations at the CAN skbuff creation times on driver
level (rx path) and in the network layer (tx path).

Reported-by: Austin Schuh <austin@peloton-tech.com>
Reported-by: Daniel Steer <daniel.steer@mclaren.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:58 +01:00
Eric Dumazet
e64a85197b tcp: make connect() mem charging friendly
[ Upstream commit 355a901e6c ]

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:56 +01:00
Catalin Marinas
281c9c3601 net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
[ Upstream commit 91edd096e2 ]

Commit db31c55a6f (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b965 (net: socket: error on a negative msg_namelen).

In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae0 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd (fold verify_iovec() into
copy_msghdr_from_user()).

This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().

Fixes: db31c55a6f (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:56 +01:00
Josh Hunt
175ff19c37 tcp: fix tcp fin memory accounting
[ Upstream commit d22e153718 ]

tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:

ss example output (ss -amn | grep -B1 f4294):
tcp    FIN-WAIT-1 0      1            192.168.0.1:45520         192.0.2.1:8080
	skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0)
Acked-by: Eric Dumazet <edumazet@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:56 +01:00
Al Viro
208370eb74 rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
[ Upstream commit 7d985ed1dc ]

[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]

MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there.  It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.

It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested.  If it is correct, it's
-stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:56 +01:00