19748 Commits

Author SHA1 Message Date
Eliad Peller
36935521cd mac80211: timeout a single frame in the rx reorder buffer
commit 07ae2dfcf4 upstream.

The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely, but the
frame is never timed-out (until the next packet is received)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20 12:48:11 -08:00
Eric Dumazet
8a533666d1 net: fix NULL dereferences in check_peer_redir()
[ Upstream commit d3aaeb38c4, along
  with dependent backports of commits:
     69cce1d140
     9de79c127c
     218fa90f07
     580da35a31
     f7e57044ee
     e049f28883 ]

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13 11:06:13 -08:00
shawnlu
81ecd154d0 tcp: md5: using remote adress for md5 lookup in rst packet
[ Upstream commit 8a622e71f5 ]

md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.

Signed-off-by: shawnlu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:04 -08:00
Neal Cardwell
8b4bb350e1 tcp: fix tcp_trim_head() to adjust segment count with skb MSS
[ Upstream commit 5b35e1e6e9 ]

This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.

Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().

As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:04 -08:00
David S. Miller
f217c4711d rds: Make rds_sock_lock BH rather than IRQ safe.
[ Upstream commit efc3dbc374 ]

rds_sock_info() triggers locking warnings because we try to perform a
local_bh_enable() (via sock_i_ino()) while hardware interrupts are
disabled (via taking rds_sock_lock).

There is no reason for rds_sock_lock to be a hardware IRQ disabling
lock, none of these access paths run in hardware interrupt context.

Therefore making it a BH disabling lock is safe and sufficient to
fix this bug.

Reported-by: Kumar Sanghvi <kumaras@chelsio.com>
Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:04 -08:00
James Chapman
1334533665 l2tp: l2tp_ip - fix possible oops on packet receive
[ Upstream commit 68315801db ]

When a packet is received on an L2TP IP socket (L2TPv3 IP link
encapsulation), the l2tpip socket's backlog_rcv function calls
xfrm4_policy_check(). This is not necessary, since it was called
before the skb was added to the backlog. With CONFIG_NET_NS enabled,
xfrm4_policy_check() will oops if skb->dev is null, so this trivial
patch removes the call.

This bug has always been present, but only when CONFIG_NET_NS is
enabled does it cause problems. Most users are probably using UDP
encapsulation for L2TP, hence the problem has only recently
surfaced.

EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0
EIP is at l2tp_ip_recvmsg+0xd4/0x2a7
EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246
ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Call Trace:
 [<c1218568>] sock_common_recvmsg+0x31/0x46
 [<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d
 [<c12163a1>] __sock_recvmsg+0x31/0x3b
 [<c1216828>] sock_recvmsg+0x96/0xab
 [<c10b2693>] ? might_fault+0x47/0x81
 [<c10b2693>] ? might_fault+0x47/0x81
 [<c1167fd0>] ? _copy_from_user+0x31/0x115
 [<c121e8c8>] ? copy_from_user+0x8/0xa
 [<c121ebd6>] ? verify_iovec+0x3e/0x78
 [<c1216604>] __sys_recvmsg+0x10a/0x1aa
 [<c1216792>] ? sock_recvmsg+0x0/0xab
 [<c105a99b>] ? __lock_acquire+0xbdf/0xbee
 [<c12d5a99>] ? do_page_fault+0x193/0x375
 [<c10d1200>] ? fcheck_files+0x9b/0xca
 [<c10d1259>] ? fget_light+0x2a/0x9c
 [<c1216bbb>] sys_recvmsg+0x2b/0x43
 [<c1218145>] sys_socketcall+0x16d/0x1a5
 [<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c100305f>] sysenter_do_call+0x12/0x38
Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c

Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:04 -08:00
Eric W. Biederman
62252cba28 net caif: Register properly as a pernet subsystem.
[ Upstream commit 8a8ee9aff6 ]

caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.

Among other problems using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated.
Which has been causing net_generic to fail with either BUG_ON's or by
return NULL pointers.

A more ugly problem that could be caused is packets in flight why the
subsystem is shutting down.

To remove confusion also remove the cruft cause by inappropriately
trying to fix this bug.

With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:03 -08:00
Eric Dumazet
561331eae0 netns: fix net_alloc_generic()
[ Upstream commit 073862ba5d ]

When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :

[  200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
[  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
[  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
[  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
[  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
[  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
[  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
[  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
[  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
[  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
[  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
[  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
[  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190

net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() anytime.

This also reduces number of reallocations in net_assign_generic()

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:03 -08:00
Nick Bowler
ffee9a18f2 ah: Don't return NET_XMIT_DROP on input.
commit 4b90a603a1 upstream.

When the ahash driver returns -EBUSY, AH4/6 input functions return
NET_XMIT_DROP, presumably copied from the output code path.  But
returning transmit codes on input doesn't make a lot of sense.
Since NET_XMIT_DROP is a positive int, this gets interpreted as
the next header type (i.e., success).  As that can only end badly,
remove the check.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:18:54 -08:00
Nick Bowler
c0ab420c68 ah: Read nexthdr value before overwriting it in ahash input callback.
commit b7ea81a58a upstream.

The AH4/6 ahash input callbacks read out the nexthdr field from the AH
header *after* they overwrite that header.  This is obviously not going
to end well.  Fix it up.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:51 -08:00
Nick Bowler
d253520a7b ah: Correctly pass error codes in ahash output callback.
commit 069294e813 upstream.

The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume
instead of the error code.  This appears to be a copy+paste error from
the input case, where nexthdr is expected.  This causes the driver to
continuously add AH headers to the datagram until either an allocation
fails and the packet is dropped or the ahash driver hits a synchronous
fallback and the resulting monstrosity is transmitted.

Correct this issue by simply passing the error code unadulterated.

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:51 -08:00
J. Bruce Fields
a141a5eb3a svcrpc: avoid memory-corruption on pool shutdown
commit b4f36f88b3 upstream.

Socket callbacks use svc_xprt_enqueue() to add an xprt to a
pool->sp_sockets list.  In normal operation a server thread will later
come along and take the xprt off that list.  On shutdown, after all the
threads have exited, we instead manually walk the sv_tempsocks and
sv_permsocks lists to find all the xprt's and delete them.

So the sp_sockets lists don't really matter any more.  As a result,
we've mostly just ignored them and hoped they would go away.

Which has gotten us into trouble; witness for example ebc63e531c
"svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben
Greear noticing that a still-running svc_xprt_enqueue() could re-add an
xprt to an sp_sockets list just before it was deleted.  The fix was to
remove it from the list at the end of svc_delete_xprt().  But that only
made corruption less likely--I can see nothing that prevents a
svc_xprt_enqueue() from adding another xprt to the list at the same
moment that we're removing this xprt from the list.  In fact, despite
the earlier xpo_detach(), I don't even see what guarantees that
svc_xprt_enqueue() couldn't still be running on this xprt.

So, instead, note that svc_xprt_enqueue() essentially does:
	lock sp_lock
		if XPT_BUSY unset
			add to sp_sockets
	unlock sp_lock

So, if we do:

	set XPT_BUSY on every xprt.
	Empty every sp_sockets list, under the sp_socks locks.

Then we're left knowing that the sp_sockets lists are all empty and will
stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under
the sp_lock and see it set.

And *then* we can continue deleting the xprt's.

(Thanks to Jeff Layton for being correctly suspicious of this code....)

Cc: Ben Greear <greearb@candelatech.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:48 -08:00
J. Bruce Fields
7df22768c0 svcrpc: destroy server sockets all at once
commit 2fefb8a09e upstream.

There's no reason I can see that we need to call sv_shutdown between
closing the two lists of sockets.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:48 -08:00
J. Bruce Fields
b09577ca66 svcrpc: fix double-free on shutdown of nfsd after changing pool mode
commit 61c8504c42 upstream.

The pool_to and to_pool fields of the global svc_pool_map are freed on
shutdown, but are initialized in nfsd startup only in the
SVC_POOL_PERCPU and SVC_POOL_PERNODE cases.

They *are* initialized to zero on kernel startup.  So as long as you use
only SVC_POOL_GLOBAL (the default), this will never be a problem.

You're also OK if you only ever use SVC_POOL_PERCPU or SVC_POOL_PERNODE.

However, the following sequence events leads to a double-free:

	1. set SVC_POOL_PERCPU or SVC_POOL_PERNODE
	2. start nfsd: both fields are initialized.
	3. shutdown nfsd: both fields are freed.
	4. set SVC_POOL_GLOBAL
	5. start nfsd: the fields are left untouched.
	6. shutdown nfsd: now we try to free them again.

Step 4 is actually unnecessary, since (for some bizarre reason), nfsd
automatically resets the pool mode to SVC_POOL_GLOBAL on shutdown.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:47 -08:00
Stanislaw Gruszka
b9e11747e1 mac80211: fix rx->key NULL pointer dereference in promiscuous mode
commit 1140afa862 upstream.

Since:

commit 816c04fe7e
Author: Christian Lamparter <chunkeey@googlemail.com>
Date:   Sat Apr 30 15:24:30 2011 +0200

    mac80211: consolidate MIC failure report handling

is possible to that we dereference rx->key == NULL when driver set
RX_FLAG_MMIC_STRIPPED and not RX_FLAG_IV_STRIPPED and we are in
promiscuous mode. This happen with rt73usb and rt61pci at least.

Before the commit we always check rx->key against NULL, so I assume
fix should be done in mac80211 (also mic_fail path has similar check).

References:
https://bugzilla.redhat.com/show_bug.cgi?id=769766
http://rt2x00.serialmonkey.com/pipermail/users_rt2x00.serialmonkey.com/2012-January/004395.html

Reported-by: Stuart D Gathman <stuart@gathman.org>
Reported-by: Kai Wohlfahrt <kai.scorpio@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:43 -08:00
Ben Hutchings
49ffa26eca igmp: Avoid zero delay when receiving odd mixture of IGMP queries
commit a8c1f65c79 upstream.

Commit 5b7c840667 ('ipv4: correct IGMP
behavior on v3 query during v2-compatibility mode') added yet another
case for query parsing, which can result in max_delay = 0.  Substitute
a value of 1, as in the usual v3 case.

Reported-by: Simon McVittie <smcv@debian.org>
References: http://bugs.debian.org/654876
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 11:35:41 -08:00
Stephen Rothwell
732e81a757 ipv4: using prefetch requires including prefetch.h
[ Upstream commit b9eda06f80 ]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:10 -08:00
Eric Dumazet
ad5dd5dc45 ipv4: reintroduce route cache garbage collector
[ Upstream commit 9f28a2fc0b ]

Commit 2c8cec5c10 (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.

As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.

Reintroduce the garbage collection, since we'll have to wait our
neighbour lookups become refcount-less to not depend on this stuff.

Reported-by: Robert Gladewitz <gladewitz@gmx.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:10 -08:00
Weiping Pan
6c3efb1526 ipv4: flush route cache after change accept_local
[ Upstream commit d01ff0a049 ]

After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush route cache, or it will continue receive packets with local
source address, which should be dropped.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:09 -08:00
Thomas Graf
0e5fe3ed8d sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
[ Upstream commit a76c0adf60 ]

When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.

The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.

When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.

Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.

The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.

Chunk
Size    Unpatched     No Overhead
-------------------------------------
   4    15.2 Kbit [!]   12.2 Mbit [!]
   8    35.8 Kbit [!]   26.0 Mbit [!]
  16    95.5 Kbit [!]   54.4 Mbit [!]
  32   106.7 Mbit      102.3 Mbit
  64   189.2 Mbit      188.3 Mbit
 128   331.2 Mbit      334.8 Mbit
 256   537.7 Mbit      536.0 Mbit
 512   766.9 Mbit      766.6 Mbit
1024   810.1 Mbit      808.6 Mbit

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:09 -08:00
Xi Wang
f6e4c89e08 sctp: fix incorrect overflow check on autoclose
[ Upstream commit 2692ba61a8 ]

Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value.  If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.

This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.

1) Avoid overflowing autoclose * HZ.

2) Keep the default autoclose bound consistent across 32- and 64-bit
   platforms (INT_MAX / HZ in this patch).

3) Keep the autoclose value consistent between setsockopt() and
   getsockopt() calls.

Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:08 -08:00
Eric Dumazet
01d6bbab38 sch_gred: should not use GFP_KERNEL while holding a spinlock
[ Upstream commit 3f1e6d3fd3 ]

gred_change_vq() is called under sch_tree_lock(sch).

This means a spinlock is held, and we are not allowed to sleep in this
context.

We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:08 -08:00
Gerlando Falauto
9ec14c04ec net: have ipconfig not wait if no dev is available
[ Upstream commit cd7816d149 ]

previous commit 3fb72f1e6e
makes IP-Config wait for carrier on at least one network device.

Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.

Cc: Micha Nelissen <micha@neli.hopto.org>
Cc: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:08 -08:00
Thomas Graf
477a897533 mqprio: Avoid panic if no options are provided
[ Upstream commit 7838f2ce36 ]

Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:08 -08:00
Alex Juncu
7eac8f9de2 llc: llc_cmsg_rcv was getting called after sk_eat_skb.
[ Upstream commit 9cef310fcd ]

Received non stream protocol packets were calling llc_cmsg_rcv that used a
skb after that skb was released by sk_eat_skb. This caused received STP
packets to generate kernel panics.

Signed-off-by: Alexandru Juncu <ajuncu@ixiacom.com>
Signed-off-by: Kunjan Naik <knaik@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:14:06 -08:00
Johannes Berg
afa2450ce3 mac80211: fix another race in aggregation start
commit 15062e6a85 upstream.

Emmanuel noticed that when mac80211 stops the queues
for aggregation that can leave a packet pending. This
packet will be given to the driver after the AMPDU
callback, but as a non-aggregated packet which messes
up the sequence number etc.

I also noticed by looking at the code that if packets
are being processed while we clear the WANT_START bit,
they might see it cleared already and queue up on
tid_tx->pending. If the driver then rejects the new
aggregation session we leak the packet.

Fix both of these issues by changing this code to not
stop the queues at all. Instead, let packets queue up
on the tid_tx->pending queue instead of letting them
get to the driver, and add code to recover properly
in case the driver rejects the session.

(The patch looks large because it has to move two
functions to before their new use.)

Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:13:48 -08:00
Ted Feng
919130981e ipip, sit: copy parms.name after register_netdevice
commit 72b36015ba upstream.

Same fix as 731abb9cb2 for ipip and sit tunnel.
Commit 1c5cae815d removed an explicit call to dev_alloc_name in
ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
will now create a valid name, however the tunnel keeps a copy of the
name in the private parms structure. Fix this by copying the name back
after register_netdevice has successfully returned.

This shows up if you do a simple tunnel add, followed by a tunnel show:

$ sudo ip tunnel add mode ipip remote 10.2.20.211
$ ip tunnel
tunl0: ip/ip  remote any  local any  ttl inherit  nopmtudisc
tunl%d: ip/ip  remote 10.2.20.211  local any  ttl inherit
$ sudo ip tunnel add mode sit remote 10.2.20.212
$ ip tunnel
sit0: ipv6/ip  remote any  local any  ttl 64  nopmtudisc 6rd-prefix 2002::/16
sit%d: ioctl 89f8 failed: No such device
sit%d: ipv6/ip  remote 10.2.20.212  local any  ttl inherit

Signed-off-by: Ted Feng <artisdom@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:13:45 -08:00
Nikolay Martynov
5009514a09 mac80211: fix race condition caused by late addBA response
Upstream commit d305a6557b.

If addBA responses comes in just after addba_resp_timer has
expired mac80211 will still accept it and try to open the
aggregation session. This causes drivers to be confused and
in some cases even crash.

This patch fixes the race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer
accepted and we do not try to open half-closed session.

Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
[some adjustments]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 12:57:38 -08:00
Luis R. Rodriguez
7e06614ab1 cfg80211: amend regulatory NULL dereference fix
commit 0bac71af6e upstream.

Johannes' patch for "cfg80211: fix regulatory NULL dereference"
broke user regulaotry hints and it did not address the fact that
last_request was left populated even if the previous regulatory
hint was stale due to the wiphy disappearing.

Fix user reguluatory hints by only bailing out if for those
regulatory hints where a request_wiphy is expected. The stale last_request
considerations are addressed through the previous fixes on last_request
where we reset the last_request to a static world regdom request upon
reset_regdomains(). In this case though we further enhance the effect
by simply restoring reguluatory settings completely.

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:45 -08:00
Luis R. Rodriguez
3ed26be173 cfg80211: fix race on init and driver registration
commit a042994dd3 upstream.

There is a theoretical race that if hit will trigger
a crash. The race is between when we issue the first
regulatory hint, regulatory_hint_core(), gets processed
by the workqueue and between when the first device
gets registered to the wireless core. This is not easy
to reproduce but it was easy to do so through the
regulatory simulator I have been working on. This
is a port of the fix I implemented there [1].

[1] a246ccf81f

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:45 -08:00
Emmanuel Grumbach
20f8d72586 mac80211: fix race between the AGG SM and the Tx data path
commit 2a1e0fd175 upstream.

When a packet is supposed to sent be as an a-MPDU, mac80211 sets
IEEE80211_TX_CTL_AMPDU to let the driver know. On the other
hand, mac80211 configures the driver for aggregration with the
ampdu_action callback.
There is race between these two mechanisms since the following
scenario can occur when the BA agreement is torn down:

Tx softIRQ	 			drv configuration
==========				=================

check OPERATIONAL bit
Set the TX_CTL_AMPDU bit in the packet

					clear OPERATIONAL bit
					stop Tx AGG
Pass Tx packet to the driver.

In that case the driver would get a packet with TX_CTL_AMPDU set
although it has already been notified that the BA session has been
torn down.

To fix this, we need to synchronize all the Qdisc activity after we
cleared the OPERATIONAL bit. After that step, all the following
packets will be buffered until the driver reports it is ready to get
new packets for this RA / TID. This buffering allows not to run into
another race that would send packets with TX_CTL_AMPDU unset while
the driver hasn't been requested to tear down the BA session yet.

This race occurs in practice and iwlwifi complains with a WARN_ON
when it happens.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:33 -08:00
Johannes Berg
64a1b241dd mac80211: don't stop a single aggregation session twice
commit 24f50a9d16 upstream.

Nikolay noticed (by code review) that mac80211 can
attempt to stop an aggregation session while it is
already being stopped. So to fix it, check whether
stop is already being done and bail out if so.

Also move setting the STOPPING state into the lock
so things are properly atomic.

Reported-by: Nikolay Martynov <mar.kolya@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:33 -08:00
Johannes Berg
6cb4e0db2f cfg80211: fix regulatory NULL dereference
commit de3584bd62 upstream.

By the time userspace returns with a response to
the regulatory domain request, the wiphy causing
the request might have gone away. If this is so,
reject the update but mark the request as having
been processed anyway.

Cc: Luis R. Rodriguez <lrodriguez@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:32 -08:00
Eliad Peller
e30922bd0c nl80211: fix MAC address validation
commit e007b857e8 upstream.

MAC addresses have a fixed length. The current
policy allows passing < ETH_ALEN bytes, which
might result in reading beyond the buffer.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:31 -08:00
Trond Myklebust
421f55945f SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared
commit 24ca9a8477 upstream.

By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
to find evidence of socket congestion, we are making the RPC engine believe
that the message was incorrectly sent and so it disconnects the socket
instead of just retrying.

The bug appears to have been introduced by commit
5e3771ce2d (SUNRPC: Ensure that xs_nospace
return values are propagated).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:27 -08:00
Josh Boyer
d11d8cff78 ip6_tunnel: copy parms.name after register_netdevice
commit 731abb9cb2 upstream.

Commit 1c5cae815d removed an explicit call to dev_alloc_name in ip6_tnl_create
because register_netdevice will now create a valid name.  This works for the
net_device itself.

However the tunnel keeps a copy of the name in the parms structure for the
ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
ip6_tnl_create, but it is done before register_netdevice is called so the name
is set to a bogus value in the parms.name structure.

This shows up if you do a simple tunnel add, followed by a tunnel show:

[root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
[root@localhost ~]# ip -6 tunnel show
ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
[root@localhost ~]#

Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
register_netdevice has successfully returned.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:55 -08:00
Luis R. Rodriguez
5c02e3ae0a cfg80211: fix bug on regulatory core exit on access to last_request
commit 58ebacc66b upstream.

Commit 4d9d88d1 by Scott James Remnant <keybuk@google.com> added
the .uevent() callback for the regulatory device used during
the platform device registration. The change was done to account
for queuing up udev change requests through udevadm triggers.
The change also meant that upon regulatory core exit we will now
send a uevent() but the uevent() callback, reg_device_uevent(),
also accessed last_request. Right before commiting device suicide
we free'd last_request but never set it to NULL so
platform_device_unregister() would lead to bogus kernel paging
request. Fix this and also simply supress uevents right before
we commit suicide as they are pointless.

This fix is required for kernels >= v2.6.39

$ git describe --contains 4d9d88d1
v2.6.39-rc1~468^2~25^2^2~21

The impact of not having this present is that a bogus paging
access may occur (only read) upon cfg80211 unload time. You
may also get this BUG complaint below. Although Johannes
could not reproduce the issue this fix is theoretically correct.

mac80211_hwsim: unregister radios
mac80211_hwsim: closing netlink
BUG: unable to handle kernel paging request at ffff88001a06b5ab
IP: [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
PGD 1836063 PUD 183a063 PMD 1ffcb067 PTE 1a06b160
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: cfg80211(-) [last unloaded: mac80211]

Pid: 2279, comm: rmmod Tainted: G        W   3.1.0-wl+ #663 Bochs Bochs
RIP: 0010:[<ffffffffa030df9a>]  [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
RSP: 0000:ffff88001c5f9d58  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88001d2eda88 RCX: ffff88001c7468fc
RDX: ffff88001a06b5a0 RSI: ffff88001c7467b0 RDI: ffff88001c7467b0
RBP: ffff88001c5f9d58 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001c7467b0
R13: ffff88001d2eda78 R14: ffffffff8164a840 R15: 0000000000000001
FS:  00007f8a91d8a6e0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88001a06b5ab CR3: 000000001c62e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 2279, threadinfo ffff88001c5f8000, task ffff88000023c780)
Stack:
 ffff88001c5f9d98 ffffffff812ff7e5 ffffffff8176ab3d ffff88001c7468c2
 000000000000ffff ffff88001d2eda88 ffff88001c7467b0 ffff880000114820
 ffff88001c5f9e38 ffffffff81241dc7 ffff88001c5f9db8 ffffffff81040189
Call Trace:
 [<ffffffff812ff7e5>] dev_uevent+0xc5/0x170
 [<ffffffff81241dc7>] kobject_uevent_env+0x1f7/0x490
 [<ffffffff81040189>] ? sub_preempt_count+0x29/0x60
 [<ffffffff814cab1a>] ? _raw_spin_unlock_irqrestore+0x4a/0x90
 [<ffffffff81305307>] ? devres_release_all+0x27/0x60
 [<ffffffff8124206b>] kobject_uevent+0xb/0x10
 [<ffffffff812fee27>] device_del+0x157/0x1b0
 [<ffffffff8130377d>] platform_device_del+0x1d/0x90
 [<ffffffff81303b76>] platform_device_unregister+0x16/0x30
 [<ffffffffa030fffd>] regulatory_exit+0x5d/0x180 [cfg80211]
 [<ffffffffa032bec3>] cfg80211_exit+0x2b/0x45 [cfg80211]
 [<ffffffff8109a84c>] sys_delete_module+0x16c/0x220
 [<ffffffff8108a23e>] ? trace_hardirqs_on_caller+0x7e/0x120
 [<ffffffff814cba02>] system_call_fastpath+0x16/0x1b
Code: <all your base are belong to me>
RIP  [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
 RSP <ffff88001c5f9d58>
CR2: ffff88001a06b5ab
---[ end trace 147c5099a411e8c0 ]---

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Scott James Remnant <keybuk@google.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:55 -08:00
Johannes Berg
c1ce1705eb nl80211: fix HT capability attribute validation
commit 6c7394197a upstream.

Since the NL80211_ATTR_HT_CAPABILITY attribute is
used as a struct, it needs a minimum, not maximum
length. Enforce that properly. Not doing so could
potentially lead to reading after the buffer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:55 -08:00
Johannes Berg
ae1e9df381 mac80211: fix bug in ieee80211_build_probe_req
commit 5b2bbf75a2 upstream.

ieee80211_probereq_get() can return NULL in
which case we should clean up & return NULL
in ieee80211_build_probe_req() as well.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:54 -08:00
Johannes Berg
492d7eff2d mac80211: fix NULL dereference in radiotap code
commit f8d1ccf155 upstream.

When receiving failed PLCP frames is enabled, there
won't be a rate pointer when we add the radiotap
header and thus the kernel will crash. Fix this by
not assuming the rate pointer is always valid. It's
still always valid for frames that have good PLCP
though, and that is checked & enforced.

This was broken by my
commit fc88518916
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Jul 30 13:23:12 2010 +0200

    mac80211: don't check rates on PLCP error frames

where I removed the check in this case but didn't
take into account that the rate info would be used.

Reported-by: Xiaokang Qin <xiaokang.qin@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:54 -08:00
dpward
3fa57c1bf5 net: Handle different key sizes between address families in flow cache
commit aa1c366e4f upstream.

With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:17 -08:00
Johannes Berg
041f9e20b7 mac80211: disable powersave for broken APs
commit 05cb910857 upstream.

Only AID values 1-2007 are valid, but some APs have been
found to send random bogus values, in the reported case an
AP that was sending the AID field value 0xffff, an AID of
0x3fff (16383).

There isn't much we can do but disable powersave since
there's no way it can work properly in this case.

Reported-by: Bill C Riemers <briemers@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:13 -08:00
Eliad Peller
42c6d01ce8 mac80211: config hw when going back on-channel
commit 6911bf0453 upstream.

When going back on-channel, we should reconfigure
the hw iff the hardware is not already configured
to the operational channel.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:12 -08:00
Eliad Peller
632abf8b3f mac80211: fix remain_off_channel regression
commit eaa7af2ae5 upstream.

The offchannel code is currently broken - we should
remain_off_channel if the work was started, and
the work's channel and channel_type are the same
as local->tmp_channel and local->tmp_channel_type.

However, if wk->chan_type and local->tmp_channel_type
coexist (e.g. have the same channel type), we won't
remain_off_channel.

This behavior was introduced by commit da2fd1f
("mac80211: Allow work items to use existing
channel type.")

Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:12 -08:00
NeilBrown
6fa9e3e3e0 NFS/sunrpc: don't use a credential with extra groups.
commit dc6f55e9f8 upstream.

The sunrpc layer keeps a cache of recently used credentials and
'unx_match' is used to find the credential which matches the current
process.

However unx_match allows a match when the cached credential has extra
groups at the end of uc_gids list which are not in the process group list.

So if a process with a list of (say) 4 group accesses a file and gains
access because of the last group in the list, then another process
with the same uid and gid, and a gid list being the first tree of the
gids of the original process tries to access the file, it will be
granted access even though it shouldn't as the wrong rpc credential
will be used.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:07 -08:00
Thomas Gleixner
5796ee3058 net: Unlock sock before calling sk_free()
[ Upstream commit b0691c8ee7 ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:50 -08:00
stephen hemminger
ce0f562ecf bridge: leave carrier on for empty bridge
[ Upstream commit b64b73d7d0 ]

This resolves a regression seen by some users of bridging.
Some users use the bridge like a dummy device.
They expect to be able to put an IPv6 address on the device
with no ports attached. Although there are better ways of doing
this, there is no reason to not allow it.

Note: the bridge still will reflect the state of ports in the
bridge if there are any added.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:49 -08:00
Oliver Hartkopp
8adc3d3df0 can bcm: fix incomplete tx_setup fix
commit 12d0d0d3a7 upstream.

The commit aabdcb0b55 ("can bcm: fix tx_setup
off-by-one errors") fixed only a part of the original problem reported by
Andre Naujoks. It turned out that the original code needed to be re-ordered
to reduce complexity and to finally fix the reported frame counting issues.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:45 -08:00
Willem de Bruijn
62d8d0b9b6 make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
[ Upstream commit 7091fbd82c ]

This is a minor change.

Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:

drop_n_acct:
        po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

As it shows, the new drop number it taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.

Note how the even ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is be to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that, instead,
it's just more messy.

[1] http://patchwork.ozlabs.org/patch/35665/
[2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:29 -08:00
Yan, Zheng
2146d4667b ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
[ Upstream commit 676a1184e8 ]

ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently
shared with new socket created for connection.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:28 -08:00