Commit Graph

2211 Commits

Author SHA1 Message Date
Eric Dumazet
c5a07578be netpoll: fix netpoll_send_udp() bugs
[ Upstream commit 954fba0274 ]

Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :

"skb->len += len;" instead of "skb_put(skb, len);"

Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
only packet headers would be copied, leaving garbage in the payload.

However the skb_realloc_headroom() must be avoided as much as possible
since it requires memory and netpoll tries hard to work even if memory
is exhausted (using a pool of preallocated skbs)

It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
which happens to work for typicall drivers but not all.

Right thing is to use LL_RESERVED_SPACE(dev)
(And also add dev->needed_tailroom of tailroom)

This patch combines both fixes.

Many thanks to Bogdan for raising this issue.

Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:38 -07:00
Michał Mirosław
d3a673fb54 ethtool: allow ETHTOOL_GSSET_INFO for users
[ Upstream commit f80400a26a ]

Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users.
ETHTOOL_GSTRINGS is already allowed, but is unusable without this one.

Signed-off-by: Micha©© Miros©©aw <mirq-linux@rere.qmqm.pl>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:37 -07:00
Jason Wang
325b4161ba net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
[ Upstream commit cc9b17ad29 ]

We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 08:47:36 -07:00
David S. Miller
83bba79790 Revert "net: maintain namespace isolation between vlan and real device"
[ Upstream commit 59b9997bab ]

This reverts commit 8a83a00b07.

It causes regressions for S390 devices, because it does an
unconditional DST drop on SKBs for vlans and the QETH device
needs the neighbour entry hung off the DST for certain things
on transmit.

Arnd can't remember exactly why he even needed this change.

Conflicts:

	drivers/net/macvlan.c
	net/8021q/vlan_dev.c
	net/core/dev.c

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric Dumazet
a2abc1310f pktgen: fix module unload for good
[ Upstream commit d4b1133558 ]

commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.

1) list_splice() arguments were in the wrong order

2) list_splice(list, head) has undefined behavior if head is not
initialized.

3) We should use the list_splice_init() variant to clear pktgen_threads
list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric Dumazet
45d21c1368 pktgen: fix crash at module unload
[ Upstream commit c57b546840 ]

commit 7d3d43dab4 (net: In unregister_netdevice_notifier unregister
the netdevices.) makes pktgen crashing at module unload.

[  296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267
[  296.820719]  lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1
[  296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254
[  296.821079] Call Trace:
[  296.821211]  [<ffffffff8168a715>] spin_dump+0x8a/0x8f
[  296.821345]  [<ffffffff8168a73b>] spin_bug+0x21/0x26
[  296.821507]  [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140
[  296.821648]  [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20
[  296.821786]  [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen]
[  296.821928]  [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen]
[  296.822073]  [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100
[  296.822216]  [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen]
[  296.822357]  [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0
[  296.822502]  [<ffffffff81699652>] system_call_fastpath+0x16/0x1b

Hold the pktgen_thread_lock while splicing pktgen_threads, and test
pktgen_exiting in pktgen_device_event() to make unload faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:33:03 +09:00
Eric W. Biederman
4e133084e9 net: In unregister_netdevice_notifier unregister the netdevices.
[ Upstream commit 7d3d43dab4 ]

We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.

This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-21 09:39:59 -07:00
Julian Anastasov
0958c122f4 netns: do not leak net_generic data on failed init
[ Upstream commit b922934d01 ]

ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-27 09:51:21 -07:00
Eric Dumazet
6d7946bd33 net: fix a race in sock_queue_err_skb()
[ Upstream commit 110c43304d ]

As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-27 09:51:19 -07:00
Eric Dumazet
99b8230dac net: fix napi_reuse_skb() skb reserve
[ Upstream commit 2a2a459eee ]

napi->skb is allocated in napi_get_frags() using
netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
NET_IP_ALIGN bytes.

However, when such skb is recycled in napi_reuse_skb(), it ends with a
reserve of NET_IP_ALIGN which is suboptimal.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02 09:27:21 -07:00
Michel Machado
035e3f6e8d neighbour: Fixed race condition at tbl->nht
[ Upstream commit 84338a6c9d ]

When the fixed race condition happens:

1. While function neigh_periodic_work scans the neighbor hash table
pointed by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.

2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.

3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.

4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.

Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19 08:57:45 -07:00
Eric Dumazet
9f8a28dca6 netpoll: netpoll_poll_dev() should access dev->flags
[ Upstream commit 58e05f357a ]

commit 5a698af53f (bond: service netpoll arp queue on master device)
tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29 16:34:06 -08:00
Eric Dumazet
32fa5d8323 gro: more generic L2 header check
[ Upstream commit 5ca3b72c5d ]

Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29 16:33:46 -08:00
Eric Dumazet
8a533666d1 net: fix NULL dereferences in check_peer_redir()
[ Upstream commit d3aaeb38c4, along
  with dependent backports of commits:
     69cce1d140
     9de79c127c
     218fa90f07
     580da35a31
     f7e57044ee
     e049f28883 ]

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-13 11:06:13 -08:00
Eric Dumazet
561331eae0 netns: fix net_alloc_generic()
[ Upstream commit 073862ba5d ]

When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :

[  200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
[  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
[  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
[  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
[  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
[  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
[  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
[  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
[  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
[  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
[  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
[  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
[  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190

net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() anytime.

This also reduces number of reallocations in net_assign_generic()

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03 09:19:03 -08:00
dpward
3fa57c1bf5 net: Handle different key sizes between address families in flow cache
commit aa1c366e4f upstream.

With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:17 -08:00
Thomas Gleixner
5796ee3058 net: Unlock sock before calling sk_free()
[ Upstream commit b0691c8ee7 ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:50 -08:00
Richard Cochran
babba877da net: hold sock reference while processing tx timestamps
commit da92b194cc upstream.

The pair of functions,

 * skb_clone_tx_timestamp()
 * skb_complete_tx_timestamp()

were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.

As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.

These functions first appeared in v2.6.36.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:52 -08:00
Eric W. Biederman
32779fa065 rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
commit d2237d3574 upstream.

Renato Westphal noticed that since commit a2835763e1
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.

Fix this by adding the missing manual notification in dev_change_net_namespaces.

Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.

Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:50 -08:00
Tim Chen
265d5c2eb2 scm: Capture the full credentials of the scm sender
[ Upstream commit e33f7a9f37 ]

This patch corrects an erroneous update of credential's gid with uid
introduced in commit 257b5358b3 since 2.6.36.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:54 -07:00
Gao feng
cbab190c50 fib:fix BUG_ON in fib_nl_newrule when add new fib rule
[ Upstream commit 561dac2d41 ]

add new fib rule can cause BUG_ON happen
the reproduce shell is
ip rule add pref 38
ip rule add pref 38
ip rule add to 192.168.3.0/24 goto 38
ip rule del pref 38
ip rule add to 192.168.3.0/24 goto 38
ip rule add pref 38

then the BUG_ON will happen
del BUG_ON and use (ctarget == NULL) identify whether this rule is unresolved

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:51 -07:00
Eric Dumazet
8e24aecbcd arp: fix rcu lockdep splat in arp_process()
[ Upstream commit 20e6074eb8 ]

Dave Jones reported a lockdep splat triggered by an arp_process() call
from parp_redo().

Commit faa9dcf793 (arp: RCU changes) is the origin of the bug, since
it assumed arp_process() was called under rcu_read_lock(), which is not
true in this particular path.

Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in
neigh_proxy_process() to take care of IPv6 side too.

 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 include/linux/inetdevice.h:209 invoked rcu_dereference_check() without
protection!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 4 locks held by setfiles/2123:
  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff8114cbc4>]
walk_component+0x1ef/0x3e8
  #1:  (&isec->lock){+.+.+.}, at: [<ffffffff81204bca>]
inode_doinit_with_dentry+0x3f/0x41f
  #2:  (&tbl->proxy_timer){+.-...}, at: [<ffffffff8106a803>]
run_timer_softirq+0x157/0x372
  #3:  (class){+.-...}, at: [<ffffffff8141f256>] neigh_proxy_process
+0x36/0x103

 stack backtrace:
 Pid: 2123, comm: setfiles Tainted: G        W
3.1.0-0.rc2.git7.2.fc16.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffff8108ca23>] lockdep_rcu_dereference+0xa7/0xaf
  [<ffffffff8146a0b7>] __in_dev_get_rcu+0x55/0x5d
  [<ffffffff8146a751>] arp_process+0x25/0x4d7
  [<ffffffff8146ac11>] parp_redo+0xe/0x10
  [<ffffffff8141f2ba>] neigh_proxy_process+0x9a/0x103
  [<ffffffff8106a8c4>] run_timer_softirq+0x218/0x372
  [<ffffffff8106a803>] ? run_timer_softirq+0x157/0x372
  [<ffffffff8141f220>] ? neigh_stat_seq_open+0x41/0x41
  [<ffffffff8108f2f0>] ? mark_held_locks+0x6d/0x95
  [<ffffffff81062bb6>] __do_softirq+0x112/0x25a
  [<ffffffff8150d27c>] call_softirq+0x1c/0x30
  [<ffffffff81010bf5>] do_softirq+0x4b/0xa2
  [<ffffffff81062f65>] irq_exit+0x5d/0xcf
  [<ffffffff8150dc11>] smp_apic_timer_interrupt+0x7c/0x8a
  [<ffffffff8150baf3>] apic_timer_interrupt+0x73/0x80
  <EOI>  [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
  [<ffffffff814fc285>] ? __slab_free+0x30/0x24c
  [<ffffffff814fc283>] ? __slab_free+0x2e/0x24c
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81130cb0>] kfree+0x108/0x131
  [<ffffffff81204e74>] inode_doinit_with_dentry+0x2e9/0x41f
  [<ffffffff81204fc6>] selinux_d_instantiate+0x1c/0x1e
  [<ffffffff81200f4f>] security_d_instantiate+0x21/0x23
  [<ffffffff81154625>] d_instantiate+0x5c/0x61
  [<ffffffff811563ca>] d_splice_alias+0xbc/0xd2
  [<ffffffff811b17ff>] ext4_lookup+0xba/0xeb
  [<ffffffff8114bf1e>] d_alloc_and_lookup+0x45/0x6b
  [<ffffffff8114cbea>] walk_component+0x215/0x3e8
  [<ffffffff8114cdf8>] lookup_last+0x3b/0x3d
  [<ffffffff8114daf3>] path_lookupat+0x82/0x2af
  [<ffffffff8110fc53>] ? might_fault+0xa5/0xac
  [<ffffffff8110fc0a>] ? might_fault+0x5c/0xac
  [<ffffffff8114c564>] ? getname_flags+0x31/0x1ca
  [<ffffffff8114dd48>] do_path_lookup+0x28/0x97
  [<ffffffff8114df2c>] user_path_at+0x59/0x96
  [<ffffffff811467ad>] ? cp_new_stat+0xf7/0x10d
  [<ffffffff811469a6>] vfs_fstatat+0x44/0x6e
  [<ffffffff811469ee>] vfs_lstat+0x1e/0x20
  [<ffffffff81146b3d>] sys_newlstat+0x1a/0x33
  [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158
  [<ffffffff812535fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff8150af82>] system_call_fastpath+0x16/0x1b

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:50 -07:00
stephen hemminger
c8656c500d net: allow netif_carrier to be called safely from IRQ
[ Upstream commit 1821f7cd65 ]

As reported by Ben Greer and Froncois Romieu. The code path in
the netif_carrier code leads it to try and disable
a late workqueue to reenable it immediately
netif_carrier_on
-> linkwatch_fire_event
   -> linkwatch_schedule_work
      -> cancel_delayed_work
         -> del_timer_sync

If __cancel_delayed_work is used instead then there is no
problem of waiting for running linkwatch_event.

There is a race between linkwatch_event running re-scheduling
but it is harmless to schedule an extra scan of the linkwatch queue.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:39 -07:00
Neil Horman
60f17a7798 net: add IFF_SKB_TX_SHARED flag to priv_flags
[ Upstream commit d887331506 ]

Pktgen attempts to transmit shared skbs to net devices, which can't be used by
some drivers as they keep state information in skbs.  This patch adds a flag
marking drivers as being able to handle shared skbs in their tx path.  Drivers
are defaulted to being unable to do so, but calling ether_setup enables this
flag, as 90% of the drivers calling ether_setup touch real hardware and can
handle shared skbs.  A subsequent patch will audit drivers to ensure that the
flag is set properly

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Robert Olsson <robert.olsson@its.uu.se>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:38 -07:00
David S. Miller
e997d47bff net: Compute protocol sequence numbers and fragment IDs using MD5.
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15 18:31:35 -07:00
Ben Hutchings
3de8ae6c0d ethtool: Allow zero-length register dumps again
commit 67ae7cf1ee upstream.

Some drivers (ab)use the ethtool_ops::get_regs operation to expose
only a hardware revision ID.  Commit
a77f5db361 ('ethtool: Allocate register
dump buffer with vmalloc()') had the side-effect of breaking these, as
vmalloc() returns a null pointer for size=0 whereas kmalloc() did not.

For backward-compatibility, allow zero-length dumps again.

Reported-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:34 -07:00
David S. Miller
957c665f37 ipv6: Don't put artificial limit on routing table size.
IPV6, unlike IPV4, doesn't have a routing cache.

Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table.  And
all of these things are together collected in the destination
cache table for ipv6.

This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).

Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.

Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set.  Use this flag bit in ipv6 when adding routing table
entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-01 17:30:43 -07:00
Linus Torvalds
8dac6bee32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()
2011-06-16 10:21:59 -07:00
Al Viro
a685e08987 Delay struct net freeing while there's a sysfs instance refering to it
* new refcount in struct net, controlling actual freeing of the memory
	* new method in kobj_ns_type_operations (->drop_ns())
	* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped.  Method renamed to ->grab_current_ns().
	* old net_free() callers call net_drop_ns() instead.
	* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL.  That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.

	Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called.  The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:41 -04:00
Dan Carpenter
83fe32de63 netpoll: call dev_put() on error in netpoll_setup()
There is a dev_put(ndev) missing on an error path.  This was
introduced in 0c1ad04aec "netpoll: prevent netpoll setup on slave
devices".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-11 18:55:22 -07:00
Jiri Pirko
0b5c9db1b1 vlan: Fix the ingress VLAN_FLAG_REORDER_HDR check
Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag
but rather in vlan_do_receive.  Otherwise the vlan header
will not be properly put on the packet in the case of
vlan header accelleration.

As we remove the check from vlan_check_reorder_header
rename it vlan_reorder_header to keep the naming clean.

Fix up the skb->pkt_type early so we don't look at the packet
after adding the vlan tag, which guarantees we don't goof
and look at the wrong field.

Use a simple if statement instead of a complicated switch
statement to decided that we need to increment rx_stats
for a multicast packet.

Hopefully at somepoint we will just declare the case where
VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove
the code.  Until then this keeps it working correctly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-11 16:15:50 -07:00
WANG Cong
0c1ad04aec netpoll: prevent netpoll setup on slave devices
In commit 8d8fc29d02
(netpoll: disable netpoll when enslave a device), we automatically
disable netpoll when the underlying device is being enslaved,
we also need to prevent people from setuping netpoll on
devices that are already enslaved.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09 00:28:13 -07:00
Heiko Carstens
264524d5e5 net: cpu offline cause napi stall
Frank Blaschka reported :
<quote>
  During heavy network load we turn off/on cpus.
  Sometimes this causes a stall on the network device.
  Digging into the dump I found out following:

  napi is scheduled but does not run. From the I/O buffers
  and the napi state I see napi/rx_softirq processing has stopped
  because the budget was reached. napi stays in the
  softnet_data poll_list and the rx_softirq was raised again.

  I assume at this time the cpu offline comes in,
  the rx softirq is raised/moved to another cpu but napi stays in the
  poll_list of the softnet_data of the now offline cpu.

  Reviewing dev_cpu_callback (net/core/dev.c) I did not find the
  poll_list is transfered to the new cpu.
</quote>

This patch is a straightforward implementation of Frank suggestion :

Transfert poll_list and trigger NET_RX_SOFTIRQ on new cpu.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-07 01:01:22 -07:00
David S. Miller
3019de124b net: Rework netdev_drivername() to avoid warning.
This interface uses a temporary buffer, but for no real reason.
And now can generate warnings like:

net/sched/sch_generic.c: In function dev_watchdog
net/sched/sch_generic.c:254:10: warning: unused variable drivername

Just return driver->name directly or "".

Reported-by: Connor Hansen <cmdkhh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 16:41:33 -07:00
Al Viro
c316e6a308 get_net_ns_by_fd() oopses if proc_ns_fget() returns an error
BTW, looking through the code related to struct net lifetime rules has
caught something else:

struct net *get_net_ns_by_fd(int fd)
{
        ...
        file = proc_ns_fget(fd);
        if (!file)
                goto out;

        ei = PROC_I(file->f_dentry->d_inode);

while in proc_ns_fget() we have two return ERR_PTR(...) and not a single
path that would return NULL.  The other caller of proc_ns_fget() treats
ERR_PTR() correctly...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-05 14:11:09 -07:00
Linus Torvalds
0e833d8cfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
  tg3: Fix tg3_skb_error_unmap()
  net: tracepoint of net_dev_xmit sees freed skb and causes panic
  drivers/net/can/flexcan.c: add missing clk_put
  net: dm9000: Get the chip in a known good state before enabling interrupts
  drivers/net/davinci_emac.c: add missing clk_put
  af-packet: Add flag to distinguish VID 0 from no-vlan.
  caif: Fix race when conditionally taking rtnl lock
  usbnet/cdc_ncm: add missing .reset_resume hook
  vlan: fix typo in vlan_dev_hard_start_xmit()
  net/ipv4: Check for mistakenly passed in non-IPv4 address
  iwl4965: correctly validate temperature value
  bluetooth l2cap: fix locking in l2cap_global_chan_by_psm
  ath9k: fix two more bugs in tx power
  cfg80211: don't drop p2p probe responses
  Revert "net: fix section mismatches"
  drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run()
  sctp: stop pending timers and purge queues when peer restart asoc
  drivers/net: ks8842 Fix crash on received packet when in PIO mode.
  ip_options_compile: properly handle unaligned pointer
  iwlagn: fix incorrect PCI subsystem id for 6150 devices
  ...
2011-06-04 23:16:00 +09:00
Koki Sanagi
ec764bf083 net: tracepoint of net_dev_xmit sees freed skb and causes panic
Because there is a possibility that skb is kfree_skb()ed and zero cleared
after ndo_start_xmit, we should not see the contents of skb like skb->len and
skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
and causes panic by NULL pointer dereference.
This patch fixes trace_net_dev_xmit not to see the contents of skb directly.

If you want to reproduce this panic,

1. Get tracepoint of net_dev_xmit on
2. Create 2 guests on KVM
2. Make 2 guests use virtio_net
4. Execute netperf from one to another for a long time as a network burden
5. host will panic(It takes about 30 minutes)

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-02 14:06:31 -07:00
Linus Torvalds
10799db60c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: Kill ratelimit.h dependency in linux/net.h
  net: Add linux/sysctl.h includes where needed.
  net: Kill ether_table[] declaration.
  inetpeer: fix race in unused_list manipulations
  atm: expose ATM device index in sysfs
  IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
  bug.h: Move ratelimit warn interfaces to ratelimit.h
  bonding: cleanup module option descriptions
  net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
  net: davinci_emac: fix dev_err use at probe
  can: convert to %pK for kptr_restrict support
  net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
  netfilter: Fix several warnings in compat_mtw_from_user().
  netfilter: ipset: fix ip_set_flush return code
  netfilter: ipset: remove unused variable from type_pf_tdel()
  netfilter: ipset: Use proper timeout value to jiffies conversion
2011-05-27 11:16:27 -07:00
David S. Miller
c5c177b4ac net: Kill ratelimit.h dependency in linux/net.h
Ingo Molnar noticed that we have this unnecessary ratelimit.h
dependency in linux/net.h, which hid compilation problems from
people doing builds only with CONFIG_NET enabled.

Move this stuff out to a seperate net/net_ratelimit.h file and
include that in the only two places where this thing is needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
2011-05-27 13:41:33 -04:00
David S. Miller
86e4ca66e8 bug.h: Move ratelimit warn interfaces to ratelimit.h
As reported by Ingo Molnar, we still have configuration combinations
where use of the WARN_RATELIMIT interfaces break the build because
dependencies don't get met.

Instead of going down the long road of trying to make it so that
ratelimit.h can get included by kernel.h or asm-generic/bug.h,
just move the interface into ratelimit.h and make users have
to include that.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-26 15:00:31 -04:00
Michał Mirosław
fd0daf9d58 net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
Current code squashes flags to bool - this makes set_flags fail whenever
some ETH_FLAG_* equivalent features are set. Fix this.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-26 14:13:59 -04:00
Linus Torvalds
14d74e0cab Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
  net: fix get_net_ns_by_fd for !CONFIG_NET_NS
  ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.
  ns: Declare sys_setns in syscalls.h
  net: Allow setting the network namespace by fd
  ns proc: Add support for the ipc namespace
  ns proc: Add support for the uts namespace
  ns proc: Add support for the network namespace.
  ns: Introduce the setns syscall
  ns: proc files for namespace naming policy.
2011-05-25 18:10:16 -07:00
Eric Dumazet
2907c35ff6 net: hold rtnl again in dump callbacks
Commit e67f88dd12 (dont hold rtnl mutex during netlink dump callbacks)
missed fact that rtnl_fill_ifinfo() must be called with rtnl held.

Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
its not easy to solve this problem, so revert this part of the patch.

It also forgot one rcu_read_unlock() in FIB dump_rules()

Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-25 17:55:32 -04:00
Neil Horman
f11970e383 net: make dev_disable_lro use physical device if passed a vlan dev (v2)
If the device passed into dev_disable_lro is a vlan, then repoint the dev
poniter so that we actually modify the underlying physical device.

Signed-of-by: Neil Horman <nhorman@tuxdriver.com>
CC: davem@davemloft.net
CC: bhutchings@solarflare.com

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-25 17:55:25 -04:00
Stephen Rothwell
956c920786 net: fix get_net_ns_by_fd for !CONFIG_NET_NS
After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

net/built-in.o: In function `get_net_ns_by_fd':
(.text+0x11976): undefined reference to `netns_operations'
net/built-in.o: In function `get_net_ns_by_fd':
(.text+0x1197a): undefined reference to `netns_operations'

netns_operations is only available if CONFIG_NET_NS is set ...

Caused by commit f063052947 ("net: Allow setting the network namespace
by fd").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-05-24 15:30:51 -07:00
Eric Dumazet
b30c516f87 net: fix __dst_destroy_metrics_generic()
dst_default_metrics is readonly, we dont want to kfree() it later.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-24 13:29:50 -04:00
Eric Dumazet
be3fc413da net: use synchronize_rcu_expedited()
synchronize_rcu() is very slow in various situations (HZ=100,
CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)

Extract from my (mostly idle) 8 core machine :

 synchronize_rcu() in 99985 us
 synchronize_rcu() in 79982 us
 synchronize_rcu() in 87612 us
 synchronize_rcu() in 79827 us
 synchronize_rcu() in 109860 us
 synchronize_rcu() in 98039 us
 synchronize_rcu() in 89841 us
 synchronize_rcu() in 79842 us
 synchronize_rcu() in 80151 us
 synchronize_rcu() in 119833 us
 synchronize_rcu() in 99858 us
 synchronize_rcu() in 73999 us
 synchronize_rcu() in 79855 us
 synchronize_rcu() in 79853 us

When we hold RTNL mutex, we would like to spend some cpu cycles but not
block too long other processes waiting for this mutex.

We also want to setup/dismantle network features as fast as possible at
boot/shutdown time.

This patch makes synchronize_net() call the expedited version if RTNL is
locked.

synchronize_rcu_expedited() typical delay is about 20 us on my machine.

 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 16 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ben Greear <greearb@candelatech.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-24 13:26:12 -04:00
Joe Perches
6c4a5cb219 net: filter: Use WARN_RATELIMIT
A mis-configured filter can spam the logs with lots of stack traces.

Rate-limit the warnings and add printout of the bogus filter information.

Original-patch-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-23 17:37:43 -04:00
WANG Cong
ce14f8946a pktgen: refactor pg_init() code
This also shrinks the object size a little.

   text	   data	    bss	    dec	    hex	filename
  28834	    186	      8	  29028	   7164	net/core/pktgen.o
  28816	    186	      8	  29010	   7152	net/core/pktgen.o.AFTER

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: "David Miller" <davem@davemloft.net>,
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22 21:01:22 -04:00
WANG Cong
68d5ac2ed6 pktgen: use vzalloc_node() instead of vmalloc_node() + memset()
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-22 21:01:21 -04:00