commit c7ac8679be upstream.
The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.
Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64f509ce71 upstream.
Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82e6bfe2fb upstream.
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.
Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matchings due to use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2614f86490 upstream.
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:
- the passed expectation is not inserted in the expectation table
and its timer was not initialized, since we have refreshed one
matching/existing expectation.
- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
timer is in some undefined state just after the allocation,
until it is appropriately initialized.
This can be a problem for the SIP helper during the expectation
addition:
...
if (nf_ct_expect_related(rtp_exp) == 0) {
if (nf_ct_expect_related(rtcp_exp) != 0)
nf_ct_unexpect_related(rtp_exp);
...
Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
spin_lock_bh(&nf_conntrack_lock);
if (del_timer(&exp->timeout)) {
nf_ct_unlink_expect(exp);
nf_ct_expect_put(exp);
}
spin_unlock_bh(&nf_conntrack_lock);
Note that del_timer always returns false if the timer has been
initialized. However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.
To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.
Thanks to Patrick McHardy for participating in the discussion of
this patch.
I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b423f6a40 upstream.
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 283283c4da upstream.
After commit 39f618b4fd (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the past, a process could only see its own stats (uid-based summary,
and details).
Now we allow any process to see other UIDs uid-based stats, but still
hide the detailed stats.
Change-Id: I7666961ed244ac1d9359c339b048799e5db9facc
Signed-off-by: JP Abgrall <jpa@google.com>
[ Upstream commit 2d8a041b7b ]
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qtaguid tracks the device stats by monitoring when it goes up and down,
then it gets the dev_stats().
But devs don't correctly report stats (either they don't count headers
symmetrically between rx/tx, or they count internal control messages).
Now qtaguid counts the rx/tx bytes/packets during raw:prerouting and
mangle:postrouting (nat is not available in ipv6).
The results are in
/proc/net/xt_qtaguid/iface_stat_fmt
which outputs a format line (bash expansion):
ifname total_skb_{rx,tx}_{bytes,packets}
Added event counters for pre/post handling.
Added extra ctrl_*() pid/uid debugging.
Change-Id: Id84345d544ad1dd5f63e3842cab229e71d339297
Signed-off-by: JP Abgrall <jpa@google.com>
Send netlink message notifications using a uevent with
subsystem=xt_idletimer
INTERFACE=...
STATE={active,inactive}
instead of needing a new netlink message type.
Revert the netlink.h change added by commit
beb914e987
Change-Id: I6881936bf754f3367ff3c8f6c1e9661cec0357d4
Signed-off-by: JP Abgrall <jpa@google.com>
When updating the stats for a given uid it would incorrectly assume
IPV4 and pick up the wrong protocol when IPV6.
Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5
Signed-off-by: JP Abgrall <jpa@google.com>
There was a case that might have seemed like new_tag_stat was not
initialized and actually used.
Added comment explaining why it was impossible, and a BUG()
in case the logic gets changed.
Change-Id: I1eddd1b6f754c08a3bf89f7e9427e5dce1dfb081
Signed-off-by: JP Abgrall <jpa@google.com>
Add new netlink msg type for IDLETIMER.
Send notifications when the label becomes active after an idle period.
Send netlink message notifications in addition to sysfs notifications.
Change-Id: I31677ef00c94b5f82c8457e5bf9e5e584c23c523
Signed-off-by: Ashish Sharma <ashishsharma@google.com>
commit e0aac52e17 upstream.
Commit f11017ec2d (2.6.37)
moved the fwmark variable in subcontext that is invalidated before
reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer
in the param structure make sure the fwmark variable is in
same context. As the fwmark templates can not be matched,
more and more template connections are created and the
controlled connections can not go to single real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because for now the xt_qtaguid module allows procs to use tags without
having /dev/xt_qtaguid open, there was a case where it would try
to delete a resources from a list that was proc specific.
But that resource was never added to that list which is only
used when /dev/xt_qtaguid has been opened by the proc.
Once our userspace is fully updated, we won't need those exceptions.
Change-Id: Idd4bfea926627190c74645142916e10832eb2504
Signed-off-by: JP Abgrall <jpa@google.com>
In cases where the skb would have an sk_socket but no file, that skb
would not be counted at all. Assigning to uid 0 now.
Adding extra counters to track skb counts.
Change-Id: If049b4b525e1fbd5afc9c72b4a174c0a435f2ca7
Signed-off-by: JP Abgrall <jpa@google.com>
* Crash fix
The delete command would delete a socket tag entry without removing it
from the proc_qtu_data { ..., sock_tag_list, }.
This in turn would cause an exiting process to crash while cleaning up
its matching proc_qtu_data.
* Added more aggressive tracking/cleanup of proc_qtu_data
This should allow one process to cleanup qtu_tag_data{} left around from
processes that didn't use resource tracking via /dev/xt_qtaguid.
* Debug printing tweaks
Better code inclusion/exclusion handling,
and extra debug out of full state.
Change-Id: I735965af2962ffcd7f3021cdc0068b3ab21245c2
Signed-off-by: JP Abgrall <jpa@google.com>
There is a
/proc/net/xt_qtaguid/iface/<iface>/{rx_bytes,rx_packets,tx_bytes,...}
but for better convenience and to avoid getting overly stale net/dev stats
we now have
/proc/net/xt_qtaguid/iface_stat_all
which outputs lines of:
iface_name active rx_bytes rx_packets tx_bytes tx_packets
net_dev_rx_bytes net_dev_rx_packets net_dev_tx_bytes net_dev_tx_packets
Change-Id: I12cc10d2d123b86b56d4eb489b1d77b2ce72ebcf
Signed-off-by: JP Abgrall <jpa@google.com>
Most net devs will not reset their stats when just going down/up,
unless a NETDEV_UNREGISTER was notified.
But some devs will not send out a NETDEV_UNREGISTER but still
reset their stats just before a NETDEV_UP.
Now we just track the dev stats during NETDEV_DOWN... just in case.
Then on NETDEV_UP we check the stats: if the device didn't do a
NETDEV_UNREGISTER and a prior NETDEV_DOWN captured stats, then we treat
it as an UNREGISTER and save the totals from the stashed values.
Added extra netdev event debugging.
Change-Id: Iec79e74bfd40269aa3e5892f161be71e09de6946
Signed-off-by: JP Abgrall <jpa@google.com>
When a process doesn't have /dev/xt_qtaguid open, only warn once
instead of for every ctrl access.
Change-Id: I98a462a8731254ddc3bf6d2fefeef9823659b1f0
Signed-off-by: JP Abgrall <jpa@google.com>
* Added global resource tracking based on tags.
- Can be put into passive mode via
/sys/modules/xt_qtaguid/params/tag_tracking_passive
- The number of socket tags per UID is now limited
- Adding /dev/xt_qtaguid that each process should open before starting
to tag sockets. A later change will make it a "must".
- A process should not create new tags unless it has the dev open.
A later change will make it a must.
- On qtaguid_resources release, the process' matching socket tag info
is deleted.
* Support run-time debug mask via /sys/modules parameter "debug_mask".
* split module into prettyprinting code, includes, main.
* Removed ptrdiff_t usage which didn't work in all cases.
Change-Id: I4a21d3bea55d23c1c3747253904e2a79f7d555d9
Signed-off-by: JP Abgrall <jpa@google.com>
Turns out that some devices don't call the notifier chains
with NETDEV_UNREGISTER.
So now we only track up/down as the points for tracking
active/inactive transitions and saving the get_dev_stats().
Change-Id: I948755962b4c64150b4d04f294fb4889f151e42b
Signed-off-by: JP Abgrall <jpa@google.com>
/proc/net/xt_qtaguid/ctrl will now show:
active tagged sockets: lines of "sock=%p tag=0x%llx (uid=%u)"
sockets_tagged, : the number of sockets successfully tagged.
sockets_untagged: the number of sockets successfully untagged.
counter_set_changes: ctrl counter set change requests.
delete_cmds: ctrl delete commands completed.
iface_events: number of NETDEV_* events handled.
match_found_sk: sk found in skbuff without ct assist.
match_found_sk_in_ct: the number of times the connection tracker found
a socket for us. This happens when the skbuff didn't have info.
match_found_sk_none: the number of times no sk could be determined
successfully looked up. This indicates we don't know who the
data actually belongs to. This could be unsolicited traffic.
Change-Id: I3a65613bb24852e1eea768ab0320a6a7073ab9be
Signed-off-by: JP Abgrall <jpa@google.com>
sockfd_put() risks sleeping.
So when doing a delete ctrl command, defer the sockfd_put() and
kfree() to outside of the spinlock.
Change-Id: I5f8ab51d05888d885b2fbb035f61efa5b7abb88a
Signed-off-by: JP Abgrall <jpa@google.com>
* Don't hold the sockets after tagging.
sockfd_lookup() does a get() on the associated file.
There was no matching put() so a closed socket could never be
freed.
* Don't rely on struct member order for tag_node
The structs that had a struct tag_node member would work with
the *_tree_* routines only because tag_node was 1st.
* Improve debug messages
Provide info on who the caller is. Use unsigned int for uid.
* Only process NETDEV_UP events.
* Pacifier: disable netfilter matching. Leave .../stats header.
Change-Id: Iccb8ae3cca9608210c417597287a2391010dff2c
Signed-off-by: JP Abgrall <jpa@google.com>
* Allow tracking interfaces that only have an ipv6 address.
Deal with ipv6 notifier chains that do NETDEV_UP without the rtnl_lock()
* Allow root all access to procfs ctrl/stats.
To disable all checks:
echo 0 > /sys/module/xt_qtaguid/parameters/ctrl_write_gid
echo 0 > /sys/module/xt_qtaguid/parameters/stats_readall_gid
* Add CDEBUG define to enable pr_debug output specific to
procfs ctrl/stats access.
Change-Id: I9a469511d92fe42734daff6ea2326701312a161b
Signed-off-by: JP Abgrall <jpa@google.com>
* Added support for sets of counters.
By default set 0 is active.
Userspace can control which set is active for a given UID by
writing to .../ctrl
s <set_num> <uid>
Changing the active set is only permitted for processes in the
AID_NET_BW_ACCT group.
The active set tracking is reset when the uid tag is deleted with
the .../ctrl command
d 0 <uid>
* New output format for the proc .../stats
- Now has cnt_set in the list.
"""
idx iface acct_tag_hex uid_tag_int cnt_set rx_bytes rx_packets tx_bytes tx_packets rx_tcp_packets rx_tcp_bytes rx_udp_packets rx_udp_bytes rx_other_packets rx_other_bytes tx_tcp_packets tx_tcp_bytes tx_udp_packets tx_udp_bytes tx_other_packets tx_other_bytes
...
2 rmnet0 0x0 1000 0 27729 29 1477 27 27501 26 228 3 0 0 1249 24 228 3 0 0
2 rmnet0 0x0 1000 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rmnet0 0x0 10005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rmnet0 0x0 10005 1 46407 57 8008 64 46407 57 0 0 0 0 8008 64 0 0 0 0
...
6 rmnet0 0x7fff000100000000 10005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 rmnet0 0x7fff000100000000 10005 1 27493 24 1564 22 27493 24 0 0 0 0 1564 22 0 0 0 0
"""
* Refactored for proc stats output code.
* Silenced some of the per packet debug output.
* Reworded some of the debug messages.
* Replaced all the spin_lock_irqsave/irqrestore with *_bh():
netfilter handling is done in softirq.
Change-Id: Ibe89f9d754579fd97335617186c614b43333cfd3
Signed-off-by: JP Abgrall <jpa@google.com>
This would cause log spam to the point of slowing down the system.
Change-Id: I5655f0207935004b0198f43ad0d3c9ea25466e4e
Signed-off-by: JP Abgrall <jpa@google.com>
* uid handling
- Limit UID impersonation to processes with a gid in AID_NET_BW_ACCT.
This affects socket tagging, and data removal.
- Limit stats lookup to own uid or the process gid is in AID_NET_BW_STATS.
This affects stats lookup.
* allow pacifying the module
Setting passive to Y/y will make the module return immediately on
external stimulus.
No more stats and silent success on ctrl writes.
Mainly used when one suspects this module of misbehaving.
Change-Id: I83990862d52a9b0922aca103a0f61375cddeb7c4
Signed-off-by: JP Abgrall <jpa@google.com>
* Add a new ctrl command to delete stored data.
d <acct_tag> [<uid>]
The uid will default to the running process's.
The accounting tag can be 0, in which case all counters and socket tags
associated with the uid will be cleared.
* Simplify the ctrl command handling at the expense of duplicate code.
This should make it easier to maintain.
* /proc/net/xt_qtaguid/stats now returns more stats
idx iface acct_tag_hex uid_tag_int
{rx,tx}_{bytes,packets}
{rx,tx}_{tcp,udp,other}_{bytes,packets}
the {rx,tx}_{bytes,packets} are the totals.
* re-tagging will now allow changing the uid.
Change-Id: I9594621543cefeab557caa3d68a22a3eb320466d
Signed-off-by: JP Abgrall <jpa@google.com>
This uses the NETLINK NETLINK_NFLOG family to log a single message
when the quota limit is reached.
It uses the same packet type as ipt_ULOG, but
- never copies skb data,
- uses 112 as the event number (ULOG's +1)
It doesn't log if the module param "event_num" is 0.
Change-Id: I6f31736b568bb31a4ff0b9ac2ee58380e6b675ca
Signed-off-by: JP Abgrall <jpa@google.com>
The xt_quota2 came from
http://sourceforge.net/projects/xtables-addons/develop
It needed tweaking for it to compile within the kernel tree.
Fixed kmalloc() and create_proc_entry() invocations within
a non-interruptible context.
Removed useless copying of current quota back to the iptable's
struct matchinfo:
- those are per CPU: they will change randomly based on which
cpu gets to update the value.
- they prevent matching a rule: e.g.
-A chain -m quota2 --name q1 --quota 123
can't be followed by
-D chain -m quota2 --name q1 --quota 123
as the 123 will be compared to the struct matchinfo's quota member.
Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492
Signed-off-by: JP Abgrall <jpa@google.com>
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>
This time the symptom is caused by tagging the same socket twice
without untagging it in between.
This would cause it to not unlock, and return.
Signed-off-by: JP Abgrall <jpa@google.com>
When processing args passed to the procfs ctrl, if the tag was
invalid it would exit without releasing the spin_lock...
Bye bye scheduling.
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: Ic1480ae9d37bba687586094cf6d0274db9c5b28a
(This is a direct cherry-pick from 2.6.39: I3b925802)
Fixed procreader for /proc/net/xt_qtaguid/ctrl: it would just
fill the output with the same entry.
Simplify the **start handling.
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: I3b92580228f2b57795bb2d0d6197fc95ab6be552