linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-04 18:19:28 +09:00

Author	SHA1	Message	Date
Julian Wiedmann	222440996d	net/af_iucv: drop inbound packets with invalid flags Inbound packets may have any combination of flag bits set in their iucv header. If we don't know how to handle a specific combination, drop the skb instead of leaking it. To clarify what error is returned in this case, replace the hard-coded 0 with the corresponding macro. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:32:21 -07:00
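The dispatch described above can be sketched as a userspace model. It is a simplified illustration, not the af_iucv code. The flag values are assumed from include/net/iucv/af_iucv.h, and the set of accepted combinations only approximates what afiucv_hs_rcv() handles. `struct toy_skb` and `toy_hs_rcv()` are hypothetical stand-ins for an skb and the receive handler.

```c
#include <stdint.h>

/* Flag values assumed from include/net/iucv/af_iucv.h; verify against your tree. */
#define AF_IUCV_FLAG_ACK 0x1
#define AF_IUCV_FLAG_SYN 0x2
#define AF_IUCV_FLAG_FIN 0x4
#define AF_IUCV_FLAG_WIN 0x8
#define AF_IUCV_FLAG_SHT 0x10

/* Same values as the kernel's NET_RX_SUCCESS / NET_RX_DROP. */
#define NET_RX_SUCCESS 0
#define NET_RX_DROP    1

/* Hypothetical stand-in for an skb: records who ended up owning it. */
struct toy_skb {
	uint8_t flags;
	int consumed; /* taken over by a handler */
	int freed;    /* released by the drop path */
};

/* Simplified dispatch: known flag combinations go to a handler. Every
 * other combination is freed and reported as dropped instead of being
 * leaked. */
static int toy_hs_rcv(struct toy_skb *skb)
{
	switch (skb->flags) {
	case AF_IUCV_FLAG_SYN:
	case AF_IUCV_FLAG_SYN | AF_IUCV_FLAG_ACK:
	case AF_IUCV_FLAG_SYN | AF_IUCV_FLAG_FIN:
	case AF_IUCV_FLAG_FIN:
	case AF_IUCV_FLAG_WIN:
	case AF_IUCV_FLAG_SHT:
	case 0: /* plain data frame */
		skb->consumed = 1;
		return NET_RX_SUCCESS;
	default:
		skb->freed = 1; /* kfree_skb() in the kernel */
		return NET_RX_DROP;
	}
}
```

Any combination that is not listed in the switch falls through to `default`. That branch releases the buffer and returns NET_RX_DROP instead of a bare 0.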
David S. Miller	6ef848efc2	Merge branch 'rtnetlink-add-IFA_TARGET_NETNSID-for-RTM_GETADDR' Christian Brauner says: ==================== rtnetlink: add IFA_TARGET_NETNSID for RTM_GETADDR This iteration mainly addresses the suggestion to use IFA_TARGET_NETNSID as the property name. Additionally, an alias for the already existing IFLA_IF_NETNSID property is added. Note that two additional cleanup patches (8/9 and 9/9) were added to address concerns raised that passing more than 6 arguments to a function will cause additional variables to be pushed onto the stack instead of being placed into registers. The way I addressed this is by introducing two new struct inet{6}_fill_args that are used to pass common information down to inet{6}_fill_if() functions, shortening all those functions to three pointer arguments. If more people than Kirill find this useful, they can be kept; if not, they can simply be dropped in later iterations of this series or when merging. Here is a short overview: 1. Rename from IFA_IF_NETNSID to IFA_TARGET_NETNSID. 2. Add IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID and switch all occurrences over to the new alias. 3. Add inet_fill_args struct to avoid passing more than 6 arguments in inet_fill_if() functions. 4. Add inet6_fill_args struct to avoid passing more than 6 arguments in inet6_fill_if() functions. The only functional change is the export of rtnl_get_net_ns_capable() which is needed in case ipv6 is built as a module. Note, I did not change the property name to IFA_TARGET_NSID as there was no clear agreement what would be preferred. My personal preference is to keep the IFA_IF_NETNSID name because it aligns naturally with the IFLA_IF_NETNSID property for RTM_LINK requests. Jiri seems to prefer this name too. However, if there is agreement that another property name makes more sense I'm happy to send a v2 that changes this. 
To test this patchset I performed 1 million getifaddrs() requests against a network namespace containing 5 interfaces (lo, eth{0-4}). The first test used a network namespace aware getifaddrs() implementation I wrote and the second test used the traditional setns() + getifaddrs() method. The results show that this patchset allows userspace to cut retrieval time in half: 1. netns_getifaddrs(): 82 microseconds 2. setns() + getifaddrs(): 162 microseconds A while back we introduced and enabled IFLA_IF_NETNSID in RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led to significant performance increases since it allows userspace to avoid taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the interfaces from the netns associated with the netns_fd. Especially when a lot of network namespaces are in use, using setns() becomes increasingly problematic when performance matters. Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf. getifaddrs() style functions and friends). But currently, RTM_GETADDR requests do not support a property similar to IFLA_IF_NETNSID for RTM_LINK requests. This is problematic since userspace can retrieve interfaces from another network namespace by sending an IFLA_IF_NETNSID property along with RTM_GETLINK requests but is still forced to use the legacy setns() style of retrieving interfaces in RTM_GETADDR requests. The goal of this series is to make it possible to perform RTM_GETADDR requests on different network namespaces. To this end a new IFA_IF_NETNSID property for RTM_ADDR requests is introduced. It can be used to send a network namespace identifier along with RTM_ADDR requests. The network namespace identifier will be used to retrieve the target network namespace in which the request is supposed to be fulfilled. This aligns the behavior of RTM_ADDR requests with the behavior of RTM_*LINK requests. 
- The caller must have assigned a valid network namespace identifier for the target network namespace. - The caller must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. [1]: commit `7973bfd875` ("rtnetlink: remove check for IFLA_IF_NETNSID") [2]: commit `5bb8ed0754` ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK") [3]: commit `b61ad68a9f` ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK") [4]: commit `c310bfcb6e` ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK") [5]: commit `7c4f63ba82` ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:12 -07:00
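Per the cover letter, the namespace identifier travels as an attribute on an RTM_GETADDR request. The following minimal userspace sketch builds such a dump request, under several assumptions:

- The uapi headers are recent enough that IFA_TARGET_NETNSID has enum value 10. It is hard-coded below under our own name `EX_IFA_TARGET_NETNSID`.
- The attribute carries a signed 32-bit nsid previously assigned to the target namespace.
- `build_getaddr_req()` is a hypothetical helper that only fills a buffer. Actually sending the request over a `socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)` socket additionally requires CAP_NET_ADMIN in the owning user namespace of the target namespace.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed value of IFA_TARGET_NETNSID in <linux/if_addr.h>; own name so
 * the sketch also compiles against older headers. */
#define EX_IFA_TARGET_NETNSID 10

/* Build (but do not send) an RTM_GETADDR dump request that asks for the
 * addresses of the namespace identified by `nsid`. Returns the message
 * length, or 0 if `cap` is too small. */
static size_t build_getaddr_req(void *buf, size_t cap, unsigned char family,
                                int32_t nsid, uint32_t seq)
{
	size_t hdr = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
	size_t total = NLMSG_ALIGN(hdr) + RTA_LENGTH(sizeof(nsid));

	if (cap < total)
		return 0;
	memset(buf, 0, total);

	struct nlmsghdr *nlh = buf;
	nlh->nlmsg_len = (uint32_t)total;
	nlh->nlmsg_type = RTM_GETADDR;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	nlh->nlmsg_seq = seq;

	struct ifaddrmsg *ifa = NLMSG_DATA(nlh);
	ifa->ifa_family = family; /* AF_INET or AF_INET6 */

	/* The target-netns attribute follows the aligned ifaddrmsg. */
	struct rtattr *rta = (struct rtattr *)((char *)buf + NLMSG_ALIGN(hdr));
	rta->rta_type = EX_IFA_TARGET_NETNSID;
	rta->rta_len = RTA_LENGTH(sizeof(nsid));
	memcpy(RTA_DATA(rta), &nsid, sizeof(nsid));

	return total;
}
```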
Christian Brauner	203651b665	ipv6: add inet6_fill_args inet6_fill_if{addr,mcaddr, acaddr}() already took 6 arguments which meant the 7th argument would need to be pushed onto the stack on x86. Add a new struct inet6_fill_args which holds common information passed to inet6_fill_if{addr,mcaddr, acaddr}() and shortens the functions to three pointer arguments. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	978a46fa6c	ipv4: add inet_fill_args inet_fill_ifaddr() already took 6 arguments which meant the 7th argument would need to be pushed onto the stack on x86. Add a new struct inet_fill_args which holds common information passed to inet_fill_ifaddr() and shortens the function to three pointer arguments. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
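The argument-bundling idea behind inet_fill_args and inet6_fill_args can be shown with a self-contained sketch. The struct and function names (`toy_fill_args`, `toy_fill_ifaddr`) and their fields are hypothetical. Check net/ipv4/devinet.c and net/ipv6/addrconf.c for the real definitions.

```c
#include <stdint.h>

/* Hypothetical counterpart of struct inet_fill_args. Values shared by
 * every call in one dump are bundled once. Field names are illustrative. */
struct toy_fill_args {
	uint32_t portid;
	uint32_t seq;
	int event;
	unsigned int flags;
	int netnsid; /* -1 when no target namespace was requested */
};

struct toy_ifaddr { uint32_t addr; };

/* Output record standing in for the netlink message being filled. */
struct toy_record {
	uint32_t addr;
	uint32_t portid;
	uint32_t seq;
	int event;
	unsigned int flags;
	int netnsid;
};

/* Three pointer arguments instead of seven separate ones. On x86-64
 * (System V ABI) the first six integer/pointer arguments are passed in
 * registers, so nothing has to spill onto the stack. */
static void toy_fill_ifaddr(struct toy_record *out, const struct toy_ifaddr *ifa,
                            const struct toy_fill_args *args)
{
	out->addr = ifa->addr;
	out->portid = args->portid;
	out->seq = args->seq;
	out->event = args->event;
	out->flags = args->flags;
	out->netnsid = args->netnsid;
}
```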
Christian Brauner	7e4a8d5a93	rtnetlink: s/IFLA_IF_NETNSID/IFLA_TARGET_NETNSID/g IFLA_TARGET_NETNSID is the new alias for IFLA_IF_NETNSID. This commit replaces all occurrences of IFLA_IF_NETNSID with the new alias to indicate that this identifier is the preferred one. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	19d8f1ad12	if_link: add IFLA_TARGET_NETNSID alias This adds IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID for RTM_LINK requests. The new name is clearer and also aligns with the newly introduced IFA_TARGET_NETNSID property for RTM_ADDR requests. Signed-off-by: Christian Brauner <christian@brauner.io> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	87ccbb1f94	rtnetlink: move type calculation out of loop I don't see how the type - which is one of RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type calculation once before entering the for loop. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	6ecf4c37eb	ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR - Backwards Compatibility: If userspace wants to determine whether ipv6 RTM_GETADDR requests support the new IFA_TARGET_NETNSID property, it should verify that the reply includes the IFA_TARGET_NETNSID property. If it does not, userspace should assume that IFA_TARGET_NETNSID is not supported for ipv6 RTM_GETADDR requests on this kernel. - From what I gather from current userspace tools that make use of RTM_GETADDR requests, some of them pass down struct ifinfomsg when they should actually pass down struct ifaddrmsg. To not break existing tools that pass down the wrong struct, we will do the same as for RTM_GETLINK | NLM_F_DUMP requests and not error out when nlmsg_parse() fails. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	d38071455f	ipv4: enable IFA_TARGET_NETNSID for RTM_GETADDR - Backwards Compatibility: If userspace wants to determine whether ipv4 RTM_GETADDR requests support the new IFA_TARGET_NETNSID property, it should verify that the reply includes the IFA_TARGET_NETNSID property. If it does not, userspace should assume that IFA_TARGET_NETNSID is not supported for ipv4 RTM_GETADDR requests on this kernel. - From what I gather from current userspace tools that make use of RTM_GETADDR requests, some of them pass down struct ifinfomsg when they should actually pass down struct ifaddrmsg. To not break existing tools that pass down the wrong struct, we will do the same as for RTM_GETLINK | NLM_F_DUMP requests and not error out when nlmsg_parse() fails. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
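The backwards-compatibility check described in these two commits is to look for IFA_TARGET_NETNSID in the reply. On the userspace side it might look like the following. This is a sketch under assumptions:

- It inspects one already-received RTM_NEWADDR message.
- It hard-codes the attribute number 10 under our own name `EX_IFA_TARGET_NETNSID`.
- It leaves out socket I/O and multipart handling.
- `reply_has_target_netnsid()` is a hypothetical helper.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>
#include <stdbool.h>

#define EX_IFA_TARGET_NETNSID 10 /* assumed value in <linux/if_addr.h> */

/* Returns true if an RTM_NEWADDR reply carries IFA_TARGET_NETNSID. Per the
 * commit messages, its absence means the kernel ignored the property. */
static bool reply_has_target_netnsid(const struct nlmsghdr *nlh)
{
	if (nlh->nlmsg_type != RTM_NEWADDR)
		return false;

	const struct ifaddrmsg *ifa = NLMSG_DATA(nlh);
	int len = (int)nlh->nlmsg_len - (int)NLMSG_LENGTH(sizeof(*ifa));

	/* Walk the attributes that follow struct ifaddrmsg. */
	for (const struct rtattr *rta = IFA_RTA(ifa); RTA_OK(rta, len);
	     rta = RTA_NEXT(rta, len))
		if (rta->rta_type == EX_IFA_TARGET_NETNSID)
			return true;
	return false;
}
```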
Christian Brauner	9f3c057c14	if_addr: add IFA_TARGET_NETNSID This adds a new IFA_TARGET_NETNSID property to be used by address families such as PF_INET and PF_INET6. The IFA_TARGET_NETNSID property can be used to send a network namespace identifier as part of a request. If an IFA_TARGET_NETNSID property is identified, it will be used to retrieve the target network namespace in which the request is to be made. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Jiri Benc <jbenc@redhat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Christian Brauner	c383edc424	rtnetlink: add rtnl_get_net_ns_capable() get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to retrieve network namespaces based on network namespace identifiers. So remove the static declaration and export in the rtnetlink header. Also, rename it to rtnl_get_net_ns_capable() to make it obvious what this function is doing. Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as a module. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:27:11 -07:00
Alexei Starovoitov	a9c676bc8f	bpf/verifier: fix verifier instability Edward Cree says: In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access() has supplied a reg_type, the other members of the register state are set appropriately. Previously reg.range was set to 0, but as it is in a union with reg.map_ptr, which is larger, upper bytes of the latter were left in place. This then caused the memcmp() in regsafe() to fail, preventing some branches from being pruned (and occasionally causing the same program to take a varying number of processed insns on repeated verifier runs). Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known() Fixes: `f1174f77b5` ("bpf/verifier: rework value tracking") Debugged-by: Edward Cree <ecree@solarflare.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-09-05 22:21:00 -07:00
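The failure mode is easy to reproduce outside the verifier with a toy struct in which a 2-byte member shares a union with an 8-byte one. `struct toy_reg_state` and both marking functions are hypothetical simplifications, with a uint64_t standing in for the map pointer. The sketch only shows that clearing the small member leaves the larger member's upper bytes behind. A bytewise memcmp() then sees two logically identical states as different.

```c
#include <stdint.h>
#include <string.h>

/* Toy register state: `range` shares storage with the larger `map_ptr`,
 * loosely modelled on the union described in the commit message. */
struct toy_reg_state {
	int type;
	union {
		uint16_t range;   /* 2 bytes */
		uint64_t map_ptr; /* 8 bytes, stand-in for a pointer */
	};
};

/* Pre-fix behaviour: only the small member is cleared, so stale bytes of
 * map_ptr survive. */
static void mark_unknown_buggy(struct toy_reg_state *r)
{
	r->type = 1;
	r->range = 0;
}

/* Post-fix behaviour: wipe the whole state first, then set what matters. */
static void mark_unknown_fixed(struct toy_reg_state *r)
{
	memset(r, 0, sizeof(*r));
	r->type = 1;
}

/* State pruning compares states bytewise, as regsafe() does with memcmp(). */
static int states_equal(const struct toy_reg_state *a,
                        const struct toy_reg_state *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```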
David S. Miller	d4cc597623	Merge branch 'net-lan78xx-Minor-improvements' Stefan Wahren says: ==================== net: lan78xx: Minor improvements This patch series contains some minor improvements for the lan78xx driver. Changes in V2: - Keep Copyright comment as multi-line - Add Raghuram's Reviewed-by ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:20:45 -07:00
Stefan Wahren	51ceac9fb5	net: lan78xx: Make declaration style consistent This patch makes some declarations more consistent. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:20:45 -07:00
Stefan Wahren	6be665a56d	net: lan78xx: Switch to SPDX identifier Adopt the SPDX license identifier headers to ease license compliance management. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:20:45 -07:00
Stefan Wahren	7a6b022d79	net: lan78xx: Drop unnecessary strcpy in lan78xx_probe There is no need for this strcpy because alloc_etherdev() already does this job. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:20:45 -07:00
Stefan Wahren	fa8cd98c06	net: lan78xx: Bail out if lan78xx_get_endpoints fails We need to bail out if lan78xx_get_endpoints() fails, otherwise the result is overwritten. Fixes: `55d7de9de6` ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:20:45 -07:00
Davide Caratti	ee28bb56ac	net/sched: fix memory leak in act_tunnel_key_init() If users try to install act_tunnel_key 'set' rules with duplicate values of 'index', the tunnel metadata are allocated, but never released. Then, kmemleak complains as follows: # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111 # echo clear > /sys/kernel/debug/kmemleak # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111 Error: TC IDR already exists. We have an error talking to the kernel # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800574e6c80 (size 256): comm "tc", pid 5617, jiffies 4298118009 (age 57.990s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff ................ 81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00 .$.............. backtrace: [<00000000b7afbf4e>] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key] [<000000007d98fccd>] tcf_action_init_1+0x698/0xac0 [<0000000099b8f7cc>] tcf_action_init+0x15c/0x590 [<00000000dc60eebe>] tc_ctl_action+0x336/0x5c2 [<000000002f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0 [<000000000bfe7575>] netlink_rcv_skb+0x124/0x350 [<00000000edab656f>] netlink_unicast+0x40f/0x5d0 [<00000000b322cdcb>] netlink_sendmsg+0x6e8/0xba0 [<0000000063d9d490>] sock_sendmsg+0xb3/0xf0 [<00000000f0d3315a>] ___sys_sendmsg+0x654/0x960 [<00000000c06cbd42>] __sys_sendmsg+0xd3/0x170 [<00000000ce72e4b0>] do_syscall_64+0xa5/0x470 [<000000005caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000fac1b476>] 0xffffffffffffffff This problem theoretically happens also in case users attempt to setup a geneve rule having wrong configuration data, or when the kernel fails to allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel metadata also in the above conditions. 
Addresses-Coverity-ID: 1373974 ("Resource leak") Fixes: `d0f6dd8a91` ("net/sched: Introduce act_tunnel_key") Fixes: `0ed5269f9e` ("net/sched: add tunnel option support to act_tunnel_key") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:18:54 -07:00
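The fix follows the usual kernel pattern of releasing everything allocated earlier on every error path. Below is a userspace sketch in which a live-allocation counter plays the role of kmemleak. `toy_tunnel_key_init()`, `used_index[]` (standing in for the action IDR) and the metadata helpers are all hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Counts live allocations so the sketch can show the leak being plugged. */
static int live_allocs;

struct toy_metadata { int id; };

static struct toy_metadata *metadata_alloc(int id)
{
	struct toy_metadata *md = malloc(sizeof(*md));
	if (md) {
		md->id = id;
		live_allocs++;
	}
	return md;
}

static void metadata_release(struct toy_metadata *md)
{
	if (md) {
		free(md);
		live_allocs--;
	}
}

/* Hypothetical index table standing in for the TC action IDR. */
static int used_index[256];

/* Metadata is allocated up front, so every later failure must release it. */
static int toy_tunnel_key_init(int index, int id, struct toy_metadata **out)
{
	int ret;
	struct toy_metadata *md = metadata_alloc(id);

	if (!md)
		return -ENOMEM;
	if (index < 0 || index >= 256) {
		ret = -EINVAL;
		goto release;
	}
	if (used_index[index]) {
		ret = -EEXIST; /* "TC IDR already exists": used to leak md */
		goto release;
	}
	used_index[index] = 1;
	*out = md;
	return 0;

release:
	metadata_release(md);
	return ret;
}
```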
Jakub Kicinski	7848418e28	nfp: separate VXLAN and GRE feature handling Currently, the VXLAN and GRE FW features both have to be advertised for the driver to enable them. Separate the handling. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:18:11 -07:00
David S. Miller	eebd3faa4f	Merge branch 'nfp-improve-the-new-rtsym-helpers' Jakub Kicinski says: ==================== nfp: improve the new rtsym helpers This set fixes a bug in ABS rtsym handling I added in net-next and expands the error checking and reporting on the rtsym accesses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:17:07 -07:00
Jakub Kicinski	e84b2f2db2	nfp: validate rtsym accesses fall within the symbol With the accesses to rtsyms now all going via special helpers we can easily make sure the driver is not reading past the end of the symbol. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:17:07 -07:00
Jakub Kicinski	31e380f38f	nfp: prefix rtsym error messages with symbol name For ease of debug preface all error messages with the name of the symbol which caused them. Use the same message format for existing messages while at it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:17:07 -07:00
Jakub Kicinski	3c576de30b	nfp: fix readq on absolute RTsyms Return the error and report value through the output param. Fixes: `640917dd81` ("nfp: support access to absolute RTsyms") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:17:07 -07:00
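The two nfp changes, bounds-checking every rtsym access and returning errors separately from data, combine naturally. In this sketch, `struct toy_rtsym`, the helper names and the choice of -ENXIO are assumptions rather than the nfp driver's definitions. The bounds check is written so that `off + len` is never computed and therefore cannot overflow.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical runtime symbol: a named window of `size` bytes. */
struct toy_rtsym {
	const char *name;
	uint64_t size;
	const uint8_t *mem;
};

/* Overflow-safe check that [off, off + len) lies inside the symbol.
 * The error code is an assumption for this sketch. */
static int rtsym_check(const struct toy_rtsym *sym, uint64_t off, uint64_t len)
{
	if (off > sym->size || len > sym->size - off)
		return -ENXIO;
	return 0;
}

/* The error goes out through the return value and the data through
 * *value, so a legitimate 64-bit value is never mistaken for an error. */
static int rtsym_readq(const struct toy_rtsym *sym, uint64_t off, uint64_t *value)
{
	int err = rtsym_check(sym, off, sizeof(*value));

	if (err)
		return err;
	memcpy(value, sym->mem + off, sizeof(*value));
	return 0;
}
```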
Taeung Song	69495d2a52	libbpf: Remove the duplicate checking of function storage After the commit `eac7d84519` ("tools: libbpf: don't return '.text' as a program for multi-function programs"), bpf_program__next() in bpf_object__for_each_program skips the function storage such as .text, so eliminate the duplicate checking. Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-09-05 22:16:00 -07:00
YueHaibing	9e7e6cabf3	failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/net_failover.c: In function 'net_failover_slave_unregister': drivers/net/net_failover.c:598:35: warning: variable 'primary_dev' set but not used [-Wunused-but-set-variable] We should check the validity of 'slave_dev'. Fixes: `cfc80d9a11` ("net: Introduce net_failover driver") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:14:47 -07:00
Cong Wang	0a3b8b2b21	tipc: orphan sock in tipc_release() Before we unlock the sock in tipc_release(), we have to detach sk->sk_socket from sk, otherwise a parallel tipc_sk_fill_sock_diag() could still read it after we free this socket. Fixes: `c30b70deb5` ("tipc: implement socket diagnostics for AF_TIPC") Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:14:00 -07:00
Dmitry Safonov	428f944bd5	netlink: Make groups check less stupid in netlink_bind() As Linus noted, the test for 0 is needless, groups type can follow the usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG: > The code [..] isn't technically incorrect... > But it is stupid. > Why stupid? Because the test for 0 is pointless. > > Just doing > if (nlk->ngroups < 8*sizeof(groups)) > groups &= (1UL << nlk->ngroups) - 1; > > would have been fine and more understandable, since the "mask by shift > count" already does the right thing for a ngroups value of 0. Now that > test for zero makes me go "what's special about zero?". It turns out > that the answer to that is "nothing". [..] > The type of "groups" is kind of silly too. > > Yeah, "long unsigned int" isn't _technically_ wrong. But we normally > call that type "unsigned long". Cleanup my piece of pointlessness. Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:11:33 -07:00
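Linus's point is that the shift-based mask already handles ngroups == 0. Only the full-width shift, which is undefined behaviour in C, needs guarding. Here is a standalone version of the simplified check. `BITS_PER_LONG` is defined locally because it is a kernel-internal macro, and `mask_groups()` is a hypothetical wrapper.

```c
#include <limits.h>

/* Local stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Keep only the multicast group bits the family actually has. For
 * ngroups == 0 the mask (1UL << 0) - 1 is already 0, so no special case
 * is needed; the range check only avoids an undefined full-width shift. */
static unsigned long mask_groups(unsigned long groups, unsigned int ngroups)
{
	if (ngroups < BITS_PER_LONG)
		groups &= (1UL << ngroups) - 1;
	return groups;
}
```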
Vincent Whitchurch	fa788d986a	packet: add sockopt to ignore outgoing packets Currently, the only way to ignore outgoing packets on a packet socket is via the BPF filter. With MSG_ZEROCOPY, packets that are looped into AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even if the filter run from packet_rcv() would reject them. So the presence of a packet socket on the interface takes away the benefits of MSG_ZEROCOPY, even if the packet socket is not interested in outgoing packets. (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily cloned, but the cost for that is much lower.) Add a socket option to allow AF_PACKET sockets to ignore outgoing packets to solve this. Note that the *BSDs already have something similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT. The first intended user is lldpd. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-05 22:09:37 -07:00
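From userspace, the new option would be enabled with a plain setsockopt() at the SOL_PACKET level. This is a minimal sketch. The fallback value 23 for PACKET_IGNORE_OUTGOING is taken from <linux/if_packet.h> on kernels that include this change. `packet_ignore_outgoing()` is our own helper name.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef PACKET_IGNORE_OUTGOING
#define PACKET_IGNORE_OUTGOING 23 /* value in <linux/if_packet.h> with this change */
#endif

/* Ask an AF_PACKET socket to skip packets the host itself transmits.
 * Returns 0 on success, -1 with errno set otherwise. */
static int packet_ignore_outgoing(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING, &one, sizeof(one));
}
```

A real caller would first open the socket with `socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))`, which needs CAP_NET_RAW. Kernels without the option should make the call fail, so the return value has to be checked.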
Alaa Hleihel	fe1dc06999	net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets CHECKSUM_COMPLETE is not applicable to SCTP protocol. Setting it for SCTP packets leads to CRC32c validation failure. Fixes: `bbceefce9a` ("net/mlx5e: Support RX CHECKSUM_COMPLETE") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Natali Shechtman	f007c13d4a	net/mlx5e: Set ECN for received packets using CQE indication In a multi-host (MH) NIC scheme, a single HW port serves multiple hosts or sockets on the same host. The HW uses a mechanism in the PCIe buffer which monitors the amount of consumed PCIe buffers per host. In certain configurations, under congestion, the HW emulates a switch doing ECN marking on packets using an ECN indication on the completion descriptor (CQE). The driver needs to set the ECN bits on the packet SKB so that the network stack can react to them; this commit does that. Signed-off-by: Natali Shechtman <natali@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Shay Agroskin	64109f1dc4	net/mlx5e: Replace PTP clock lock from RW lock to seq lock Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock' in order to improve packet rate performance. Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. Sent 64b packets between two peers connected by ConnectX-5, and measured packet rate for the receiver in three modes: no time-stamping (base rate); time-stamping using rw_lock (old lock) for the critical region; time-stamping using seq_lock (new lock) for the critical region. Only the receiver time stamped its packets. The measured packet rate improvements are: Single flow (multiple TX rings to single RX ring): without timestamping: 4.26 (M packets)/sec; with rw-lock (old lock): 4.1 (M packets)/sec; with seq-lock (new lock): 4.16 (M packets)/sec; 1.46% improvement. Multiple flows (multiple TX rings to six RX rings): without timestamping: 22 (M packets)/sec; with rw-lock (old lock): 11.7 (M packets)/sec; with seq-lock (new lock): 21.3 (M packets)/sec; 82.05% improvement. The packet rate improvement is due to the seq-lock requiring no atomic operations from 'readers'. Since there are many more 'readers' than 'writers' contending on this lock, almost all atomic operations are saved. This results in a dramatic decrease in overall cache misses. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
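For an IPv4 packet, "setting the ECN bits" can be sketched as follows: if the packet is ECN-capable (ECT(0) or ECT(1)), set the codepoint to CE and fix up the header checksum. This is a userspace illustration with hypothetical names (`toy_set_ce`, `ipv4_hdr_csum`), under these assumptions:

- The header is 20 bytes with no options.
- The full RFC 1071 checksum is recomputed, whereas the kernel's IP_ECN_set_ce() patches it incrementally.
- The decision to mark at all comes from the CQE indication, which is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

#define INET_ECN_MASK    0x3
#define INET_ECN_NOT_ECT 0x0
#define INET_ECN_CE      0x3

/* RFC 1071 ones'-complement checksum over a 20-byte IPv4 header. */
static uint16_t ipv4_hdr_csum(const uint8_t h[20])
{
	uint32_t sum = 0;

	for (size_t i = 0; i < 20; i += 2)
		sum += (uint32_t)(h[i] << 8 | h[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Mark Congestion Experienced on an ECN-capable packet. Returns 1 if the
 * header was changed, 0 for Not-ECT or already-CE packets. */
static int toy_set_ce(uint8_t h[20])
{
	uint8_t ecn = h[1] & INET_ECN_MASK; /* low two bits of the TOS byte */

	if (ecn == INET_ECN_NOT_ECT || ecn == INET_ECN_CE)
		return 0;
	h[1] |= INET_ECN_CE;
	h[10] = h[11] = 0; /* zero the checksum field before recomputing */
	uint16_t c = ipv4_hdr_csum(h);
	h[10] = (uint8_t)(c >> 8);
	h[11] = (uint8_t)(c & 0xFF);
	return 1;
}
```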
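A minimal seqlock shows why readers get cheaper. The reader only loads a sequence counter before and after copying the data, and retries if a write intervened. There is no atomic read-modify-write that would bounce the lock's cache line between cores. This is a C11 sketch under stated assumptions:

- Writers are already serialized by some other lock.
- The protected fields are modelled as relaxed atomics to avoid formal data races.
- `struct clock_seq` and its two fields are a hypothetical stand-in for the mlx5 clock state.

```c
#include <stdatomic.h>
#include <stdint.h>

struct clock_seq {
	atomic_uint seq;         /* even: stable, odd: write in progress */
	_Atomic uint64_t cycles; /* protected data */
	_Atomic uint64_t nsec;
};

/* Writer side; assumes writers are serialized elsewhere. */
static void clock_write(struct clock_seq *c, uint64_t cycles, uint64_t nsec)
{
	unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->cycles, cycles, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side: plain loads only, retried if a write overlapped. */
static void clock_read(struct clock_seq *c, uint64_t *cycles, uint64_t *nsec)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
		*cycles = atomic_load_explicit(&c->cycles, memory_order_relaxed);
		*nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```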
Roi Dayan	1462e48db0	net/mlx5e: Move Q counters allocation and drop RQ to init_rx Not all profiles query the HW Q counters in the update_stats() callback. HW Q counters are limited per device, and in the case of representors all their Q counters are allocated on the parent PF device. Avoid redundant allocation of HW Q counters by moving the allocation to the init_rx profile callback. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Kamal Heib	d24082050f	net/mlx5e: Move mlx5e_priv_flags into en_ethtool.c Move the definition of mlx5e_priv_flags into en_ethtool.c because it's only used there. Fixes: `4e59e28881` ("net/mlx5e: Introduce net device priv flags infrastructure") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Vlad Buslov	12d6066c3b	net/mlx5: Add flow counters idr The previous patch in the series changed the flow counter storage structure from rb_tree to linked list in order to improve flow counter traversal performance. The drawback of such a solution is that flow counter lookup by id becomes linear in complexity. Store pointers to flow counters in an idr in order to improve lookup performance to logarithmic again. An idr is a non-intrusive data structure and doesn't require extending the flow counter struct with new elements. This means that the idr can be used for lookup, while the linked list from the previous patch is used for traversal, and struct mlx5_fc size is <= 2 cache lines. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Vlad Buslov	9aff93d7d0	net/mlx5: Store flow counters in a list In order to improve performance of the flow counter stats query loop that traverses all configured flow counters, replace the rb_tree with a doubly linked list. This change improves performance of traversing flow counters by removing the tree traversal (profiling data showed that the call to rb_next was the top CPU consumer). However, lookup of a flow counter in the list becomes linear instead of logarithmic. This problem is fixed by the next patch in the series, which adds an idr for fast lookup. An idr is used because it is not an intrusive data structure and doesn't require adding any new members to struct mlx5_fc, which allows its control data part to stay <= 1 cache line in size. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Vlad Buslov	6e5e228391	net/mlx5: Add new list to store deleted flow counters In order to prevent the flow counters stats work function from traversing the whole flow counters tree while searching for deleted flow counters, a new list to store deleted flow counters is added to struct mlx5_fc_stats. A lockless NULL-terminated singly linked list is used for the following reasons: - This use case only needs to add a single element to the list and remove/iterate the whole list. A lockless list doesn't require any additional synchronization for these operations. - The first cache line of the flow counter data structure only has space to store a single additional pointer, which precludes use of a doubly linked list. Remove the flow counter 'deleted' flag, which is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:57 -07:00
Vlad Buslov	83033688b7	net/mlx5: Change flow counters addlist type to single linked list In order to prevent flow counters stats work function from traversing whole flow counters tree while searching for deleted flow counters, new list to store deleted flow counters will be added to struct mlx5_fc_stats. However, the flow counter structure itself has no space left to store any more data in first cache line. To free space that is needed to store additional list node, convert current addlist double linked list (two pointers per node) to atomic single linked list (one pointer per node). Lockless NULL-terminated single linked list data type doesn't require any additional external synchronization for operations used by flow counters module (add single new element, remove all elements from list and traverse them). Remove addlist_lock that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 21:14:56 -07:00
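The operations these commits rely on are adding one node, detaching the whole list, and then walking it privately. A lock-free singly linked list provides exactly these with one pointer per node. The following C11 sketch illustrates the technique behind the kernel's <linux/llist.h>; it is not that header's code, and the names `ll_add` and `ll_del_all` are our own.

```c
#include <stdatomic.h>
#include <stddef.h>

struct llnode { struct llnode *next; };
struct llhead { _Atomic(struct llnode *) first; };

/* Push one node with a compare-and-swap loop; safe with concurrent pushers. */
static void ll_add(struct llnode *n, struct llhead *h)
{
	struct llnode *first = atomic_load_explicit(&h->first, memory_order_relaxed);

	do {
		n->next = first; /* `first` is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &first, n,
	                                                memory_order_release,
	                                                memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange. */
static struct llnode *ll_del_all(struct llhead *h)
{
	return atomic_exchange_explicit(&h->first, NULL, memory_order_acquire);
}
```

Because ll_del_all() swaps out the head in one atomic exchange, the consumer owns the detached chain exclusively. It needs no further synchronization to traverse it, and entries come back newest first.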
Weinan Li	792fab2c0d	drm/i915/gvt: Fix the incorrect length of child_device_config issue GVT-g emulates the opregion for the guest with BDB version '186', for which the child_device_config length should be '33'. v2: split into 2 patches: 1st for the issue fix, 2nd for code clean up. (Zhenyu) v3: add fixes tag. (Zhenyu) Fixes: `4023f301d2` ("drm/i915/gvt: opregion virtualization for win") CC: Xiaolin Zhang <xiaolin.zhang@intel.com> Reviewed-by: Xiaolin Zhang <xiaolin.zhang@intel.com> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>	2018-09-06 11:17:38 +08:00
Roi Dayan	ad9421e36a	net/mlx5: Fix possible deadlock from lockdep when adding fte to fg This is a false positive report due to incorrect nested lock annotations as we lock multiple fgs with the same subclass. Instead of locking all fgs only lock the one being used as was done before. Fixes: `bd71b08ec2` ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:34 -07:00
Saeed Mahameed	fc433829f9	net/mlx5e: Ethtool steering, fix udp source port value Copy and paste bug was introduced in the offending patch. We need to write udp source port value into the headers value and not headers criteria "mask". Fixes: `142644f8a1` ("net/mlx5e: Ethtool steering flow parsing refactoring") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Huy Nguyen	47bc94b822	net/mlx5: Check for error in mlx5_attach_interface Currently, mlx5_attach_interface does not check for errors after calling intf->attach or intf->add. When either call fails, the client is not initialized and will cause issues such as a kernel panic on an invalid address in the teardown path (mlx5_detach_interface). Fixes: `737a234bb6` ("net/mlx5: Introduce attach/detach to interface API") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Daniel Jurgens	df7ddb2396	net/mlx5: Consider PCI domain in search for next dev The PCI BDF is not unique. PCI domain must also be considered when searching for the next physical device during lag setup. Example below: mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0001:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0001:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
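The collision can be shown with a lookup key built from the addresses in the log above. The key layout (domain in the upper 16 bits, then bus and slot) is an assumption meant to resemble mlx5's key generation. `struct toy_pci_addr` and both helpers are hypothetical. `TOY_PCI_SLOT()` mirrors the kernel's PCI_SLOT() macro.

```c
#include <stdint.h>

/* Hypothetical PCI address: domain:bus:slot.function, as in 0001:01:00.0 */
struct toy_pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devfn; /* slot << 3 | function, as in the kernel's devfn */
};

#define TOY_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Key without the domain: bus and slot only, so different domains collide. */
static uint32_t pci_id_no_domain(const struct toy_pci_addr *a)
{
	return (uint32_t)a->bus << 8 | TOY_PCI_SLOT(a->devfn);
}

/* Key including the domain, so 0000:01:00.x and 0001:01:00.x stay distinct.
 * Both functions of one slot still share a key, which is what lag pairing
 * needs. */
static uint32_t pci_id(const struct toy_pci_addr *a)
{
	return (uint32_t)a->domain << 16 | (uint32_t)a->bus << 8 |
	       TOY_PCI_SLOT(a->devfn);
}
```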
Roi Dayan	071304772f	net/mlx5: Fix not releasing read lock when adding flow rules If building match list fg fails and we never jumped to search_again_locked label then the function returned without unlocking the read lock. Fixes: `bd71b08ec2` ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Raed Salem	c88a026e01	net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables The memory allocated for the slow path table flow group input structure was not freed upon successful return; fix that. Fixes: `1967ce6ea5` ("net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Tariq Toukan	a090362210	net/mlx5: Use u16 for Work Queue buffer strides offset The minimal stride size is 16. Hence, the number of strides in a fragment (of PAGE_SIZE) is <= PAGE_SIZE / 16 <= 4K, and u16 is sufficient to represent it. Fixes: `d7037ad73d` ("net/mlx5: Fix QP fragmented buffer allocation") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Tariq Toukan	8d71e81850	net/mlx5: Use u16 for Work Queue buffer fragment size The minimal stride size is 16. Hence, the number of strides in a fragment (of PAGE_SIZE) is <= PAGE_SIZE / 16 <= 4K, and u16 is sufficient to represent it. Fixes: `388ca8be00` ("IB/mlx5: Implement fragmented completion queue (CQ)") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Jack Morgenstein	5df816e7f4	net/mlx5: Fix debugfs cleanup in the device init/remove flow When initializing the device (procedure init_one), the driver calls mlx5_pci_init to perform PCI initialization. As part of this initialization, mlx5_pci_init creates a debugfs directory. If this creation fails, init_one aborts, returning failure to the caller (the probe method). The main cause of such a failure is that the debugfs directory already exists. This can happen if, the last time mlx5_pci_close was called, debugfs_remove (silently) failed because the debugfs directory was not empty. Prevent such a debugfs_remove failure by calling debugfs_remove_recursive instead in procedure mlx5_pci_close. Fixes: `59211bd3b6` ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Jack Morgenstein	76d5581c87	net/mlx5: Fix use-after-free in self-healing flow When the mlx5 health mechanism detects a problem while the driver is in the middle of init_one or remove_one, the driver needs to prevent the health mechanism from scheduling future work. If future work is scheduled, a use-after-free occurs: the system WQ tries to run the work item (which has already been freed) at the scheduled future time. Prevent this by disabling work item scheduling in the health mechanism while the driver is in the middle of init_one() or remove_one(). Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-09-05 17:08:33 -07:00
Parav Pandit	08e74be103	RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one() If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap. Fixes: `7d96c9b176` ("IB/uverbs: Have the core code create the uverbs_root_spec") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-09-05 16:15:52 -06:00
Mikulas Patocka	8b2ded1c94	block: don't warn when doing fsync on read-only devices It is possible to call fsync on a read-only handle (for example, fsck.ext2 does so when performing a read-only check), and this call results in a kernel warning. The patch `b089cfd95d` ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and does not (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: `721c7fc701` ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-05 16:14:36 -06:00
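
The last entry turns on the difference between an operation code and its modifier flags. The C sketch below illustrates that bug class under stated assumptions: the names op_of, is_flush, is_flush_buggy, is_flush_fixed, OP_WRITE, FLAG_PREFLUSH and FLAG_FSYNC, and the bit layout, are hypothetical stand-ins for the roles of bio_op(), op_is_flush() and the kernel's request flags, not the kernel's definitions. It assumes an fsync-driven flush is expressed as a write operation carrying a preflush flag, which is why a read-only check on write operations can trip on it. Masking off the flags before testing them makes the flush check always fail; testing the unmasked word works. The actual patch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: the low OP_BITS bits hold the
 * operation code and the higher bits hold modifier flags. */
#define OP_BITS       8u
#define OP_MASK       ((1u << OP_BITS) - 1u)
#define OP_WRITE      1u           /* hypothetical "write" operation code */
#define FLAG_PREFLUSH (1u << 18)   /* hypothetical "flush cache first" flag */
#define FLAG_FSYNC    (1u << 19)   /* hypothetical "data integrity" flag */

/* Keep only the operation code, dropping every flag (the role bio_op() plays). */
static inline uint32_t op_of(uint32_t opf)
{
	return opf & OP_MASK;
}

/* Report whether any flush-related flag bit is set (the role op_is_flush() plays). */
static inline bool is_flush(uint32_t opf)
{
	return (opf & (FLAG_FSYNC | FLAG_PREFLUSH)) != 0;
}

/* Buggy pattern: the flags are stripped before being tested, so this is always false. */
static inline bool is_flush_buggy(uint32_t opf)
{
	return is_flush(op_of(opf));
}

/* Fixed pattern: test the flags on the full operation-plus-flags word. */
static inline bool is_flush_fixed(uint32_t opf)
{
	return is_flush(opf);
}
```

For example, with `OP_WRITE | FLAG_PREFLUSH`, `is_flush_buggy()` returns false while `is_flush_fixed()` returns true, so only the fixed check can exempt the flush from a read-only warning.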

783647 Commits