linux

mirror of https://github.com/hardkernel/linux.git synced 2026-05-28 23:09:53 +09:00

Author	SHA1	Message	Date
Colin Ian King	6f43e52528	nexthop: remove redundant assignment to err The variable err is initialized with a value that is never read and err is reassigned a few statements later. This initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 14:33:52 -07:00
Jens Axboe	9d93a3f5a0	io_uring: punt short reads to async context We can encounter a short read when we're doing buffered reads and the data is partially cached. Right now we just return the short read, but that forces the application to read that CQE, then issue another SQE to finish the read. That read will not be cached, and hence will result in an async punt. It's more efficient to do that async punt from within the kernel, as that will then not need two more round trips to the kernel. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:30:03 -06:00
Jens Axboe	87e5e6dab6	uio: make import_iovec()/compat_import_iovec() return bytes on success Currently these functions return < 0 on error, and 0 for success. Change that so that we return < 0 on error, but number of bytes for success. Some callers already treat the return value that way, others need a slight tweak. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:30:03 -06:00
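The return convention described in the uio commit above can be sketched in a few lines. This is a Python model, not the kernel's C code: `import_iovec_model` and `caller` are hypothetical names, and the real functions also copy and validate user memory, which is left out here.

```python
import errno

UIO_MAXIOV = 1024  # the kernel's cap on iovec segments per call


def import_iovec_model(segment_lengths):
    """Hypothetical model: negative errno on error, total bytes on success."""
    if len(segment_lengths) > UIO_MAXIOV:
        return -errno.EINVAL
    if any(length < 0 for length in segment_lengths):
        return -errno.EINVAL
    return sum(segment_lengths)


def caller(segment_lengths):
    """A caller tests only for < 0; any other value is a byte count."""
    ret = import_iovec_model(segment_lengths)
    if ret < 0:
        return ("error", -ret)
    return ("ok", ret)
```

Callers that previously compared the result against 0 for success are the ones that would need the "slight tweak" the commit mentions, since a successful call can now return a positive value.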
Vladimir Oltean	e8d67fa569	net: dsa: sja1105: Don't store frame type in skb->cb Due to a confusion I thought that eth_type_trans() was called by the network stack whereas it can actually be called by network drivers to figure out the skb protocol and next packet_type handlers. In light of the above, it is not safe to store the frame type from the DSA tagger's .filter callback (first entry point on RX path), since GRO is yet to be invoked on the received traffic. Hence it is very likely that the skb->cb will actually get overwritten between eth_type_trans() and the actual DSA packet_type handler. Of course, what this patch fixes is the actual overwriting of the SJA1105_SKB_CB(skb)->type field from the GRO layer, which made all frames be seen as SJA1105_FRAME_TYPE_NORMAL (0). Fixes: `227d07a07e` ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 14:27:27 -07:00
John Pittman	61939b12dc	block: print offending values when cloned rq limits are exceeded While troubleshooting issues where cloned request limits have been exceeded, it is often beneficial to know the actual values that have been breached. Print these values, assisting in ease of identification of root cause of the breach. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	cd669f88b1	blk-mq: Document the blk_mq_hw_queue_to_node() arguments Document the meaning of the blk_mq_hw_queue_to_node() arguments. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	ef025d7ec2	blk-mq: Fix spelling in a source code comment Change one occurrence of 'performace' into 'performance'. Cc: Max Gurtovoy <maxg@mellanox.com> Fixes: `fe631457ff` ("blk-mq: map all HWQ also in hyperthreaded system") # v4.13. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	a0b77e36e1	block: Fix bsg_setup_queue() kernel-doc header Document all bsg_setup_queue() arguments as required. Fixes: `aae3b069d5` ("bsg: pass in desired timeout handler") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	83826a5066	block: Fix rq_qos_wait() kernel-doc header Add documentation for the @rqw argument and change " - " into ": ". Fixes: `84f603246d` ("block: add rq_qos_wait to rq_qos") # v5.0-rc1~52^2~140. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	0542cd57d2	block: Fix blk_mq_*_map_queues() kernel-doc headers This patch avoids that the kernel-doc script complains about these function headers when building with W=1. Cc: Hannes Reinecke <hare@suse.com> Cc: Keith Busch <keith.busch@intel.com> Fixes: `ed76e329d7` ("blk-mq: abstract out queue map") # v5.0. Fixes: `e42b3867de` ("blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	216382dccb	block: Fix throtl_pending_timer_fn() kernel-doc header Commit `e99e88a9d2` renamed a function argument without updating the corresponding kernel-doc header. Update the kernel-doc header. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Reviewed-by: Kees Cook <keescook@chromium.org> Fixes: `e99e88a9d2` ("treewide: setup_timer() -> timer_setup()") # v4.15. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	33c826ef19	block: Convert blk_invalidate_devt() header into a non-kernel-doc header This patch avoids that the kernel-doc tool warns about this function header when building with W=1. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Bart Van Assche	210eaaaea8	block/partitions/ldm: Convert a kernel-doc header into a non-kernel-doc header This patch avoids that the kernel-doc tool warns about this function header when building with W=1. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-31 15:12:34 -06:00
Ke Wu	0ff9848067	security/loadpin: Allow to exclude specific file types The Linux kernel already provides MODULE_SIG and KEXEC_VERIFY_SIG to make sure that loaded kernel modules and kernel images are trusted. This patch adds a kernel command line option "loadpin.exclude" which allows excluding specific file types from LoadPin. This is useful when people want to use different mechanisms to verify modules and kernel images while still using LoadPin to protect the integrity of the other files the kernel loads. Signed-off-by: Ke Wu <mikewu@google.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> [kees: fix array size issue reported by Coverity via Colin Ian King] Signed-off-by: Kees Cook <keescook@chromium.org>	2019-05-31 13:57:40 -07:00
Linus Torvalds	3ab4436f68	Merge tag 'nfsd-5.2-1' of git://linux-nfs.org/~bfields/linux Pull nfsd fix from Bruce Fields: "This reverts a minor fix which could cause us to treat conflicting NLM locks as nonconflicting. We have proper fix queued up for 5.3. In the meantime, a quick revert seems best for 5.2 and stable" * tag 'nfsd-5.2-1' of git://linux-nfs.org/~bfields/linux: Revert "lockd: Show pid of lockd for remote locks"	2019-05-31 13:51:16 -07:00
Linus Torvalds	41e7231fab	Merge tag 'v5.2-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Four small smb3 fixes, one for stable" * tag 'v5.2-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM dfs_cache: fix a wrong use of kfree in flush_cache_ent() fs/cifs/smb2pdu.c: fix buffer free in SMB2_ioctl_free cifs: fix memory leak of pneg_inbuf on -EOPNOTSUPP ioctl case	2019-05-31 13:49:50 -07:00
Pavel Machek	8c0f693c6e	leds: avoid flush_work in atomic context It turns out that various triggers use led_blink_setup() from atomic context, so we can't do a flush_work there. Flush is still needed for slow LEDs, but we can move it to sysfs code where it is safe. WARNING: inconsistent lock state 5.2.0-rc1 #1 Tainted: G W -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: 000000006e30541b ((work_completion)(&led_cdev->set_brightness_work)){+.?.}, at: +__flush_work+0x3b/0x38a {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x146/0x1a1 __flush_work+0x5b/0x38a flush_work+0xb/0xd led_blink_setup+0x1e/0xd3 led_blink_set+0x3f/0x44 tpt_trig_timer+0xdb/0x106 ieee80211_mod_tpt_led_trig+0xed/0x112 Fixes: `0db37915d9` ("leds: avoid races with workqueue") Signed-off-by: Pavel Machek <pavel@ucw.cz> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>	2019-05-31 22:29:14 +02:00
Chris Wilson	d82b4b2621	drm/i915: Report all objects with allocated pages to the shrinker Currently, we try to report to the shrinker the precise number of objects (pages) that are available to be reaped at this moment. This requires searching all objects with allocated pages to see if they fulfill the search criteria, and this count is performed quite frequently. (The shrinker tries to free ~128 pages on each invocation, before which we count all the objects; counting takes longer than unbinding the objects!) If we take the pragmatic view that with sufficient desire, all objects are eventually reapable (they become inactive, or no longer used as framebuffer etc), we can simply return the count of pinned pages maintained during get_pages/put_pages rather than walk the lists every time. The downside is that we may (slightly) over-report the number of objects/pages we could shrink and so penalize ourselves by shrinking more than required. This is mitigated by keeping the order in which we shrink objects such that we avoid penalizing active and frequently used objects, and if memory is so tight that we need to free them we would need to anyway. v2: Only expose shrinkable objects to the shrinker; a small reduction in not considering stolen and foreign objects. v3: Restore the tracking from a "backup" copy from before the gem/ split Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-2-chris@chris-wilson.co.uk	2019-05-31 21:23:51 +01:00
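The trade-off in the i915 shrinker commit above, a precise walk versus a running counter, can be illustrated with a small model. This is a sketch in Python with hypothetical names (`ShrinkableTracker`, `shrink_memory`, `count_by_walking`); it does not reproduce the driver's data structures.

```python
class ShrinkableTracker:
    """Hypothetical model of counting pages available to a shrinker."""

    def __init__(self):
        self.objects = {}       # object name -> number of allocated pages
        self.shrink_memory = 0  # running total, updated on get/put

    def get_pages(self, name, pages):
        self.objects[name] = pages
        self.shrink_memory += pages

    def put_pages(self, name):
        self.shrink_memory -= self.objects.pop(name)

    def count_by_walking(self, reapable):
        # Precise approach: visit every object and test it each time.
        return sum(p for n, p in self.objects.items() if reapable(n))

    def count_fast(self):
        # Counter approach: O(1), but may over-report objects that
        # are not reapable at this moment.
        return self.shrink_memory
```

The counter version answers in constant time at the cost of the slight over-reporting that the commit message accepts as a pragmatic compromise.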
Chris Wilson	3b4fa9640c	drm/i915: Track the purgeable objects on a separate eviction list Currently the purgeable objects, I915_MADV_DONTNEED, are mixed in the normal bound/unbound lists. Every shrinker pass starts with an attempt to purge from this set of unneeded objects, which entails us doing a walk over both lists looking for any candidates. If there are none, and since we are shrinking we can reasonably assume that the lists are full!, this becomes a very slow futile walk. If we separate out the purgeable objects into own list, this search then becomes its own phase that is preferentially handled during shrinking. Instead the cost becomes that we then need to filter the purgeable list if we want to distinguish between bound and unbound objects. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-1-chris@chris-wilson.co.uk	2019-05-31 21:23:51 +01:00
Erez Alfasi	ca6c7df00a	net/mlx5e: TX, Improve performance under GSO workload __netdev_tx_sent_queue() was introduced by: commit `3e59020abf` ("net: bql: add __netdev_tx_sent_queue()") BQL counters should be updated without flipping/caring about BQL status, if the current skb has xmit_more set. Using __netdev_tx_sent_queue() avoids messing with BQL stop flag, increases performance on GSO workload by keeping doorbells to the minimum required and also sparing atomic operations. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:26 -07:00
Oz Shlomo	10caabdaad	net/mlx5e: Use termination table for VLAN push actions HW does not support push VLAN action in the RX direction (packets arriving from the wire). The FW works around this limitation by hairpinning the packet. The hairpin workaround applies only when the push VLAN action is specified in a termination table, assuring that there are no actions following the hairpin. Instantiate termination table for push VLAN actions. Re-use identical terminating tables for increased HW cache efficiency. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:26 -07:00
Yevgeny Kliteynik	9272e3df30	net/mlx5e: Geneve, Add support for encap/decap flows offload Add HW offloading support for flows with Geneve encap/decap. Notes about decap flows with Geneve TLV Options: - Support offloading of 32-bit options data only - At any given time, only one combination of class/type parameters can be offloaded, but the same class/type combination can have many different flows offloaded with different 32-bit option data - Options with value of 0 can't be offloaded Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:26 -07:00
Yevgeny Kliteynik	d386939a32	net/mlx5e: Rearrange tc tunnel code in a modular way Rearrange tc tunnel code so that it would be easy to add future tunnels: - Define tc tunnel object with the fields and callbacks that any tunnel must implement. - Define tc UDP tunnel object for UDP tunnels, such as VXLAN - Move each tunnel code (GRE, VXLAN) to its own separate file - Rewrite tc tunnel implementation in a general way - using only the objects and their callbacks. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:26 -07:00
Yevgeny Kliteynik	1f6da30697	net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct In mlx5e encap entry structure, IP tunnel info data structure is copied by value. This approach worked till now, but it breaks when there are encapsulation options, such as in case of Geneve. These options are stored in the structure that is allocated adjacent to the IP tunnel info struct, and not pointed at by any field in that struct. Therefore, when copying the struct by value, we lose the address of the original struct and can't get to the encapsulation options. Fix the problem by storing the pointer to the tunnel info data instead. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:26 -07:00
Yevgeny Kliteynik	0ccc171ea6	net/mlx5: Geneve, Manage Geneve TLV options Use Geneve TLV Options object to manage the flex parser matching on the 32-bit options data. When the first flow with a certain class/type values is requested to be offloaded, create a FW object with FW command (Geneve TLV Options general object) and start counting the number of flows using this object. During this time, any request with a different class/type values will fail to be offloaded. Once the refcount reaches 0, destroy the TLV options general object, and can now offload a flow with any class/type parameters. Geneve TLV Options object is added to core device. It is currently used to manage Geneve TLV options general object allocation in FW and its reference counting only. In the future it will also be used for managing geneve ports by registering callbacks for ndo_udp_tunnel_add/del. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:25 -07:00
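The reference-counting rule described in the Geneve TLV options commit above can be modelled as follows. This is a sketch in Python under stated assumptions: `GeneveTlvOptions`, `add` and `put` are hypothetical names, and "creating" or "destroying" the firmware object is reduced to setting or clearing a field.

```python
class GeneveTlvOptions:
    """Hypothetical model of the single firmware TLV options object."""

    def __init__(self):
        self.key = None     # (class, type) currently programmed, if any
        self.refcount = 0

    def add(self, opt_class, opt_type):
        """Return True if a flow with this class/type can be offloaded."""
        key = (opt_class, opt_type)
        if self.refcount == 0:
            self.key = key  # first user: "create" the object
        elif self.key != key:
            return False    # a different class/type is already in use
        self.refcount += 1
        return True

    def put(self):
        """Drop one reference; "destroy" the object when none remain."""
        if self.refcount == 0:
            return
        self.refcount -= 1
        if self.refcount == 0:
            self.key = None
```

Once every flow using the first class/type pair has been removed, a flow with any other pair can be accepted again, which matches the behaviour the commit message describes.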
Yevgeny Kliteynik	d4a18e16c5	net/mlx5e: Enable setting multiple match criteria for flow group When filling in flow spec match criteria, to allow previous modifications of the match criteria, use "|=" rather than "=". Tunnel options are parsed before the match criteria of the offloaded flow are being set. If the flow that we're about to offload has encapsulation options, the flow group might need to match on additional criteria. For Geneve, an additional flow group matching parameter should be used - misc3. The appropriate bit in the match criteria is set while parsing the tunnel options, so the criteria value shouldn't be overwritten. This is a pre-step for supporting Geneve TLV options offload. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:25 -07:00
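The difference between assigning and OR-ing the match criteria in the commit above comes down to whether earlier bits survive. A minimal sketch in Python follows; the bit values and function names are hypothetical and are not the driver's constants.

```python
# Hypothetical bit values; the real driver defines its own constants.
MATCH_OUTER_HEADERS = 1 << 0
MATCH_MISC_PARAMETERS = 1 << 1
MATCH_MISC_PARAMETERS_3 = 1 << 3


def parse_tunnel_options(criteria):
    """Runs first: request matching on misc3 for Geneve options."""
    return criteria | MATCH_MISC_PARAMETERS_3


def set_flow_criteria_overwrite(criteria):
    """Old behaviour: "=" discards whatever was set earlier."""
    criteria = MATCH_OUTER_HEADERS | MATCH_MISC_PARAMETERS
    return criteria


def set_flow_criteria_accumulate(criteria):
    """New behaviour: "|=" keeps bits set while parsing tunnel options."""
    criteria |= MATCH_OUTER_HEADERS | MATCH_MISC_PARAMETERS
    return criteria
```

With the overwrite variant the misc3 bit requested during tunnel option parsing is lost; with the accumulating variant it is still set when the flow group is created.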
Tonghao Zhang	d1bda7eecd	net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action In some cases, we don't care about enc_src_ip and enc_dst_ip, and if we don't match on those fields, we can use fewer flows in hardware when receiving tunnel packets. For example, the tunnel packets may be sent from different hosts, so we must offload one rule for each host. $ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \ flower dst_mac 00:11:22:33:44:00 \ enc_src_ip Host0_IP enc_dst_ip 2.2.2.100 \ enc_dst_port 4789 enc_key_id 100 \ action tunnel_key unset action mirred egress redirect dev eth0_1 $ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \ flower dst_mac 00:11:22:33:44:00 \ enc_src_ip Host1_IP enc_dst_ip 2.2.2.100 \ enc_dst_port 4789 enc_key_id 100 \ action tunnel_key unset action mirred egress redirect dev eth0_1 If we support flows which only match the enc_key_id and enc_dst_port, a single flow can process the packets sent to the VM (MAC 00:11:22:33:44:00). $ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \ flower dst_mac 00:11:22:33:44:00 \ enc_dst_port 4789 enc_key_id 100 \ action tunnel_key unset action mirred egress redirect dev eth0_1 Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:25 -07:00
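Why one rule without the tunnel IP addresses can replace one rule per host follows from treating unspecified keys as wildcards. The sketch below models that in Python; `rule_matches` is a hypothetical helper, and the source addresses are placeholders standing in for Host0_IP and Host1_IP.

```python
def rule_matches(rule, packet):
    """A flower-style rule matches when every key it specifies is equal;
    keys absent from the rule act as wildcards."""
    return all(packet.get(key) == value for key, value in rule.items())


# One rule without enc_src_ip/enc_dst_ip, mirroring the third tc command.
wildcard_rule = {
    "dst_mac": "00:11:22:33:44:00",
    "enc_dst_port": 4789,
    "enc_key_id": 100,
}

# Packets from two different hosts (placeholder source addresses).
from_host0 = {
    "dst_mac": "00:11:22:33:44:00",
    "enc_src_ip": "2.2.2.10",
    "enc_dst_ip": "2.2.2.100",
    "enc_dst_port": 4789,
    "enc_key_id": 100,
}
from_host1 = dict(from_host0, enc_src_ip="2.2.2.11")
```

A rule that also pins `enc_src_ip` matches only one of the two packets, which is why the per-host approach needs one hardware flow per sender.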
Vu Pham	9b81d5a994	net/mlx5e: Generalize vport type in vport representor Beside the special vports (PF/uplink/ecpf), the rest of the vports are similar. Remove vf_ prefix from function and variable names. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:25 -07:00
Saeed Mahameed	7fe4d43ecc	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This series provides some low level updates for mlx5 driver needed for both rdma and netdev trees. 1) Termination flow steering table bits and hardware definitions. 2) Introduce the core dump HW access registers definitions. 3) Refactor and cleans-up VF representors functions handlers. 4) Renames host_params bits to function_changed bits and add the support for eswitch functions change event in the eswitch general case. (for both legacy and switchdev modes). 5) Potential error pointer dereference in error handling Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:06 -07:00
David S. Miller	6912378d54	Merge branch 'phylink-sfp-updates' Russell King says: ==================== phylink/sfp updates This is a series of updates to phylink and sfp: - Remove an unused net device argument from the phylink MII ioctl emulation code. - add support for using interrupts when using a GPIO for link status tracking, rather than polling it at one second intervals. This reduces the need to wakeup the CPU every second. - add support to the MII ioctl API to read and write Clause 45 PHY registers. I don't know how desirable this is for mainline, but I have used this facility extensively to investigate the Marvell 88x3310 PHY. A recent illustration of use for this was debugging the PHY-without-firmware problem recently reported. - add mandatory attach/detach methods for the upstream side of sfp bus code, which will allow us to remove the "netdev" structure from the SFP layers. - remove the "netdev" structure from the SFP upstream registration calls, which simplifies PHY to SFP links. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Russell King	54f70b3ba3	net: sfp: remove sfp-bus use of netdevs The sfp-bus code now no longer has any use for the network device structure, so remove its use. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Russell King	320587e6ea	net: sfp: add mandatory attach/detach methods for sfp buses Add attach and detach methods for SFP buses, which will allow us to get rid of the netdev storage in sfp-bus. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Russell King	cdea04c246	net: phy: allow Clause 45 access via mii ioctl Allow userspace to generate Clause 45 MII access cycles via phylib. This is useful for tools such as mii-diag to be able to inspect Clause 45 PHYs. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Russell King	7b3b0e89bc	net: phylink: support for link gpio interrupt Add support for using GPIO interrupts with a fixed-link GPIO rather than polling the GPIO every second and invoking the phylink resolution. This avoids unnecessary calls to mac_config(). Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Russell King	7fdc455eef	net: phylink: remove netdev from phylink mii ioctl emulation The netdev used in the phylink ioctl emulation is never used, so let's remove it. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 12:37:46 -07:00
Parav Pandit	8693115af4	{IB,net}/mlx5: Constify rep ops functions pointers Currently, for every representor type and for every single vport, a copy of the representor function pointers is stored even though they don't change from one vport to another. Additionally, the priv data entry for the rep is not passed during registration, but it's copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data, 1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops prefix with other standard netdev, ibdev ops. 2. Constify the IB and Ethernet rep ops structure. 3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type. 4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Parav Pandit	c94ff74877	{IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev* Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv* as it is not needed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Vu Pham	6706a3b94f	net/mlx5: E-Switch, Honor eswitch functions changed event cap Whenever device supports eswitch functions changed event, honor such device setting. Do not limit it to ECPF. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Vu Pham	cd56f929e6	net/mlx5: E-Switch, Replace host_params event with functions_changed event To support sriov on a E-Switch manager, num_vfs are queried to the firmware whenever E-Switch manager is notified by esw_functions_changed event. Replace host_params event with esw_functions_changed event that reflects more appropriate naming. While at it, also correct num_vfs type from int to u16 as expected by the function mlx5_esw_query_functions(). Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Eli Britstein	c6d4e45d3b	net/mlx5: Introduce termination table bits Termination table is a flow table with a termination flag. The flag allows the firmware to assume that the specified actions are the last actions list. This assumption allows the FW to safely perform potential looping logic (e.g. hairpin). Introduce the bits for this attribute. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Moshe Shemesh	0b9055a112	net/mlx5: Add core dump register access HW bits Add Firmware core dump registers and HW definitions. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Lijun Ou	97545b1022	RDMA/hns: Bugfix for posting multiple srq work request When the user submits more than 32 work requests to a srq queue at a time, it needs to find the corresponding number of entries in the bitmap in the idx queue. However, the original lookup function, ffs, only processes 32 bits of each array element. When the number of srq wqes issued exceeds 32, ffs only processes the lower 32 bits of the elements, so it cannot get the correct wqe index for the srq wqe. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-31 16:11:02 -03:00
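The failure mode in the hns commit above can be reproduced with a small model of a find-first-set lookup restricted to a given width. This is a Python sketch: `find_first_set` is a hypothetical helper, and the 64-bit element width is an assumption made for illustration, since the log does not state the element type.

```python
def find_first_set(word, width):
    """Return the 1-based index of the lowest set bit within the low
    `width` bits of `word`, or 0 if none is set (ffs-style convention)."""
    word &= (1 << width) - 1
    if word == 0:
        return 0
    # word & -word isolates the lowest set bit; bit_length gives its index.
    return (word & -word).bit_length()
```

If the only set bit in an element lies above bit 31, a 32-bit lookup reports that nothing is set while a full-width lookup finds it, which is the kind of wrong index the commit describes.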
Tejun Heo	a5e112e642	cgroup: add cgroup_parse_float() cgroup already uses floating point for percent[ile] numbers and there are several controllers which want to take them as input. Add a generic parse helper to handle inputs. Update the interface convention documentation about the use of percentage numbers. While at it, also clarify the default time unit. Signed-off-by: Tejun Heo <tj@kernel.org>	2019-05-31 11:48:40 -07:00
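Parsing a decimal string without floating-point arithmetic, as a helper like the one in the cgroup commit above has to do, is usually done by scaling to a fixed-point integer. The sketch below shows that idea in Python. `parse_fixed_point` and its `dec_shift` parameter are names chosen here for illustration, and truncating extra fractional digits is an assumption, not a statement about the kernel helper's behaviour.

```python
def parse_fixed_point(text, dec_shift):
    """Parse a decimal string into an integer scaled by 10**dec_shift.

    Extra fractional digits are truncated. Raises ValueError on bad input.
    """
    whole, _, frac = text.strip().partition(".")
    negative = whole.startswith("-")
    if negative:
        whole = whole[1:]
    if not whole.isdigit() or (frac and not frac.isdigit()):
        raise ValueError(f"not a decimal number: {text!r}")
    # Keep at most dec_shift fractional digits, padding with zeros.
    frac = frac[:dec_shift].ljust(dec_shift, "0")
    value = int(whole) * 10 ** dec_shift + (int(frac) if frac else 0)
    return -value if negative else value
```

With two decimal places, a percentage written as "12.34" becomes the integer 1234, so later comparisons and arithmetic can stay in integers.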
Linus Torvalds	d266b3f5ca	Merge branch 'next-fixes-for-5.2-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity subsystem fixes from Mimi Zohar: "Four bug fixes, none 5.2-specific, all marked for stable. The first two are related to the architecture specific IMA policy support. The other two patches, one is related to EVM signatures, based on additional hash algorithms, and the other is related to displaying the IMA policy" * 'next-fixes-for-5.2-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: show rules with IMA_INMASK correctly evm: check hash algorithm passed to init_desc() ima: fix wrong signed policy requirement when not appraising x86/ima: Check EFI_RUNTIME_SERVICES before using	2019-05-31 11:08:44 -07:00
Linus Torvalds	8164c5719b	Merge tag 'for-linus-5.2b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "One minor cleanup patch and a fix for handling of live migration when running as Xen guest" * tag 'for-linus-5.2b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xenbus: Avoid deadlock during suspend due to open transactions xen/pvcalls: Remove set but not used variable	2019-05-31 10:53:34 -07:00
Johannes Weiner	7b785645e8	mm: fix page cache convergence regression Since `a283348629` ("page cache: Finish XArray conversion"), on most major Linux distributions, the page cache doesn't correctly transition when the hot data set is changing, and leaves the new pages thrashing indefinitely instead of kicking out the cold ones. On a freshly booted, freshly ssh'd into virtual machine with 1G RAM running stock Arch Linux: [root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120086/153600 workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120268/153600 workingset-b workingset-b is a 600M file on a 1G host that is otherwise entirely idle. No matter how often it's being accessed, it won't get cached. While investigating, I noticed that the non-resident information gets aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is a problem because a workingset transition like this relies on the non-resident information tracked in the page cache tree of evicted file ranges: when the cache faults are refaults of recently evicted cache, we challenge the existing active set, and that allows a new workingset to establish itself. Tracing the shrinker that maintains this memory revealed that all page cache tree nodes were allocated to the root cgroup. 
This is a problem, because 1) the shrinker sizes the amount of non-resident information it keeps to the size of the cgroup's other memory and 2) on most major Linux distributions, only kernel threads live in the root cgroup and everything else gets put into services or session groups: [root@ham ~]# cat /proc/self/cgroup 0::/user.slice/user-0.slice/session-c1.scope As a result, we basically maintain no non-resident information for the workloads running on the system, thus breaking the caching algorithm. Looking through the code, I found the culprit in the above-mentioned patch: when switching from the radix tree to xarray, it dropped the __GFP_ACCOUNT flag from the tree node allocations - the flag that makes sure the allocated memory gets charged to and tracked by the cgroup of the calling process - in this case, the one doing the fault. To fix this, allow xarray users to specify per-tree flag that makes xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache tree annotation to request such cgroup tracking for the cache nodes. 
With this patch applied, the page cache correctly converges on new workingsets again after just a few iterations: [root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + ./mincore workingset-a workingset-b 124607/153600 workingset-a 87876/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 81313/153600 workingset-a 133321/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 63036/153600 workingset-a 153600/153600 workingset-b Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2019-05-31 13:52:41 -04:00
David S. Miller	b4b12b0d2f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The phylink conflict was between a bug fix by Russell King to make sure we have a consistent PHY interface mode, and a change in net-next to pull some code in phylink_resolve() into the helper functions phylink_mac_link_{up,down}() On the dp83867 side it's mostly overlapping changes, with the 'net' side removing a condition that was supposed to trigger for RGMII but because of how it was coded never actually could trigger. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-31 10:49:43 -07:00
Linus Torvalds	27a03b1a71	Merge tag 's390-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Farewell Martin Schwidefsky: add Martin to CREDITS and remove him from MAINTAINERS - Vasily Gorbik and Christian Borntraeger join as maintainers for s390 - Fix locking bug in ctr(aes) and ctr(des) s390 specific ciphers - A rather large patch which fixes gcm-aes-s390 scatter gather handling - Fix zcrypt wrong dispatching for control domain CPRBs - Fix assignment of bus resources in PCI code - Fix structure definition for set PCI function - Fix one compile error and one compile warning seen when CONFIG_OPTIMIZE_INLINING is enabled * tag 's390-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: MAINTAINERS: add Vasily Gorbik and Christian Borntraeger for s390 MAINTAINERS: Farewell Martin Schwidefsky s390/crypto: fix possible sleep during spinlock aquired s390/crypto: fix gcm-aes-s390 selftest failures s390/zcrypt: Fix wrong dispatching for control domain CPRBs s390/pci: fix assignment of bus resources s390/pci: fix struct definition for set PCI function s390: mark __cpacf_check_opcode() and cpacf_query_func() as __always_inline s390: add unreachable() to dump_fault_info() to fix -Wmaybe-uninitialized	2019-05-31 10:49:25 -07:00
Tejun Heo	c03cd7738a	cgroup: Include dying leaders with live threads in PROCS iterations CSS_TASK_ITER_PROCS currently iterates live group leaders; however, this means that a process with dying leader and live threads will be skipped. IOW, cgroup.procs might be empty while cgroup.threads isn't, which is confusing to say the least. Fix it by making cset track dying tasks and include dying leaders with live threads in PROCS iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com>	2019-05-31 10:38:58 -07:00
Tejun Heo	b636fd38dc	cgroup: Implement css_task_iter_skip() When a task is moved out of a cset, task iterators pointing to the task are advanced using the normal css_task_iter_advance() call. This is fine but we'll be tracking dying tasks on csets and thus moving tasks from cset->tasks to (to be added) cset->dying_tasks. When we remove a task from cset->tasks, if we advance the iterators, they may move over to the next cset before we had the chance to add the task back on the dying list, which can allow the task to escape iteration. This patch separates out skipping from advancing. Skipping only moves the affected iterators to the next pointer rather than fully advancing it and the following advancing will recognize that the cursor has already been moved forward and do the rest of advancing. This ensures that when a task moves from one list to another in its cset, as long as it moves in the right direction, it's always visible to iteration. This doesn't cause any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>	2019-05-31 10:38:58 -07:00
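The split between skipping and advancing described in the css_task_iter_skip() commit can be illustrated with a small userspace sketch. This is not the kernel code: `struct node`, `struct iter`, `settle()`, and the `wanted` filter are hypothetical stand-ins, assuming a singly linked list and a cursor that points at the next element to return. Skipping only steps the cursor past the element being removed and records that it did so. The next call to `iter_next()` then finishes the advance by applying the filter.

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
    int id;
    bool wanted;            /* hypothetical filter: only these are returned */
    struct node *next;
};

struct iter {
    struct node *pos;       /* next element to return */
    bool skipped;           /* pos was moved by iter_skip(), filter not yet applied */
};

/* Finish an advance: move pos forward to the next element passing the filter. */
static void settle(struct iter *it)
{
    while (it->pos && !it->pos->wanted)
        it->pos = it->pos->next;
}

void iter_start(struct iter *it, struct node *head)
{
    it->pos = head;
    it->skipped = false;
    settle(it);
}

/* Call before unlinking `victim`: step past it without fully advancing. */
void iter_skip(struct iter *it, struct node *victim)
{
    if (it->pos == victim) {
        it->pos = victim->next;
        it->skipped = true;
    }
}

/* Return the next wanted element, or NULL when the list is exhausted. */
struct node *iter_next(struct iter *it)
{
    struct node *n;

    if (it->skipped) {      /* cursor is half-advanced; do the rest now */
        settle(it);
        it->skipped = false;
    }

    n = it->pos;
    if (n) {
        it->pos = n->next;
        settle(it);
    }
    return n;
}
```

In this sketch, removing the element under the cursor never leaves the iterator pointing at unlinked memory, and the filtering work is deferred to the next `iter_next()` call.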

858756 Commits