linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-09 20:32:04 +09:00

Author	SHA1	Message	Date
David Ahern	0f457a3662	ipv4: Move cached routes to fib_nh_common While the cached routes, nh_pcpu_rth_output and nh_rth_input, are IPv4 specific, a later patch wants to make them accessible for IPv6 nexthops with IPv4 routes using a fib6_nh. Move the cached routes from fib_nh to fib_nh_common and update references. Initialization of the cached entries is moved to fib_nh_common_init, and free is moved to fib_nh_common_release. Change in location only, from fib_nh up to fib_nh_common; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-05 00:47:16 -07:00
William Tu	7080da8909	libbpf: add libbpf_util.h to header install. The libbpf_util.h is used by xsk.h, so add it to the install headers. Reported-by: Ben Pfaff <blp@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-05 00:06:59 -07:00
Vineet Gupta	ca31ca8247	tools/bpf: fix perf build error with uClibc (seen on ARC) When build perf for ARC recently, there was a build failure due to lack of __NR_bpf. \| Auto-detecting system features: \| \| ... get_cpuid: [ OFF ] \| ... bpf: [ on ] \| \| # error __NR_bpf not defined. libbpf does not support your arch. ^~~~~ \| bpf.c: In function 'sys_bpf': \| bpf.c:66:17: error: '__NR_bpf' undeclared (first use in this function) \| return syscall(__NR_bpf, cmd, attr, size); \| ^~~~~~~~ \| sys_bpf Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-05 00:04:01 -07:00
Masahiro Yamada	a7d0067147	bpftool: exclude bash-completion/bpftool from .gitignore pattern tools/bpf/bpftool/.gitignore has the "bpftool" pattern, which is intended to ignore the following build artifact: tools/bpf/bpftool/bpftool However, the .gitignore entry is effective not only for the current directory, but also for any sub-directories. So, from the point of .gitignore grammar, the following check-in file is also considered to be ignored: tools/bpf/bpftool/bash-completion/bpftool As the manual gitignore(5) says "Files already tracked by Git are not affected", this is not a problem as far as Git is concerned. However, Git is not the only program that parses .gitignore because .gitignore is useful to distinguish build artifacts from source files. For example, tar(1) supports the --exclude-vcs-ignore option. As of writing, this option does not work perfectly, but it intends to create a tarball excluding files specified by .gitignore. So, I believe it is better to fix this issue. You can fix it by prefixing the pattern with a slash; the leading slash means the specified pattern is relative to the current directory. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:43:57 -07:00
Alexei Starovoitov	ec1c8fa04f	Merge branch 'af_xdp-fixes' Björn Töpel says: ==================== William found two bugs, when doing socket teardown within the same process. The first issue was an invalid munmap call, and the second one was an invalid XSKMAP cleanup. Both resulted in that the process kept references to the socket, which was not correctly cleaned up. When a new socket was created, the bind() call would fail, since the old socket was still lingering, refusing to give up the queue on the netdev. More details can be found in the individual commits. Thanks, Björn ==================== Reviewed-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:26:38 -07:00
Björn Töpel	5750902a6e	libbpf: proper XSKMAP cleanup The bpf_map_update_elem() function, when used on an XSKMAP, will fail if not a valid AF_XDP socket is passed as value. Therefore, this is function cannot be used to clear the XSKMAP. Instead, the bpf_map_delete_elem() function should be used for that. This patch also simplifies the code by breaking up xsk_update_bpf_maps() into three smaller functions. Reported-by: William Tu <u9012063@gmail.com> Fixes: `1cad078842` ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:26:22 -07:00
Björn Töpel	0e6741f092	libbpf: fix invalid munmap call When unmapping the AF_XDP memory regions used for the rings, an invalid address was passed to the munmap() calls. Instead of passing the beginning of the memory region, the descriptor region was passed to munmap. When the userspace application tried to tear down an AF_XDP socket, the operation failed and the application would still have a reference to socket it wished to get rid of. Reported-by: William Tu <u9012063@gmail.com> Fixes: `1cad078842` ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:26:22 -07:00
Yonghong Song	6cea33701e	selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c Test test_libbpf.sh failed on my development server with failure -bash-4.4$ sudo ./test_libbpf.sh [0] libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. test_libbpf: failed at file test_l4lb.o selftests: test_libbpf [FAILED] -bash-4.4$ The reason is because my machine has 64KB locked memory by default which is not enough for this program to get locked memory. Similar to other bpf selftests, let us increase RLIMIT_MEMLOCK to infinity, which fixed the issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:23:51 -07:00
YueHaibing	71f150f4c2	bpf: Use PTR_ERR_OR_ZERO in bpf_fd_sk_storage_update_elem() Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-05-04 23:20:58 -07:00
Martyna Szapar	0b63644602	i40e: Memory leak in i40e_config_iwarp_qvlist Added freeing the old allocation of vf->qvlist_info in function i40e_config_iwarp_qvlist before overwriting it with the new allocation. Signed-off-by: Martyna Szapar <martyna.szapar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:45:22 -07:00
Martyna Szapar	24474f2709	i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c Fixed possible memory leak in i40e_vc_add_cloud_filter function: cfilter is being allocated and in some error conditions the function returns without freeing the memory. Fix of integer truncation from u16 (type of queue_id value) to u8 when calling i40e_vc_isvalid_queue_id function. Signed-off-by: Martyna Szapar <martyna.szapar@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:40:25 -07:00
Gustavo A. R. Silva	825f0a4eb7	i40e: Use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = kzalloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL) Notice that, in this case, variable size is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:34:43 -07:00
Maciej Paczkowski	0a92892c69	i40e: Revert ShadowRAM checksum calculation change The reason of this revert is unexpected issue found in NVM Update tool during NVM image downgrade. The implementation is no longer needed since the QV tools are already aware of new FW double ShadowRAM dump mechanism. This patch reverts ShadowRAM checksum calculation change introduced in commit 9d12f0c4e436 ("i40e: Revert ShadowRAM checksum calculation change") Signed-off-by: Maciej Paczkowski <maciej.paczkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:30:58 -07:00
Martyna Szapar	d29e0d233e	i40e: missing input validation on VF message handling by the PF Patch is adding missing input validation on VF message handling by the PF to the functions with opcodes: VIRTCHNL_OP_CONFIG_VSI_QUEUES = 6 VIRTCHNL_OP_CONFIG_IRQ_MAP = 7, VIRTCHNL_OP_DISABLE_QUEUES = 9, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE = 14, Signed-off-by: Martyna Szapar <martyna.szapar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:28:04 -07:00
Aleksandr Loktionov	2e45d3f467	i40e: Add support for X710 B/P & SFP+ cards New device ids are created to support X710 backplane and SFP+ cards. This patch adds in i40e driver support for 2.5GbaseT and 5GbaseT speed. It's implemented by checking I40E_CAP_PHY_TYPE_2_5GBASE_T, I40E_CAP_PHY_TYPE_5GBASE_T bits from f/w and setting corresponding bits in ethtool link ksettings supported and advertising masks. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:24:48 -07:00
Grzegorz Siwik	c004804dce	i40e: Wrong truncation from u16 to u8 In this patch fixed wrong truncation method from u16 to u8 during validation. It was changed by changing u8 to u32 parameter in method declaration and arguments were changed to u32. Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:22:48 -07:00
Sergey Nemov	7015ca3df9	i40e: add num_vectors checker in iwarp handler Field num_vectors from struct virtchnl_iwarp_qvlist_info should not be larger than num_msix_vectors_vf in the hw struct. The iwarp uses the same set of vectors as the LAN VF driver. Signed-off-by: Sergey Nemov <sergey.nemov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:21:07 -07:00
Grzegorz Siwik	1aa874b42e	i40e: Fix the typo in adding 40GE KR4 mode This patch fixes the typo in I40E_CAP_PHY_TYPE mode link code. It was fixed by changing 40000baseLR4_Full to 40000baseKR4_Full Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:16:47 -07:00
Grzegorz Siwik	40a23040d8	i40e: Setting VF to VLAN 0 requires restart This patch fixes a bug where changing VLAN to 0 was not set until VF restart. Now we are setting pvid info to 0 when we have to change VLAN to 0. Without this change when VF VLAN was changed to 0 nothing happened until VF restart. For changing to VLAN different than 0 it worked correctly. Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:13:30 -07:00
Aleksandr Loktionov	e576e76966	i40e: add new pci id for X710/XXV710 N3000 cards New device ids are created to support X710/XXV710 N3000 cards. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 17:10:52 -07:00
Grzegorz Siwik	937f599a11	i40e: VF's promiscuous attribute is not kept This patch fixes a bug where the promiscuous mode was not being kept when the VF switched to a new VLAN. Now we are config two times a promiscuous mode when we switch VLAN. Without this change when we change VF VLAN we still receive all the packets from previous VLAN and only unicast from new VLAN. Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 16:55:34 -07:00
Michal Swiatkowski	64439f8f0b	ice: Disable sniffing VF traffic on PF Delete code that add default Tx rule on PF. With this rule PF can see Tx VF traffic that should go outside. For traffic from VF to another VF default Tx rule on PF doesn't apply because of lower priority than VF mac rule. With this change on PF in promisc mode we can see only Rx traffic that doesn't match any other rule (mac etc.). We can't see Tx traffic from other VSI. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:44:47 -07:00
Jesse Brandeburg	0690527014	ice: Use more efficient structures Move a bunch of members around to make more efficient use of memory, eliminating holes where possible. None of these members are hot path so cache line alignment is not very important here. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:40:36 -07:00
Jesse Brandeburg	0437f1a98a	ice: Use bitfields where possible The driver was converted to not use bool, but it was neglected that the bools should have been converted to bit fields as bit fields in software structures are ok, as long as they use the correct kinds of unsigned types. This avoids wasting lots of storage space to store single bit values. One of the change hunks moves a variable lport out of a group of "combinable" bit fields because all bits of the u8 lport are valid and the variable can be packed in the struct in struct holes. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:38:57 -07:00
Akeem G Abodunrin	d95276ced0	ice: Add function to program ethertype based filter rule on VSIs This patch adds function to program VSI with ethertype based filter rule, so that all flow control frames would be disallowed from being transmitted to the client, in order to prevent malicious VSI, especially VF from sending out PAUSE or PFC frames, and then control other VSIs traffic. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:36:28 -07:00
Tony Nguyen	8f529ff912	ice: Separate if conditions for ice_set_features() Set features can have multiple features turned on\|off in a single call. Grouping these all in an if/else means after one condition is met, other conditions/features will not be evaluated. Break the if/else statements by feature to ensure all features will be handled properly. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:33:44 -07:00
Tony Nguyen	a03499d614	ice: Remove __always_unused attribute The variable netdev is being used in this function; remove the __always_unused attribute from it. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:31:59 -07:00
Bruce Allan	c3a6825e82	ice: Suppress false-positive style issues reported by static analyzer A recent version of cppcheck falsely reports- Variable ip.hdr is assigned a value that is never used. ip is a union so the pointer ip.hdr is actually used when referenced as ip.v4 and ip.v6. Silence these false reports when using cppcheck with the --inline-suppr command-line option. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:30:05 -07:00
Brett Creeley	e40c899a64	ice: Refactor getting/setting coalesce Currently if the driver has an uneven amount of Rx/Tx queues setting the coalesce settings through ethtool will result in an error. This is happening because in the setting coalesce flow we are reporting an error if either Rx or Tx fails. Also, the flow for setting/getting per_q_coalesce and setting/getting coalesce settings for the entire device is different. Fix these issues by adding one function, ice_set_q_coalesce(), and another, ice_get_q_coalesce(), that both getting/setting per_q and entire device coalesce can use. This makes handling the error cases generic between the two flows and simplifies __ice_set_coalesce() and __ice_get_coalesce(). Also, add a header comment to __ice_set_coalesce(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:25:26 -07:00
Brett Creeley	a85a3847fb	ice: Always free/allocate q_vectors Currently when probing/removing the driver we allocate/deallocate each vsi->q_vectors array in ice_vsi_alloc_arrays() and ice_vsi_free_arrays() respectively. However, we don't do this during the reset and VSI rebuild flow. This is inconsistent and unnecessary to have a difference between the two flows. This patch makes the change to always allocate/deallocate the vsi->q_vectors array regardless of the driver flow we are in. Also, update the comment for ice_vsi_free_arrays() to be more descriptive. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:22:55 -07:00
Bruce Allan	207e3721ac	ice: Do not unnecessarily initialize local variable The local variable speed does not need to be initialized and can cause some static analysis tools to complain the initial assigned value is never used. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:21:01 -07:00
Michal Swiatkowski	ba0db585bd	ice: Add more validation in ice_vc_cfg_irq_map_msg Add few checks to validate msg from iavf driver. Test if we have got enough q_vectors allocated in VSI connected with VF. Add masks for itr_indx and msix_indx to avoid writing to reserved fieldi of QINT. Clear q_vector->num_ring_rx/tx, without it we can increment this value every time we send irq map msg from VF. So after second call this value will be incorrect. Decrement num_vectors from msg, because last vector in iavf msg is misc vector (we don't set map for it). Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:18:27 -07:00
Akeem G Abodunrin	bb877b22bc	ice: Don't remove VLAN filters that were never programmed In case of non-trusted VFs, it is possible to program VLAN filter far less than what is requested by the VF originally, thereby makes number of VLAN elements being tracked by VF different from actual VLAN tags. This patch makes sure that we are not attempting to remove VLAN filter that does not exist. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 14:07:34 -07:00
Tony Nguyen	e80e76db6c	ice: Preserve VLAN Rx stripping settings When Tx insertion is set, we are not accounting for the state of Rx stripping. This causes Rx stripping to be enabled any time Tx insertion is changed, even when it's supposed to be disabled. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 13:45:00 -07:00
Michal Swiatkowski	a52db6b260	ice: Fix for allowing too many MDD events on VF Disable VF if any malicious device driver (MDD) event is detected by hardware. Track vf->num_mdd_events for information about VF MDD events. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 13:42:53 -07:00
Jesse Brandeburg	819d899863	ice: Use pf instead of vsi-back Many times in our functions we have a local variable pf, which is equivalent to vsi->back. Just use pf consistently instead of vsi->back where available. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-05-04 13:06:56 -07:00
Linus Torvalds	6203838dec	Merge tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "One regression fix. Changes we merged to STRICT_KERNEL_RWX on 32-bit were causing crashes under load on some machines depending on memory layout. Thanks to Christophe Leroy" * tag 'powerpc-5.1-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWX	2019-05-04 12:24:05 -07:00
YueHaibing	de166bbe86	ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails KASAN report this: kernel BUG at net/mac802154/main.c:130! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 0 PID: 19932 Comm: modprobe Not tainted 5.1.0-rc6+ #22 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ieee802154_free_hw+0x2a/0x30 [mac802154] Code: 55 48 8d 57 38 48 89 e5 53 48 89 fb 48 8b 47 38 48 39 c2 75 15 48 8d 7f 48 e8 82 85 16 e1 48 8b 7b 28 e8 f9 ef 83 e2 5b 5d c3 <0f> 0b 0f 1f 40 00 55 48 89 e5 53 48 89 fb 0f b6 86 80 00 00 00 88 RSP: 0018:ffffc90001c7b9f0 EFLAGS: 00010206 RAX: ffff88822df3aa80 RBX: ffff88823143d5c0 RCX: 0000000000000002 RDX: ffff88823143d5f8 RSI: ffff88822b1fabc0 RDI: ffff88823143d5c0 RBP: ffffc90001c7b9f8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4 R13: ffff88822dea4f50 R14: ffff88823143d7c0 R15: 00000000fffffff4 FS: 00007ff52e999540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdc06dba768 CR3: 000000023160a000 CR4: 00000000000006f0 Call Trace: hwsim_add_one+0x2dd/0x540 [mac802154_hwsim] hwsim_probe+0x2f/0xb0 [mac802154_hwsim] platform_drv_probe+0x3a/0x90 ? driver_sysfs_add+0x79/0xb0 really_probe+0x1d4/0x2d0 driver_probe_device+0x50/0xf0 device_driver_attach+0x54/0x60 __driver_attach+0x7e/0xd0 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x68/0xc0 driver_attach+0x19/0x20 bus_add_driver+0x15e/0x200 driver_register+0x5b/0xf0 __platform_driver_register+0x31/0x40 hwsim_init_module+0x74/0x1000 [mac802154_hwsim] ? 0xffffffffa00e9000 do_one_initcall+0x6c/0x3cc ? kmem_cache_alloc_trace+0x248/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff52e4a2839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffffa7b3c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 00005647560a2a00 RCX: 00007ff52e4a2839 RDX: 0000000000000000 RSI: 00005647547f3c2e RDI: 0000000000000003 RBP: 00005647547f3c2e R08: 0000000000000000 R09: 00005647560a2a00 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005647560a2c10 R14: 0000000000040000 R15: 00005647560a2a00 Modules linked in: mac802154_hwsim(+) mac802154 [last unloaded: mac802154_hwsim] In hwsim_add_one, if hwsim_subscribe_all_others fails, we should call ieee802154_unregister_hw to free resources. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: `f25da51fdc` ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2019-05-04 20:44:39 +02:00
YueHaibing	1cbbbf39ef	ieee802154: hwsim: Fix error handle path in hwsim_init_module KASAN report this: BUG: unable to handle kernel paging request at fffffbfff834f001 PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c968d067 PTE 0 Oops: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 8871 Comm: syz-executor.0 Tainted: G C 5.0.0+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:strcmp+0x31/0xa0 lib/string.c:328 Code: 00 00 00 00 fc ff df 55 53 48 83 ec 08 eb 0a 84 db 48 89 ef 74 5a 4c 89 e6 48 89 f8 48 89 fa 48 8d 6f 01 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 04 84 c0 75 50 48 89 f0 48 89 f2 0f b6 5d RSP: 0018:ffff8881e0c57800 EFLAGS: 00010246 RAX: 1ffffffff834f001 RBX: ffffffffc1a78000 RCX: ffffffff827b9503 RDX: 0000000000000000 RSI: ffffffffc1a40008 RDI: ffffffffc1a78008 RBP: ffffffffc1a78009 R08: fffffbfff6a92195 R09: fffffbfff6a92195 R10: ffff8881e0c578b8 R11: fffffbfff6a92194 R12: ffffffffc1a40008 R13: dffffc0000000000 R14: ffffffffc1a3e470 R15: ffffffffc1a40000 FS: 00007fdcc02ff700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff834f001 CR3: 00000001b3134003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: genl_family_find_byname+0x7f/0xf0 net/netlink/genetlink.c:104 genl_register_family+0x1e1/0x1070 net/netlink/genetlink.c:333 ? 0xffffffffc1978000 hwsim_init_module+0x6a/0x1000 [mac802154_hwsim] ? 0xffffffffc1978000 ? 0xffffffffc1978000 ? 0xffffffffc1978000 do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdcc02fec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000200 RDI: 0000000000000003 RBP: 00007fdcc02fec70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdcc02ff6bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Modules linked in: mac802154_hwsim(+) mac802154 ieee802154 speakup(C) rc_proteus_2309 rtc_rk808 streebog_generic rds vboxguest madera_spi madera da9052_wdt mISDN_core ueagle_atm usbatm atm ir_imon_decoder scsi_transport_sas rc_dntv_live_dvb_t panel_samsung_s6d16d0 drm drm_panel_orientation_quirks lib80211 fb_agm1264k_fl(C) gspca_pac7302 gspca_main videobuf2_v4l2 soundwire_intel_init i2c_dln2 dln2 usbcore hid_gaff 88pm8607 nfnetlink axp20x_i2c axp20x uio pata_marvell pmbus_core snd_sonicvibes gameport snd_pcm snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore rtc_ds1511 rtc_ds1742 vsock dwc_xlgmac rtc_rx8010 libphy twofish_x86_64_3way twofish_x86_64 twofish_common ad5696_i2c ad5686 lp8788_charger cxd2880_spi dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops fbtft(C) sysimgblt sysfillrect syscopyarea fb_sys_fops janz_ican3 firewire_net firewire_core crc_itu_t spi_slave_system_control i2c_matroxfb i2c_algo_bit matroxfb_base fb fbdev matroxfb_DAC1064 matroxfb_accel cfbcopyarea cfbimgblt cfbfillrect matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc leds_blinkm ti_dac7311 intel_spi_pci intel_spi spi_nor hid_elan hid async_tx rc_cinergy_1400 rc_core intel_ishtp kxcjk_1013 industrialio_triggered_buffer kfifo_buf can_dev intel_th spi_pxa2xx_platform pata_artop vme_ca91cx42 gb_gbphy(C) greybus(C) industrialio mptbase st_drv cmac ttpci_eeprom via_wdt gpio_xra1403 mtd iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 input_leds crypto_simd cryptd glue_helper ide_pci_generic piix psmouse ide_core serio_raw ata_generic i2c_piix4 pata_acpi parport_pc parport floppy rtc_cmos intel_agp intel_gtt agpgart sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: speakup] Dumping ftrace buffer: (ftrace buffer empty) CR2: fffffbfff834f001 ---[ end trace 5aa772c793e0e971 ]--- RIP: 0010:strcmp+0x31/0xa0 lib/string.c:328 Code: 00 00 00 00 fc ff df 55 53 48 83 ec 08 eb 0a 84 db 48 89 ef 74 5a 4c 89 e6 48 89 f8 48 89 fa 48 8d 6f 01 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 04 84 c0 75 50 48 89 f0 48 89 f2 0f b6 5d RSP: 0018:ffff8881e0c57800 EFLAGS: 00010246 RAX: 1ffffffff834f001 RBX: ffffffffc1a78000 RCX: ffffffff827b9503 RDX: 0000000000000000 RSI: ffffffffc1a40008 RDI: ffffffffc1a78008 RBP: ffffffffc1a78009 R08: fffffbfff6a92195 R09: fffffbfff6a92195 R10: ffff8881e0c578b8 R11: fffffbfff6a92194 R12: ffffffffc1a40008 R13: dffffc0000000000 R14: ffffffffc1a3e470 R15: ffffffffc1a40000 FS: 00007fdcc02ff700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffbfff834f001 CR3: 00000001b3134003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 The error handing path misplace the cleanup in hwsim_init_module, switch the two cleanup functions to fix above issues. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: `f25da51fdc` ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2019-05-04 20:44:39 +02:00
Oded Gabbay	19734970c9	habanalabs: force user to set device debug mode This patch adds the implementation of the HL_DEBUG_OP_SET_MODE opcode in the DEBUG IOCTL. It forces the user who wants to debug the device to set the device into debug mode before he can configure the debug engines. The patch also makes sure to disable debug mode upon user releasing FD, in case the user forgot to disable debug mode. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2019-05-04 17:36:06 +03:00
Omer Shpigelman	d1287493ab	habanalabs: minor documentation and prints fixes This patch fixes comments on various structure members and some spelling errors in log messages. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2019-05-05 13:24:24 +03:00
Omer Shpigelman	34a5fab7b6	habanalabs: remove redundant CPU checks This patch removes redundant CPU availability checks in: goya_test_queues() - will be done in goya_test_cpu_queue(). goya_ring_doorbell() - was done earlier in goya_send_cpu_message(). Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2019-05-05 11:21:16 +03:00
Oded Gabbay	cfc2f35006	habanalabs: improve a couple of error messages This patch improves the error message that is shown when a new user tries to open a new FD while there is already an existing user that is working on the device. It also improves the error message in case of missing firmware file. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2019-05-04 16:43:20 +03:00
Ming Lei	662156641b	block: don't drain in-progress dispatch in blk_cleanup_queue() Now freeing hw queue resource is moved to hctx's release handler, we don't need to worry about the race between blk_cleanup_queue and run queue any more. So don't drain in-progress dispatch in blk_cleanup_queue(). This is basically revert of `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue"). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:11 -06:00
Ming Lei	1b97871b50	blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release hctx is always released after requeue is freed. With holding queue's kobject refcount, it is safe for driver to run queue, so one run queue might be scheduled after blk_sync_queue() is done. So moving the cancel of hctx->run_work into blk_mq_hw_sysfs_release() for avoiding run released queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:09 -06:00
Ming Lei	2f8f1336a4	blk-mq: always free hctx after request queue is freed In normal queue cleanup path, hctx is released after request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because of hw queues shrinking. This way is easy to cause use-after-free, because: one implicit rule is that it is safe to call almost all block layer APIs if the request queue is alive; and one hctx may be retrieved by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(); finally use-after-free is triggered. Fixes this issue by always freeing hctx after releasing request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to resuse these hctxs if numa node is matched. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:08 -06:00
Ming Lei	7c6c5b7c91	blk-mq: split blk_mq_alloc_and_init_hctx into two parts Split blk_mq_alloc_and_init_hctx into two parts, and one is blk_mq_alloc_hctx() for allocating all hctx resources, another is blk_mq_init_hctx() for initializing hctx, which serves as counter-part of blk_mq_exit_hctx(). Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K . Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:06 -06:00
Ming Lei	c7e2d94b3d	blk-mq: free hw queue's resource in hctx's release handler Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit `45a9c9d909` ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before `45a9c9d909`, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: `8dc765d438` ("SCSI: fix queue cleanup race before queue initialization is done") `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reported-by: James Smart <james.smart@broadcom.com> Fixes: `45a9c9d909` ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:05 -06:00
Ming Lei	fbc2a15e34	blk-mq: move cancel of requeue_work into blk_mq_release With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:04 -06:00
Ming Lei	e87eb301be	blk-mq: grab .q_usage_counter when queuing request from plug code path Just like aio/io_uring, we need to grab 2 refcount for queuing one request, one is for submission, another is for completion. If the request isn't queued from plug code path, the refcount grabbed in generic_make_request() serves for submission. In theroy, this refcount should have been released after the sumission(async run queue) is done. blk_freeze_queue() works with blk_sync_queue() together for avoiding race between cleanup queue and IO submission, given async run queue activities are canceled because hctx->run_work is scheduled with the refcount held, so it is fine to not hold the refcount when running the run queue work function for dispatch IO. However, if request is staggered into plug list, and finally queued from plug code path, the refcount in submission side is actually missed. And we may start to run queue after queue is removed because the queue's kobject refcount isn't guaranteed to be grabbed in flushing plug list context, then kernel oops is triggered, see the following race: blk_mq_flush_plug_list(): blk_mq_sched_insert_requests() insert requests to sw queue or scheduler queue blk_mq_run_hw_queue Because of concurrent run queue, all requests inserted above may be completed before calling the above blk_mq_run_hw_queue. Then queue can be freed during the above blk_mq_run_hw_queue(). Fixes the issue by grab .q_usage_counter before calling blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is safe because the queue is absolutely alive before inserting request. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-05-04 07:24:02 -06:00

... 417 418 419 420 421 ...

858756 Commits