linux

mirror of https://github.com/hardkernel/linux.git synced 2026-04-22 13:30:42 +09:00

Author	SHA1	Message	Date
Marcel Holtmann	58a96fc353	Bluetooth: Add debug setting for changing minimum encryption key size For testing and qualification purposes it is useful to allow changing the minimum encryption key size value that the host stack is going to enforce. This adds a new debugfs setting min_encrypt_key_size to achieve this functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2019-08-17 13:54:40 +03:00
Linus Torvalds	2d63ba3e41	Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These add a check to avoid recent suspend-to-idle power regression on systems with NVMe drives where the PCIe ASPM policy is "performance" (or when the kernel is built without ASPM support), fix an issue related to frequency limits in the schedutil cpufreq governor and fix a mistake related to the PM QoS usage in the cpufreq core introduced recently. Specifics: - Disable NVMe power optimization related to suspend-to-idle added recently on systems where PCIe ASPM is not able to put PCIe links into low-power states to prevent excess power from being drawn by the system while suspended (Rafael Wysocki). - Make the schedutil governor handle frequency limits changes properly in all cases (Viresh Kumar). - Prevent the cpufreq core from treating positive values returned by dev_pm_qos_update_request() as errors (Viresh Kumar)" * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled PCI/ASPM: Add pcie_aspm_enabled() cpufreq: schedutil: Don't skip freq update when limits change cpufreq: dev_pm_qos_update_request() can return 1 on success	2019-08-16 09:13:16 -07:00
David S. Miller	480fd998bd	Merge tag 'rxrpc-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix local endpoint handling Here's a pair of patches that fix two issues in the handling of local endpoints (rxrpc_local structs): (1) Use list_replace_init() rather than list_replace() if we're going to unconditionally delete the replaced item later, lest the list get corrupted. (2) Don't access the rxrpc_local object after passing our ref to the workqueue, not even to illuminate tracepoints, as the work function may cause the object to be freed. We have to cache the information beforehand. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 16:33:22 -07:00
David S. Miller	12ed601513	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net This patchset contains Netfilter fixes for net: 1) Extend selftest to cover flowtable with ipsec, from Florian Westphal. 2) Fix interaction of ipsec with flowtable, also from Florian. 3) Use-after-free with bound set to rule that fails to load. 4) Adjust state and timeout for flows that expire. 5) Timeout update race with flows in teardown state. 6) Ensure conntrack id hash calculation uses invariants as input, from Dirk Morris. 7) Do not push flows into flowtable for TCP fin/rst packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 14:01:14 -07:00
Sudarsana Reddy Kalluru	0dabbe1bb3	qed: Add driver API for flashing the config attributes. The patch adds driver interface for reading the config attributes from user provided buffer, and updates these values on nvm config flash partition. This is basically an expansion of our existing ethtool -f implementation. The management FW has exposed an additional method of configuring some of the nvram options, and this makes use of that. This implementation will come into use when newer FW files which contain configuration directives employing this API will be provided to ethtool -f. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 12:54:45 -07:00
David S. Miller	8714652fcd	Merge tag 'linux-can-next-for-5.4-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2019-08-14 this is a pull request for net-next/master consisting of 41 patches. The first two patches are for the kvaser_pciefd driver: Christer Beskow removes unnecessary code in the kvaser_pciefd_pwm_stop() function, YueHaibing removes the unused include of <linux/version.h>. In the next patch YueHaibing also removes the unused include of <linux/version.h> in the f81601 driver. In the ti_hecc driver the next 6 patches are by me and fix checkpatch warnings. YueHaibing's patch removes an unused variable in the ti_hecc_mailbox_read() function. The next 6 patches all target the xilinx_can driver. Anssi Hannula's patch fixes a chip start failure with an invalid bus. The patch by Venkatesh Yadav Abbarapu skips an error message in case of a deferred probe. The 3 patches by Appana Durga Kedareswara rao fix the RX and TX path for CAN-FD frames. Srinivas Neeli's patch fixes the bit timing calculations for CAN-FD. The next 12 patches are by me and fix several checkpatch warnings in the af_can, raw and bcm components. Thomas Gleixner provides a patch for the bcm, which switches the timer to HRTIMER_MODE_SOFT and removes the hrtimer_tasklet. Then 6 more patches by me for the gw component, which fix checkpatch warnings, followed by 2 patches by Oliver Hartkopp to add CAN-FD support. The vcan driver gets 3 patches by me, fixing checkpatch warnings. And finally a patch by Andre Hartmann to fix typos in CAN's netlink header. ====================	2019-08-15 12:43:22 -07:00
Jens Axboe	7b6620d7db	block: remove REQ_NOWAIT_INLINE We had a few issues with this code, and there's still a problem around how we deal with error handling for chained/split bios. For now, just revert the code and we'll try again with a thorough solution. This reverts commits: `e15c2ffa10` ("block: fix O_DIRECT error handling for bio fragments") `0eb6ddfb86` ("block: Fix __blkdev_direct_IO() for bio fragments") `6a43074e2f` ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO") `893a1c9720` ("blk-mq: allow REQ_NOWAIT to return an error inline") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-15 11:09:16 -06:00
Linus Torvalds	3291204239	Merge tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux Pull auxdisplay fixes from Miguel Ojeda: "A few minor auxdisplay improvements: - A couple of small header cleanups for charlcd (Masahiro Yamada) - A trivial typo fix for the examples of cfag12864b (Masahiro Yamada) - A Kconfig help text improvement for charlcd (Mans Rullgard) - An error path fix for panel (zhengbin)" * tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux: auxdisplay: Fix a typo in cfag12864b-example.c auxdisplay: charlcd: add include guard to charlcd.h auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay auxdisplay: charlcd: add help text for backlight initial state auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach	2019-08-15 09:20:17 -07:00
Christoph Hellwig	edfbcb321f	usb: add a hcd_uses_dma helper The USB buffer allocation code is the only place in the usb core (and in fact the whole kernel) that uses is_device_dma_capable, while the URB mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer allocation to use the uses_dma flag used by the rest of the USB code, and create a helper in hcd.h that checks this flag as well as the CONFIG_HAS_DMA to simplify the caller a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-15 15:18:05 +02:00
Jeremy Sowden	707816c8b0	netfilter: remove deprecation warnings from uapi headers. There are two netfilter userspace headers which contain deprecation warnings. While these headers are not used within the kernel, they are compiled stand-alone for header-testing. Pablo informs me that userspace iptables still refers to these headers, and the intention was to use xt_LOG.h instead and remove these, but userspace was never updated. Remove the warnings. Fixes: `2a475c409f` ("kbuild: remove all netfilter headers from header-test blacklist.") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-14 23:36:27 +02:00
Linus Torvalds	a8dba0531b	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Doug Ledford: "Fairly small pull request for -rc3. I'm out of town the rest of this week, so I made sure to clean out as much as possible from patchworks in enough time for 0-day to chew through it (Yay! for 0-day being back online! :-)). Jason might send through any emergency stuff that could pop up, otherwise I'm back next week. The only real thing of note is the siw ABI change. Since we just merged siw this release, there are no prior kernel releases to maintain kernel ABI with. I told Bernard that if there is anything else about the siw ABI he thinks he might want to change before it goes set in stone, he should get it in ASAP. The siw module was around for several years outside the kernel tree, and it had to be revamped considerably for inclusion upstream, so we are making no attempts to be backward compatible with the out of tree version. Once 5.3 is actually released, we will have our baseline ABI to maintain. 
Summary: - Fix a memory registration release flow issue that was causing a WARN_ON (mlx5) - If the counters for a port aren't allocated, then we can't do operations on the non-existent counters (core) - Check the right variable for error code result (mlx5) - Fix a use after free issue (mlx5) - Fix an off by one memory leak (siw) - Actually return an error code on error (core) - Allow siw to be built on 32bit arches (siw, ABI change, but OK since siw was just merged this merge window and there is no prior released kernel to maintain compatibility with and we also updated the rdma-core user space package to match)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Change CQ flags from 64->32 bits RDMA/core: Fix error code in stat_get_doit_qp() RDMA/siw: Fix a memory leak in siw_init_cpulist() IB/mlx5: Fix use-after-free error while accessing ev_file pointer IB/mlx5: Check the correct variable in error handling code RDMA/counter: Prevent QP counter binding if counters unsupported IB/mlx5: Fix implicit MR release flow	2019-08-14 11:10:38 -07:00
Linus Torvalds	e83b009c5c	Merge tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix the handling of the bus_dma_mask in dma_get_required_mask, which caused a regression in this merge window (Lucas Stach) - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me) - fix dma_mmap_coherent to not cause page attribute mismatches on coherent architectures like x86 (me) * tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix page attributes for dma_mmap_* dma-direct: don't truncate dma_required_mask to bus addressing capabilities dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING	2019-08-14 10:31:11 -07:00
David Howells	06d9532fa6	rxrpc: Fix read-after-free in rxrpc_queue_local() rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: `09d2bf595d` ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>	2019-08-14 11:37:51 +01:00
YueHaibing	68e03b8547	gpio: Fix build error of function redefinition When doing randbuilding, I got this error: In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: ./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: ./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: `964cb34188` ("gpio: move pincontrol calls to <linux/gpio/driver.h>") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2019-08-14 10:57:18 +02:00
David Ahern	d00ee64e1d	netlink: Fix nlmsg_parse as a wrapper for strict message parsing Eric reported a syzbot warning: BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294 nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmmsg+0x53a/0xae0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439 __x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x63/0xe7 The root cause is nlmsg_parse calling __nla_parse which means the header struct size is not checked. nlmsg_parse should be a wrapper around __nlmsg_parse with NL_VALIDATE_STRICT for the validate argument very much like nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL. 
Fixes: `3de6440354` ("netlink: re-add parse/validate functions in strict mode") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 20:37:16 -07:00
Jakub Kicinski	c162610c7d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Rename mss field to mss_option field in synproxy, from Fernando Mancera. 2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce. 3) More strict validation of IPVS sysctl values, from Junwei Hu. 4) Remove unnecessary spaces on the right hand side of assignments, from yangxingwu. 5) Add offload support for bitwise operation. 6) Extend the nft_offload_reg structure to store immediate data. 7) Collapse several ip_set header files into ip_set.h, from Jeremy Sowden. 8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y, from Jeremy Sowden. 9) Fix several sparse warnings due to missing prototypes, from Valdis Kletnieks. 10) Use static lock initialiser to ensure connlabel spinlock is initialized on boot time to fix sched/act_ct.c, patch from Florian Westphal. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:22:57 -07:00
Heiner Kallweit	65b27995a4	net: phy: let phy_speed_down/up support speeds >1Gbps So far phy_speed_down/up can be used up to 1Gbps only. Remove this restriction by using new helper __phy_speed_down. New member adv_old in struct phy_device is used by phy_speed_up to restore the advertised modes before calling phy_speed_down. Don't simply advertise what is supported because a user may have intentionally removed modes from advertisement. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Heiner Kallweit	331c56ac73	net: phy: add phy_speed_down_core and phy_resolve_min_speed phy_speed_down_core provides most of the functionality for phy_speed_down. It makes use of new helper phy_resolve_min_speed that is based on the sorting of the settings[] array. In certain cases it may be helpful to be able to exclude legacy half duplex modes, therefore prepare phy_resolve_min_speed() for it. v2: - rename __phy_speed_down to phy_speed_down_core Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Jakub Kicinski	708852dcac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your net-next tree. There is a small merge conflict in libbpf (Cc Andrii so he's in the loop as well): for (i = 1; i <= btf__get_nr_types(btf); i++) { t = (struct btf_type *)btf__type_by_id(btf, i); if (!has_datasec && btf_is_var(t)) { /* replace VAR with INT */ t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); <<<<<<< HEAD /* using size = 1 is the safest choice, 4 will be too big and cause kernel BTF validation failure if original variable took less than 4 bytes */ t->size = 1; *(int *)(t+1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && kind == BTF_KIND_DATASEC) { ======= t->size = sizeof(int); *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32); } else if (!has_datasec && btf_is_datasec(t)) { >>>>>>> `72ef80b5ee` /* replace DATASEC with STRUCT */ Conflict is between the two commits `1d4126c4e1` ("libbpf: sanitize VAR to conservative 1-byte INT") and `b03bc6853c` ("libbpf: convert libbpf code to use new btf helpers"), so we need to pick the sanitation fixup as well as use the new btf_is_datasec() helper and the whitespace cleanup. Looks like the following: [...] if (!has_datasec && btf_is_var(t)) { /* replace VAR with INT */ t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); /* using size = 1 is the safest choice, 4 will be too big and cause kernel BTF validation failure if original variable took less than 4 bytes */ t->size = 1; *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && btf_is_datasec(t)) { /* replace DATASEC with STRUCT */ [...] The main changes are: 1) Addition of core parts of compile once - run everywhere (co-re) effort, that is, relocation of fields offsets in libbpf as well as exposure of kernel's own BTF via sysfs and loading through libbpf, from Andrii.
More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2 and http://vger.kernel.org/lpc-bpf2018.html#session-2 2) Enable passing input flags to the BPF flow dissector to customize parsing and allowing it to stop early similar to the C based one, from Stanislav. 3) Add a BPF helper function that allows generating SYN cookies from XDP and tc BPF, from Petar. 4) Add devmap hash-based map type for more flexibility in device lookup for redirects, from Toke. 5) Improvements to XDP forwarding sample code now utilizing recently enabled devmap lookups, from Jesper. 6) Add support for reporting the effective cgroup progs in bpftool, from Jakub and Takshak. 7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter. 8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan. 9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei. 10) Add perf event output helper also for other skb-based program types, from Allan. 11) Fix a co-re related compilation error in selftests, from Yonghong. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 16:24:57 -07:00
Andrea Arcangeli	a8282608c8	Revert "mm, thp: restore node-local hugepage allocations" This reverts commit `2f0799a0ff` ("mm, thp: restore node-local hugepage allocations"). commit `2f0799a0ff` was rightfully applied to avoid the risk of a severe regression that was reported by the kernel test robot at the end of the merge window. We now understand the regression was a false positive and was caused by a significant increase in fairness during a swap thrashing benchmark. So it's safe to re-apply the fix and continue improving the code from there. The benchmark that reported the regression is very useful, but it provides a meaningful result only when there is no significant alteration in fairness during the workload. The removal of __GFP_THISNODE increased fairness. __GFP_THISNODE cannot be used in the generic page faults path for new memory allocations under the MPOL_DEFAULT mempolicy, or the allocation behavior significantly deviates from what the MPOL_DEFAULT semantics are supposed to be for THP and 4k allocations alike. Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag set to "madvise") was never meant to provide an implicit MPOL_BIND on the "current" node the task is running on, causing swap storms and providing a much more aggressive behavior than even zone_reclaim_mode = 3. Any workload that could have benefited from __GFP_THISNODE now has to enable zone_reclaim_mode=1\|\|2\|\|3. __GFP_THISNODE implicitly provided the zone_reclaim_mode behavior, but it only did so if THP was enabled: if THP was disabled, there would have been no chance to get any 4k page from the current node if the current node was full of pagecache, which further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE. MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode semantics, in fact the two are orthogonal, zone_reclaim_mode = 1\|2\|3 must work exactly the same with MADV_HUGEPAGE set or not.
The performance characteristic of memory depends on the hardware details. The numbers below are obtained on Naples/EPYC architecture and the N/A projection extends them to show what we should aim for in the future as a good THP NUMA locality default. The benchmark used exercises random memory seeks (note: the cost of the page faults is not part of the measurement). D0 THP \| D0 4k \| D1 THP \| D1 4k \| D2 THP \| D2 4k \| D3 THP \| D3 4k \| ... 0% \| +43% \| +45% \| +106% \| +131% \| +224% \| N/A \| N/A D0 means distance zero (i.e. local memory), D1 means distance one (i.e. intra socket memory), D2 means distance two (i.e. inter socket memory), etc... For the guest physical memory allocated by qemu and for guest mode kernel the performance characteristic of RAM is more complex and an ideal default could be: D0 THP \| D1 THP \| D0 4k \| D2 THP \| D1 4k \| D3 THP \| D2 4k \| D3 4k \| ... 0% \| +58% \| +101% \| N/A \| +222% \| N/A \| N/A \| N/A NOTE: the N/A are projections and haven't been measured yet, the measurement in this case is done on a 1950x with only two NUMA nodes. The THP case here means THP was used both in the host and in the guest. After applying this commit the THP NUMA locality order that we'll get out of MADV_HUGEPAGE is this: D0 THP \| D1 THP \| D2 THP \| D3 THP \| ... \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Before this commit it was: D0 THP \| D0 4k \| D1 4k \| D2 4k \| D3 4k \| ... Even if we ignore the breakage of large workloads that can't fit in a single node that the __GFP_THISNODE implicit "current node" mbind caused, the THP NUMA locality order provided by __GFP_THISNODE was still not the one we shall aim for in the long term (i.e. the first one at the top). 
After this commit is applied, we can introduce a new allocator multi order API and replace those two alloc_pages_vmas calls in the page fault path with a single multi order call: unsigned int order = (1 << HPAGE_PMD_ORDER) \| (1 << 0); page = alloc_pages_multi_order(..., &order); if (!page) goto out; if (!(order & (1 << 0))) { VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER); /* THP fault */ } else { VM_WARN_ON(order != 1 << 0); /* 4k fallback */ } The page allocator logic has to be altered so that when it fails on any zone with order 9, it has to try again with an order 0 before falling back to the next zone in the zonelist. After that we need to do more measurements and evaluate if adding an opt-in feature for guest mode is worth it, to swap "DN 4k \| DN+1 THP" with "DN+1 THP \| DN 4k" at every NUMA distance crossing. Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Andrea Arcangeli	92717d429b	Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings". The fixes for what was originally reported as "pathological THP behavior" were rightfully reverted to be sure not to introduce regressions at the end of a merge window after a severe regression report from the kernel bot. We can safely re-apply them now that we had time to analyze the problem. The mm process worked fine, because the good fixes were eventually committed upstream without excessive delay. The regression reported by the kernel bot however forced us to revert the good fixes to be sure not to introduce regressions and to give us the time to analyze the issue further. The silver lining is that this extra time allowed us to think more about this issue and also plan for a future direction to improve things further in terms of THP NUMA locality. This patch (of 2): This reverts commit `356ff8a9a7` ("Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""). So it reapplies `89c83fb539` ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). Consolidation of the THP allocation flags at the same place was meant to be a clean up to more easily handle otherwise scattered code which is imposing a maintenance burden. There were no real problems observed with the gfp mask consolidation but the reversion was rushed through without a larger consensus regardless. This patch brings the consolidation back because this should make the long term maintainability easier as well as it should allow future changes to be less error prone.
[mhocko@kernel.org: changelog additions] Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Qian Cai	0cfaee2af3	include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used A compiler throws a warning on an arm64 system since commit `9849a5697d` ("arch, mm: convert all architectures to use 5level-fixup.h"), mm/kasan/init.c: In function 'kasan_free_p4d': mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable] p4d_t *p4d; ^~~ because p4d_none() in "5level-fixup.h" is compiled away while it is a static inline function in "pgtable-nopud.h". However, if converted p4d_none() to a static inline there, powerpc would be unhappy as it reads those in assembler language in "arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip assembly include for the static inline C function. While at it, converted a few similar functions to be consistent with the ones in "pgtable-nopud.h". Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Roman Gushchin	ec9f02384f	mm: workingset: fix vmstat counters for shadow nodes Memcg counters for shadow nodes are broken because the memcg pointer is obtained in a wrong way. The following approach is used: virt_to_page(xa_node)->mem_cgroup Since commit `4d96ba3530` ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't set for slab pages, so memcg_from_slab_page() should be used instead. Also I doubt that it ever worked correctly: virt_to_head_page() should be used instead of virt_to_page(). Otherwise objects residing on tail pages are not accounted, because only the head page contains a valid mem_cgroup pointer. That has been the case since the introduction of these counters by the commit `68d48e6a2d` ("mm: workingset: add vmstat counter for shadow nodes"). Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com Fixes: `4d96ba3530` ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Ralph Campbell	76470ccd62	mm: document zone device struct page field usage Patch series "mm/hmm: fixes for device private page migration", v3. Testing the latest linux git tree turned up a few bugs with page migration to and from ZONE_DEVICE private and anonymous pages. Hopefully it clarifies how ZONE_DEVICE private struct page uses the same mapping and index fields from the source anonymous page mapping. This patch (of 3): Struct page for ZONE_DEVICE private pages uses the page->mapping and page->index fields while the source anonymous pages are migrated to device private memory. This is so rmap_walk() can find the page when migrating the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and page->index fields when files are mapped into a process address space. Add comments to struct page and remove the unused "_zd_pad_1" field to make this more clear. Link: http://lkml.kernel.org/r/20190724232700.23327-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-13 16:06:52 -07:00
Bernard Metzler	2c8ccb37b0	RDMA/siw: Change CQ flags from 64->32 bits This patch changes the driver/user shared (mmapped) CQ notification flags field from unsigned 64-bits size to unsigned 32-bits size. This enables building siw on 32-bit architectures. This patch changes the siw-abi, but as siw was only just merged in this merge window cycle, there are no released kernels with the prior abi. We are making no attempt to be binary compatible with siw user space libraries prior to the merge of siw into the upstream kernel, only moving forward with upstream kernels and upstream rdma-core provided siw libraries are we guaranteeing compatibility. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:22:06 -04:00
Andre Hartmann	3ca3c4aad2	can: netlink: fix documentation typos This patch fixes some documentation typos in struct can_bittiming_const. Signed-off-by: Andre Hartmann <aha_1980@gmx.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-08-13 17:32:21 +02:00
Oliver Hartkopp	456a8a646b	can: gw: add support for CAN FD frames Introduce CAN FD support which needs an extension of the netlink API to pass CAN FD type content to the kernel which has a different size to Classic CAN. Additionally the struct canfd_frame has a new 'flags' element that can now be modified with can-gw. The new CGW_FLAGS_CAN_FD option flag defines whether the routing job handles Classic CAN or CAN FD frames. This setting is very strict at reception time and enables the new possibilities, e.g. CGW_FDMOD_* and modifying the flags element of struct canfd_frame, only when CGW_FLAGS_CAN_FD is set. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-08-13 17:32:21 +02:00
Oliver Hartkopp	e9dc7c6050	can: gw: use struct canfd_frame as internal data structure To prepare the CAN FD support this patch implements the first adaptions in data structures for CAN FD without changing the current functionality. Additionally some code at the end of this patch is moved or indented to simplify the review of the next implementation step. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-08-13 17:32:21 +02:00
Jeremy Sowden	2a475c409f	kbuild: remove all netfilter headers from header-test blacklist. All the blacklisted NF headers can now be compiled stand-alone, so removed them from the blacklist. Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:15:38 +02:00
Jeremy Sowden	20a9379d9a	netfilter: remove "#ifdef __KERNEL__" guards from some headers. A number of non-UAPI Netfilter header-files contained superfluous "#ifdef __KERNEL__" guards. Removed them. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:15:28 +02:00
Jeremy Sowden	78458e3e08	netfilter: add missing IS_ENABLED(CONFIG_NETFILTER) checks to some header-files. linux/netfilter.h defines a number of struct and inline function definitions which are only available if CONFIG_NETFILTER is enabled. These structs and functions are used in declarations and definitions in other header-files. Added preprocessor checks to make sure these headers will compile if CONFIG_NETFILTER is disabled. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:15:18 +02:00
Jeremy Sowden	0abc8bf4f2	netfilter: add missing IS_ENABLED(CONFIG_NF_CONNTRACK) checks to some header-files. struct nf_conn contains a "struct nf_conntrack ct_general" member and struct net contains a "struct netns_ct ct" member which are both only defined if CONFIG_NF_CONNTRACK is enabled. These members are used in a number of inline functions defined in other header-files. Added preprocessor checks to make sure the headers will compile if CONFIG_NF_CONNTRACK is disabled. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:15:08 +02:00
Jeremy Sowden	47e640af2e	netfilter: add missing IS_ENABLED(CONFIG_NF_TABLES) check to header-file. nf_tables.h defines an API comprising several inline functions and macros that depend on the nft member of struct net. However, this is only defined if CONFIG_NF_TABLES is enabled. Added preprocessor checks to ensure that nf_tables.h will compile if CONFIG_NF_TABLES is disabled. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:14:58 +02:00
Jeremy Sowden	9211bfbff8	netfilter: add missing IS_ENABLED(CONFIG_BRIDGE_NETFILTER) checks to header-file. br_netfilter.h defines inline functions that use an enum constant and struct member that are only defined if CONFIG_BRIDGE_NETFILTER is enabled. Added preprocessor checks to ensure br_netfilter.h will compile if CONFIG_BRIDGE_NETFILTER is disabled. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:14:49 +02:00
Jeremy Sowden	a1b2f04ea5	netfilter: add missing includes to a number of header-files. A number of netfilter header-files used declarations and definitions from other headers without including them. Added include directives to make those declarations and definitions available. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:14:39 +02:00
Jeremy Sowden	bd96b4c756	netfilter: inline four headers files into another one. linux/netfilter/ipset/ip_set.h included four other header files: include/linux/netfilter/ipset/ip_set_comment.h include/linux/netfilter/ipset/ip_set_counter.h include/linux/netfilter/ipset/ip_set_skbinfo.h include/linux/netfilter/ipset/ip_set_timeout.h Of these the first three were not included anywhere else. The last, ip_set_timeout.h, was included in a couple of other places, but defined inline functions which call other inline functions defined in ip_set.h, so ip_set.h had to be included before it. Inlined all four into ip_set.h, and updated the other files that included ip_set_timeout.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:14:26 +02:00
Pablo Neira Ayuso	43dd16efc7	netfilter: nf_tables: store data in offload context registers Store immediate data into offload context register. This allows follow up instructions to take it from the corresponding source register. This patch is required to support payload mangling, although other instructions that take data from source register will benefit from this too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 12:10:01 +02:00
Yishai Hadas	b1635ee612	net/mlx5: Add XRQ legacy commands opcodes Add XRQ legacy commands opcodes, will be used via the DEVX interface. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-08-13 12:58:11 +03:00
John Garry	b884e2de2a	lib: logic_pio: Add logic_pio_unregister_range() Add a function to unregister a logical PIO range. Logical PIO space can still be leaked when unregistering certain LOGIC_PIO_CPU_MMIO regions, but this is acceptable for now since there are no callers to unregister LOGIC_PIO_CPU_MMIO regions, and the logical PIO region allocation scheme would need significant work to improve this. Cc: stable@vger.kernel.org Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Wei Xu <xuwei5@hisilicon.com>	2019-08-13 14:54:24 +08:00
Rafael J. Wysocki	accd2dd72c	PCI/ASPM: Add pcie_aspm_enabled() Add a function checking whether or not PCIe ASPM has been enabled for a given device. It will be used by the NVMe driver to decide how to handle the device during system suspend. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>	2019-08-12 10:47:55 +02:00
Heiner Kallweit	bf22b343ca	net: phy: add phy_modify_paged_changed Add helper function phy_modify_paged_changed, behavior is the same as for phy_modify_changed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Heiner Kallweit	f4069cd7fa	net: phy: prepare phylib to deal with PHY's extending Clause 22 The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me) PHY that uses standard Clause 22 for all modes up to 1Gbps and adds 2.5Gbps control using vendor-specific registers. To use phylib for the standard part little extensions are needed: - Move most of genphy_config_aneg to a new function __genphy_config_aneg that takes a parameter whether restarting auto-negotiation is needed (depending on whether content of vendor-specific advertisement register changed). - Don't clear phydev->lp_advertising in genphy_read_status so that we can set non-C22 mode flags before. Basically both changes mimic the behavior of the equivalent Clause 45 functions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Ido Schimmel	e9feb58020	drop_monitor: Expose tail drop counter Previous patch made the length of the per-CPU skb drop list configurable. Expose a counter that shows how many packets could not be enqueued to this list. This allows users to determine the desired queue length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	30328d46af	drop_monitor: Make drop queue length configurable In packet alert mode, each CPU holds a list of dropped skbs that need to be processed in process context and sent to user space. To avoid exhausting the system's memory the maximum length of this queue is currently set to 1000. Allow users to tune the length of this queue according to their needs. The configured length is reported to user space when drop monitor configuration is queried. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	444be061d0	drop_monitor: Add a command to query current configuration Users should be able to query the current configuration of drop monitor before they start using it. Add a command to query the existing configuration which currently consists of alert mode and packet truncation length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	57986617a7	drop_monitor: Allow truncation of dropped packets When sending dropped packets to user space it is not always necessary to copy the entire packet as usually only the headers are of interest. Allow user to specify the truncation length and add the original length of the packet as additional metadata to the netlink message. By default no truncation is performed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	ca30707dee	drop_monitor: Add packet alert mode So far drop monitor supported only one alert mode in which a summary of locations in which packets were recently dropped was sent to user space. This alert mode is sufficient in order to understand that packets were dropped, but lacks information to perform a more detailed analysis. Add a new alert mode in which the dropped packet itself is passed to user space along with metadata: The drop location (as program counter and resolved symbol), ingress netdevice and drop timestamp. More metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and queued on per-CPU skb drop list. Then, in process context the netlink message is allocated, prepared and finally sent to user space. The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting the system's memory. Subsequent patches will make this limit configurable and also add a counter that indicates how many skbs were tail dropped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	28315f7999	drop_monitor: Add alert mode operations The next patch is going to add another alert mode in which the dropped packet is notified to user space, instead of only a summary of recent drops. Abstract the differences between the modes by adding alert mode operations. The operations are selected based on the currently configured mode and associated with the probes and the work item just before tracing starts. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Greg Kroah-Hartman	9f818c8a73	mlx5: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs files, making all of this much simpler and easier to understand as we don't need to keep the dentries saved anymore. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00
Greg Kroah-Hartman	a62052ba2a	wimax: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up a lot of unneeded code and logic around the debugfs wimax files, making all of this much simpler and easier to understand. Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Cc: linux-wimax@intel.com Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:47 -07:00

112844 Commits