linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-10 04:48:04 +09:00

Author	SHA1	Message	Date
Stefan Hajnoczi	716adf173f	UPSTREAM: VSOCK: defer sock removal to transports The virtio transport will implement graceful shutdown and the related SO_LINGER socket option. This requires orphaning the sock but keeping it in the table of connections after .release(). This patch adds the vsock_remove_sock() function and leaves it up to the transport when to remove the sock. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `6773b7dc39`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I889cdbc0b1de8d2ff54a70ab7a6b4623edb3de06	2019-01-15 17:08:35 -08:00
Stefan Hajnoczi	3fc44c12b2	UPSTREAM: VSOCK: transport-specific vsock_transport functions struct vsock_transport contains function pointers called by AF_VSOCK core code. The transport may want its own transport-specific function pointers and they can be added after struct vsock_transport. Allow the transport to fetch vsock_transport. It can downcast it to access transport-specific function pointers. The virtio transport will use this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `0b01aeb3d2`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I442706ae71dc14c70fc4033d9719134c2d034509	2019-01-15 17:08:34 -08:00
Stefan Hajnoczi	a598d93c2a	UPSTREAM: vsock: make listener child lock ordering explicit There are several places where the listener and pending or accept queue child sockets are accessed at the same time. Lockdep is unhappy that two locks from the same class are held. Tell lockdep that it is safe and document the lock ordering. Originally Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> sent a similar patch asking whether this is safe. I have audited the code and also covered the vsock_pending_work() function. Suggested-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `4192f672fa`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I0cb7ee964057e9338971e1a2043ae17557feaec7	2019-01-15 17:08:34 -08:00
Jason Wang	0f0ec3accb	UPSTREAM: vhost: new device IOTLB API This patch tries to implement an device IOTLB for vhost. This could be used with userspace(qemu) implementation of DMA remapping to emulate an IOMMU for the guest. The idea is simple, cache the translation in a software device IOTLB (which is implemented as an interval tree) in vhost and use vhost_net file descriptor for reporting IOTLB miss and IOTLB update/invalidation. When vhost meets an IOTLB miss, the fault address, size and access can be read from the file. After userspace finishes the translation, it writes the translated address to the vhost_net file to update the device IOTLB. When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq addresses set by ioctl are treated as iova instead of virtual address and the accessing can only be done through IOTLB instead of direct userspace memory access. Before each round or vq processing, all vq metadata is prefetched in device IOTLB to make sure no translation fault happens during vq processing. In most cases, virtqueues are contiguous even in virtual address space. The IOTLB translation for virtqueue itself may make it a little slower. We might add fast path cache on top of this patch. Signed-off-by: Jason Wang <jasowang@redhat.com> [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ] [mst: fix build warnings ] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [ weiyj.lk: missing unlock on error ] Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> (cherry picked from commit `6b1e6cc785`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + vsock adb tunnel Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I10e4e64d6bc9b36a0d9b444c2319e290921c63c6	2019-01-15 17:08:34 -08:00
Alistair Strachan	2fb5f444a8	BACKPORT: vhost: convert pre sorted vhost memory array to interval tree Current pre-sorted memory region array has some limitations for future device IOTLB conversion: 1) need extra work for adding and removing a single region, and it's expected to be slow because of sorting or memory re-allocation. 2) need extra work of removing a large range which may intersect several regions with different size. 3) need trick for a replacement policy like LRU To overcome the above shortcomings, this patch convert it to interval tree which can easily address the above issue with almost no extra work. The patch could be used for: - Extend the current API and only let the userspace to send diffs of memory table. - Simplify Device IOTLB implementation. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `a9709d6874`) [astrachan: Backported around stable backport `711df71` ("vhost_net: stop device during reset owner")] Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I51c7c7229908a5ce1f082a80eeda5a01e85c2234	2019-01-15 17:08:34 -08:00
Jason Wang	ec8d83a074	UPSTREAM: vhost: introduce vhost memory accessors This patch introduces vhost memory accessors which were just wrappers for userspace address access helpers. This is a requirement for vhost device iotlb implementation which will add iotlb translations in those accessors. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `bfe2bc5128`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: Ia67c171384109e646f55027a71dd8df9c6b9c61a	2019-01-15 17:08:34 -08:00
Jason Wang	6b9b4adc1f	UPSTREAM: vhost_net: stop polling socket during rx processing We don't stop rx polling socket during rx processing, this will lead unnecessary wakeups from under layer net devices (E.g sock_def_readable() form tun). Rx will be slowed down in this way. This patch avoids this by stop polling socket during rx processing. A small drawback is that this introduces some overheads in light load case because of the extra start/stop polling, but single netperf TCP_RR does not notice any change. In a super heavy load case, e.g using pktgen to inject packet to guest, we get about ~8.8% improvement on pps: before: ~1240000 pkt/s after: ~1350000 pkt/s Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `8241a1e466`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: Ia51c61a6c1976b6f6406342f369ffc365d964078	2019-01-15 17:08:34 -08:00
Julia Lawall	708df0e257	UPSTREAM: VSOCK: constify vsock_transport structure The vsock_transport structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `56130915bb`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I59ffded185fbf22aaeb39753870ef9c866ab1a2a	2019-01-15 17:08:34 -08:00
Jason Wang	e6fdb47476	UPSTREAM: vhost: lockless enqueuing We use spinlock to synchronize the work list now which may cause unnecessary contentions. So this patch switch to use llist to remove this contention. Pktgen tests shows about 5% improvement: Before: ~1300000 pps After: ~1370000 pps Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `04b96e5528`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: Icf032cf2010eaedd92dc5906d454274709b4a9b4	2019-01-15 17:08:34 -08:00
Jason Wang	773bac0e1f	UPSTREAM: vhost: simplify work flushing We used to implement the work flushing through tracking queued seq, done seq, and the number of flushing. This patch simplify this by just implement work flushing through another kind of vhost work with completion. This will be used by lockless enqueuing patch. Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `7235acdb11`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: Id3903050706dd734a0217c56ad8ca99b2b22471e	2019-01-15 17:08:34 -08:00
Jorgen Hansen	7103ea77d3	UPSTREAM: VSOCK: Only check error on skb_recv_datagram when skb is NULL If skb_recv_datagram returns an skb, we should ignore the err value returned. Otherwise, datagram receives will return EAGAIN when they have to wait for a datagram. Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `9c995cc9a2`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I69d487529656cb2fe8be9c2cef0db440d4db5cac	2019-01-15 17:08:34 -08:00
Claudio Imbrenda	4c63405330	BACKPORT: AF_VSOCK: Shrink the area influenced by prepare_to_wait When a thread is prepared for waiting by calling prepare_to_wait, sleeping is not allowed until either the wait has taken place or finish_wait has been called. The existing code in af_vsock imposed unnecessary no-sleep assumptions to a broad list of backend functions. This patch shrinks the influence of prepare_to_wait to the area where it is strictly needed, therefore relaxing the no-sleep restriction there. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `f7f9b5e7f8`) [astrachan: Backported around stable backport `3223ea1` ("vsock: use new wait API for vsock_stream_sendmsg()")] Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: Ic1d7bae4b1187ad48194bf4d0b1dd09ab0275734	2019-01-15 17:08:33 -08:00
Jason Wang	6c81476a7c	UPSTREAM: vhost_net: basic polling support This patch tries to poll for new added tx buffer or socket receive queue for a while at the end of tx/rx processing. The maximum time spent on polling were specified through a new kind of vring ioctl. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `0308813724`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I93db6c14e39b2db04e047ee2e0dce3af5436d95e	2019-01-15 17:08:33 -08:00
Jason Wang	29181c5bfa	UPSTREAM: vhost: introduce vhost_vq_avail_empty() This patch introduces a helper which will return true if we're sure that the available ring is empty for a specific vq. When we're not sure, e.g vq access failure, return false instead. This could be used for busy polling code to exit the busy loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `d4a60603fa`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I1d4e972543470530a5bc37e19f4199b1f345e042	2019-01-15 17:08:33 -08:00
Jason Wang	2dd59f910b	UPSTREAM: vhost: introduce vhost_has_work() This path introduces a helper which can give a hint for whether or not there's a work queued in the work list. This could be used for busy polling code to exit the busy loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `526d3e7ff5`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I2717a212f888b0d3ea78f1ce0bcc21ec9dee5050	2019-01-15 17:08:33 -08:00
Greg Kurz	a5de7503e9	UPSTREAM: vhost: rename vhost_init_used() Looking at how callers use this, maybe we should just rename init_used to vhost_vq_init_access. The _used suffix was a hint that we access the vq used ring. But maybe what callers care about is that it must be called after access_ok. Also, this function manipulates the vq->is_le field which isn't related to the vq used ring. This patch simply renames vhost_init_used() to vhost_vq_init_access() as suggested by Michael. No behaviour change. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `80f7d0301e`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: If8ef33d7f00e515afbff3adc7b00607c4a58fdf6	2019-01-15 17:08:33 -08:00
Greg Kurz	2ab0816efd	UPSTREAM: vhost: rename cross-endian helpers The default use case for vhost is when the host and the vring have the same endianness (default native endianness). But there are cases where they differ and vhost should byteswap when accessing the vring. The first case is when the host is big endian and the vring belongs to a virtio 1.0 device, which is always little endian. This is covered by the vq->is_le field. This field is initialized when userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device stops. We already have a vhost_init_is_le() helper, but the reset operation is opencoded as follows: vq->is_le = virtio_legacy_is_little_endian(); It isn't clear that we are resetting vq->is_le here. This patch moves the code to a helper with a more explicit name. The other case where we may have to byteswap is when the architecture can switch endianness at runtime (bi-endian). If endianness differs in the host and in the guest, then legacy devices need to be used in cross-endian mode. This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which introduces a vq->user_be field. Userspace may enable cross-endian mode by calling the SET_VRING_ENDIAN ioctl before the device is started. The cross-endian mode is disabled when the device is stopped. The current names of the helpers that manipulate vq->user_be are unclear. This patch renames those helpers to clearly show that this is cross-endian stuff and with explicit enable/disable semantics. No behaviour change. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `c507203756`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I42271a7cbc4ecc3826116943805bc59d4f8bd192	2019-01-15 17:08:33 -08:00
Greg Kurz	6ef11fba4c	UPSTREAM: vhost: fix error path in vhost_init_used() We don't want side effects. If something fails, we rollback vq->is_le to its previous value. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `e1f33be918`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I18c57e5c78aa3e89d267213770f915a5a5c76100	2019-01-15 17:08:33 -08:00
Stefan Hajnoczi	c7c7d1506c	UPSTREAM: virtio: make find_vqs() checkpatch.pl-friendly checkpatch.pl wants arrays of strings declared as follows: static const char * const names[] = { "vq-1", "vq-2", "vq-3" }; Currently the find_vqs() function takes a const char names[] argument so passing checkpatch.pl's const char const names[] results in a compiler error due to losing the second const. This patch adjusts the find_vqs() prototype and updates all virtio transports. This makes it possible for virtio_balloon.c, virtio_input.c, virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly type. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> (cherry picked from commit `f7ad26ff95`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Cody Schuffelen <schuffelen@google.com> Change-Id: I23513ea85e7a43efd0c604fc4445b301b4f610ba	2019-01-15 17:08:33 -08:00
Eric Dumazet	3576d75c8d	UPSTREAM: net: move napi_hash[] into read mostly section We do not often add/delete a napi context. Moving napi_hash[] into read mostly section avoids potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit `6180d9de61`) Bug: 121166534 Test: Ran cuttlefish with android-4.4 + VSOCKETS, VMWARE_VMCI_VSOCKETS Signed-off-by: Alistair Strachan <astrachan@google.com> Change-Id: I891b9dfef88c44f6480842754cd7d81e7a250792	2019-01-15 17:08:33 -08:00
Sami Tolvanen	acb5ff1905	ANDROID: cuttlefish_defconfig: remove DM_VERITY_HASH_PREFETCH_MIN_SIZE This option is reverted in change Iebcb0cd9982f36c4bd2552811f9147325a291db0. Bug: 71728490 Change-Id: If25dfd56abed16bfe579840e4a52d6bedbae69ca Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2019-01-14 17:58:50 +00:00
Sami Tolvanen	ec8fcb232d	Revert "ANDROID: dm verity: add minimum prefetch size" This reverts commit `ace74ccf82`. Bug: 71728490 Change-Id: Iebcb0cd9982f36c4bd2552811f9147325a291db0 Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2019-01-14 17:58:18 +00:00
Peng Zhou	c69bfa0ff3	ANDROID: f2fs: Complement "android_fs" tracepoint of read path It's only in DIO before, complement for BIO. Bug: 120445624 Change-Id: I90b6fb15e355978da8805ed6306c595819be989d Signed-off-by: Peng Zhou <peng.zhou@mediatek.com>	2019-01-14 16:17:08 +00:00
Greg Kroah-Hartman	241f76b17c	Merge 4.4.170 into android-4.4 Changes in 4.4.170 USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only USB: serial: option: add GosunCn ZTE WeLink ME3630 USB: serial: option: add HP lt4132 USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode) USB: serial: option: add Fibocom NL668 series USB: serial: option: add Telit LN940 series mmc: core: Reset HPI enabled state during re-init and in case of errors mmc: omap_hsmmc: fix DMA API warning gpio: max7301: fix driver for use with CONFIG_VMAP_STACK Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels x86/mtrr: Don't copy uninitialized gentry fields back to userspace drm/ioctl: Fix Spectre v1 vulnerabilities ip6mr: Fix potential Spectre v1 vulnerability ipv4: Fix potential Spectre v1 vulnerability ax25: fix a use-after-free in ax25_fillin_cb() ibmveth: fix DMA unmap error in ibmveth_xmit_start error path ieee802154: lowpan_header_create check must check daddr ipv6: explicitly initialize udp6_addr in udp_sock_create6() isdn: fix kernel-infoleak in capi_unlocked_ioctl netrom: fix locking in nr_find_socket() packet: validate address length packet: validate address length if non-zero sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event vhost: make sure used idx is seen before log in vhost_add_used_n() VSOCK: Send reset control packet when socket is partially bound xen/netfront: tolerate frags with no data gro_cell: add napi_disable in gro_cells_destroy sock: Make sock->sk_stamp thread-safe ALSA: rme9652: Fix potential Spectre v1 vulnerability ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities ALSA: pcm: Fix potential Spectre v1 vulnerability ALSA: emux: Fix potential Spectre v1 vulnerabilities ALSA: hda: add mute LED support for HP EliteBook 840 G4 ALSA: hda/tegra: clear pending irq handlers USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays USB: serial: option: add Fibocom NL678 series usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable() Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup perf pmu: Suppress potential format-truncation warning ext4: fix possible use after free in ext4_quota_enable ext4: missing unlock/put_page() in ext4_try_to_write_inline_data() ext4: fix EXT4_IOC_GROUP_ADD ioctl ext4: force inode writes when nfsd calls commit_metadata() spi: bcm2835: Fix race on DMA termination spi: bcm2835: Fix book-keeping of DMA termination spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader. media: vivid: free bitmap_cap when updating std/timings/etc. MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() MIPS: Align kernel load address to 64KB CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested spi: bcm2835: Unbreak the build of esoteric configs powerpc: Fix COFF zImage booting on old powermacs ARM: imx: update the cpu power up timing setting on i.mx6sx Input: restore EV_ABS ABS_RESERVED checkstack.pl: fix for aarch64 xfrm: Fix bucket count reported to userspace scsi: bnx2fc: Fix NULL dereference in error handling Input: omap-keypad - fix idle configuration to not block SoC idle states scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown fork: record start_time late hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL mm, devm_memremap_pages: kill mapping "System RAM" support sunrpc: fix cache_head leak due to queued request sunrpc: use SVC_NET() in svcauth_gss_* functions crypto: x86/chacha20 - avoid sleeping with preemption disabled ALSA: cs46xx: Potential NULL dereference in probe ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit() ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks dlm: fixed memory leaks after failed ls_remove_names allocation dlm: possible memory leak on error path in create_lkb() dlm: lost put_lkb on error path in receive_convert() and receive_unlock() dlm: memory leaks on error path in dlm_user_request() gfs2: Fix loop in gfs2_rbm_find b43: Fix error in cordic routine 9p/net: put a lower bound on msize iommu/vt-d: Handle domain agaw being less than iommu agaw ceph: don't update importing cap's mseq when handing cap export genwqe: Fix size check intel_th: msu: Fix an off-by-one in attribute store power: supply: olpc_battery: correct the temperature units Linux 4.4.170 Change-Id: I1b2927583f8853bfeb3ad11d045c2cf5c5c926f3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-13 10:34:49 +01:00
Greg Kroah-Hartman	b83b3fa784	Linux 4.4.170	2019-01-13 10:05:34 +01:00
Lubomir Rintel	1bd63edb92	power: supply: olpc_battery: correct the temperature units commit `ed54ffbe55` upstream. According to [1] and [2], the temperature values are in tenths of degree Celsius. Exposing the Celsius value makes the battery appear on fire: $ upower -i /org/freedesktop/UPower/devices/battery_olpc_battery ... temperature: 236.9 degrees C Tested on OLPC XO-1 and OLPC XO-1.75 laptops. [1] include/linux/power_supply.h [2] Documentation/power/power_supply_class.txt Fixes: `fb972873a7` ("[BATTERY] One Laptop Per Child power/battery driver") Cc: stable@vger.kernel.org Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:34 +01:00
Alexander Shishkin	b1892669f7	intel_th: msu: Fix an off-by-one in attribute store commit `ec5b5ad6e2` upstream. The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated list of window sizes, passed from userspace. However, there is a bug in the string parsing logic wherein it doesn't exclude the comma character from the range of characters as it consumes them. This leads to an out-of-bounds access given a sufficiently long list. For example: > # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages > ================================================================== > BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40 > Read of size 1 at addr ffff8803ffcebcd1 by task sh/825 > > CPU: 3 PID: 825 Comm: npktest.sh Tainted: G W 4.20.0-rc1+ > Call Trace: > dump_stack+0x7c/0xc0 > print_address_description+0x6c/0x23c > ? memchr+0x1e/0x40 > kasan_report.cold.5+0x241/0x308 > memchr+0x1e/0x40 > nr_pages_store+0x203/0xd00 [intel_th_msu] Fix this by accounting for the comma character. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: `ba82664c13` ("intel_th: Add Memory Storage Unit driver") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Christian Borntraeger	21c7b13778	genwqe: Fix size check commit `fdd6696846` upstream. Calling the test program genwqe_cksum with the default buffer size of 2MB triggers the following kernel warning on s390: WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0 CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1 task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000 Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009 0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0 0000000000000000 0000000000b70500 0000000000000001 0000000000000000 0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0 Krnl Code: 000000000027809e: de7195001000 ed 1280(114,%r9),0(%r1) 00000000002780a4: a774fead brc 7,277dfe #00000000002780a8: a7f40001 brc 15,2780aa >00000000002780ac: 92011000 mvi 0(%r1),1 00000000002780b0: a7f4fea7 brc 15,277dfe 00000000002780b4: 9101c6b6 tm 1718(%r12),1 00000000002780b8: a784ff3a brc 8,277f2c 00000000002780bc: a7f4fe2e brc 15,277d18 Call Trace: ([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0) [<000000000013afae>] s390_dma_alloc+0xfe/0x310 [<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card] [<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card] [<00000000002b2712>] mmap_region+0x4e2/0x778 [<00000000002b2c54>] do_mmap+0x2ac/0x3e0 [<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118 [<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268 [<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0 [<000000000074e518>] sysc_tracego+0x14/0x1e [<000003ffacf87dc6>] 0x3ffacf87dc6 turns out the check in __genwqe_alloc_consistent uses "> MAX_ORDER" while the mm code uses ">= MAX_ORDER". Fix genwqe. Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Yan, Zheng	5f6ce5ea83	ceph: don't update importing cap's mseq when handing cap export commit `3c1392d4c4` upstream. Updating mseq makes client think importer mds has accepted all prior cap messages and importer mds knows what caps client wants. Actually some cap messages may have been dropped because of mseq mismatch. If mseq is left untouched, importing cap's mds_wanted later will get reset by cap import message. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Sohil Mehta	0216bf654a	iommu/vt-d: Handle domain agaw being less than iommu agaw commit `3569dd07aa` upstream. The Intel IOMMU driver opportunistically skips a few top level page tables from the domain paging directory while programming the IOMMU context entry. However there is an implicit assumption in the code that domain's adjusted guest address width (agaw) would always be greater than IOMMU's agaw. The IOMMU capabilities in an upcoming platform cause the domain's agaw to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports both 4-level and 5-level paging. The domain builds a 4-level page table based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In this case the code incorrectly tries to skip page page table levels. This causes the IOMMU driver to avoid programming the context entry. The fix handles this case and programs the context entry accordingly. Fixes: `de24e55395` ("iommu/vt-d: Simplify domain_context_mapping_one") Cc: <stable@vger.kernel.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reported-by: Ramos Falcon, Ernesto R <ernesto.r.ramos.falcon@intel.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Dominique Martinet	2d75014407	9p/net: put a lower bound on msize commit `574d356b7a` upstream. If the requested msize is too small (either from command line argument or from the server version reply), we won't get any work done. If it's really too small, nothing will work, and this got caught by syzbot recently (on a new kmem_cache_create_usercopy() call) Just set a minimum msize to 4k in both code paths, until someone complains they have a use-case for a smaller msize. We need to check in both mount option and server reply individually because the msize for the first version request would be unchecked with just a global check on clnt->msize. Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewreck.org Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Larry Finger	c820ac339c	b43: Fix error in cordic routine commit `8ea3819c0b` upstream. The cordic routine for calculating sines and cosines that was added in commit `6f98e62a9f` ("b43: update cordic code to match current specs") contains an error whereby a quantity declared u32 can in fact go negative. This problem was detected by Priit Laes who is switching b43 to use the routine in the library functions of the kernel. Fixes: `9865045403` ("b43: make cordic common (LP-PHY and N-PHY need it)") Reported-by: Priit Laes <plaes@plaes.org> Cc: Rafał Miłecki <zajec5@gmail.com> Cc: Stable <stable@vger.kernel.org> # 2.6.34 Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Priit Laes <plaes@plaes.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Andreas Gruenbacher	1845189826	gfs2: Fix loop in gfs2_rbm_find commit `2d29f6b96d` upstream. Fix the resource group wrap-around logic in gfs2_rbm_find that commit `e579ed4f44` broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. Fixes: `e579ed4f44` ("GFS2: Introduce rbm field bii") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Vasily Averin	bf72973ce1	dlm: memory leaks on error path in dlm_user_request() commit `d47b41acee` upstream. According to comment in dlm_user_request() ua should be freed in dlm_free_lkb() after successful attach to lkb. However ua is attached to lkb not in set_lock_args() but later, inside request_lock(). Fixes `597d0cae0f` ("[DLM] dlm: user locks") Cc: stable@kernel.org # 2.6.19 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Vasily Averin	3ed774e59c	dlm: lost put_lkb on error path in receive_convert() and receive_unlock() commit `c0174726c3` upstream. Fixes `6d40c4a708` ("dlm: improve error and debug messages") Cc: stable@kernel.org # 3.5 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Vasily Averin	27f4aa2a0c	dlm: possible memory leak on error path in create_lkb() commit `23851e978f` upstream. Fixes `3d6aa675ff` ("dlm: keep lkbs in idr") Cc: stable@kernel.org # 3.1 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Vasily Averin	a09b8db228	dlm: fixed memory leaks after failed ls_remove_names allocation commit `b982896cdb` upstream. If allocation fails on last elements of array need to free already allocated elements. v2: just move existing out_rsbtbl label to right place Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: stable@kernel.org # 3.6 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:33 +01:00
Hui Peng	11e047131f	ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks commit `cbb2ebf70d` upstream. In `create_composite_quirk`, the terminating condition of for loops is `quirk->ifnum < 0`. So any composite quirks should end with `struct snd_usb_audio_quirk` object with ifnum < 0. for (quirk = quirk_comp->data; quirk->ifnum >= 0; ++quirk) { ..... } the data field of Bower's & Wilkins PX headphones usb device device quirks do not end with {.ifnum = -1}, wihch may result in out-of-bound read. This Patch fix the bug by adding an ending quirk object. Fixes: `240a8af929` ("ALSA: usb-audio: Add a quirck for B&W PX headphones") Signed-off-by: Hui Peng <benquike@163.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Takashi Iwai	a5e09a908e	ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit() commit `f4351a199c` upstream. The parser for the processing unit reads bNrInPins field before the bLength sanity check, which may lead to an out-of-bound access when a malformed descriptor is given. Fix it by assignment after the bLength check. Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Dan Carpenter	83f470ebd7	ALSA: cs46xx: Potential NULL dereference in probe commit `1524f4e47f` upstream. The "chip->dsp_spos_instance" can be NULL on some of the ealier error paths in snd_cs46xx_create(). Reported-by: "Yavuz, Tuba" <tuba@ece.ufl.edu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Eric Biggers	557f16c7fe	crypto: x86/chacha20 - avoid sleeping with preemption disabled In chacha20-simd, clear the MAY_SLEEP flag in the blkcipher_desc to prevent sleeping with preemption disabled, under kernel_fpu_begin(). This was fixed upstream incidentally by a large refactoring, commit `9ae433bc79` ("crypto: chacha20 - convert generic and x86 versions to skcipher"). But syzkaller easily trips over this when running on older kernels, as it's easily reachable via AF_ALG. Therefore, this patch makes the minimal fix for older kernels. Fixes: `c9320b6dcb` ("crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64") Cc: linux-crypto@vger.kernel.org Cc: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Vasily Averin	69c1fd103b	sunrpc: use SVC_NET() in svcauth_gss_* functions commit `b8be5674fa` upstream. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Vasily Averin	192f7ca0c7	sunrpc: fix cache_head leak due to queued request commit `4ecd55ea07` upstream. After commit `d202cce896`, an expired cache_head can be removed from the cache_detail's hash. However, the expired cache_head may be waiting for a reply from a previously submitted request. Such a cache_head has an increased refcounter and therefore it won't be freed after cache_put(freeme). Because the cache_head was removed from the hash it cannot be found during cache_clean() and can be leaked forever, together with stalled cache_request and other taken resources. In our case we noticed it because an entry in the export cache was holding a reference on a filesystem. Fixes `d202cce896` ("sunrpc: never return expired entries in sunrpc_cache_lookup") Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: stable@kernel.org # 2.6.35 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Dan Williams	6331b9d7ac	mm, devm_memremap_pages: kill mapping "System RAM" support commit `06489cfbd9` upstream. Given the fact that devm_memremap_pages() requires a percpu_ref that is torn down by devm_memremap_pages_release() the current support for mapping RAM is broken. Support for remapping "System RAM" has been broken since the beginning and there is no existing user of this this code path, so just kill the support and make it an explicit error. This cleanup also simplifies a follow-on patch to fix the error path when setting a devm release action for devm_memremap_pages_release() fails. Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Dan Williams	a93d56de45	mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL commit `808153e118` upstream. devm_memremap_pages() is a facility that can create struct page entries for any arbitrary range and give drivers the ability to subvert core aspects of page management. Specifically the facility is tightly integrated with the kernel's memory hotplug functionality. It injects an altmap argument deep into the architecture specific vmemmap implementation to allow allocating from specific reserved pages, and it has Linux specific assumptions about page structure reference counting relative to get_user_pages() and get_user_pages_fast(). It was an oversight and a mistake that this was not marked EXPORT_SYMBOL_GPL from the outset. Again, devm_memremap_pagex() exposes and relies upon core kernel internal assumptions and will continue to evolve along with 'struct page', memory hotplug, and support for new memory types / topologies. Only an in-kernel GPL-only driver is expected to keep up with this ongoing evolution. This interface, and functionality derived from this interface, is not suitable for kernel-external drivers. Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Michal Hocko	060853fdd4	hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined commit `b15c87263a` upstream. We have received a bug report that an injected MCE about faulty memory prevents memory offline to succeed on 4.4 base kernel. The underlying reason was that the HWPoison page has an elevated reference count and the migration keeps failing. There are two problems with that. First of all it is dubious to migrate the poisoned page because we know that accessing that memory is possible to fail. Secondly it doesn't make any sense to migrate a potentially broken content and preserve the memory corruption over to a new location. Oscar has found out that 4.4 and the current upstream kernels behave slightly differently with his simply testcase === int main(void) { int ret; int i; int fd; char array = malloc(4096); char array_locked = malloc(4096); fd = open("/tmp/data", O_RDONLY); read(fd, array, 4095); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; ret = mlock((void )PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked)); if (ret) perror("mlock"); sleep (20); ret = madvise((void )PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON); if (ret) perror("madvise"); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; return 0; } === + offline this memory. In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU list kernel: [<ffffffff81019ac9>] dump_trace+0x59/0x340 kernel: [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170 kernel: [<ffffffff8101ac71>] show_stack+0x21/0x40 kernel: [<ffffffff8132bb90>] dump_stack+0x5c/0x7c kernel: [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0 kernel: [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160 kernel: [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100 kernel: [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0 kernel: [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70 kernel: [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs] kernel: [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200 kernel: [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0 kernel: [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660 kernel: [<ffffffff8120e50d>] __vfs_read+0xcd/0x140 kernel: [<ffffffff8120e9ea>] vfs_read+0x7a/0x120 kernel: [<ffffffff8121404b>] kernel_read+0x3b/0x50 kernel: [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0 kernel: [<ffffffff81215f08>] do_execve+0x28/0x30 kernel: [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130 kernel: [<ffffffff8161c045>] ret_from_fork+0x55/0x80 And that latter confuses the hotremove path because an LRU page is attempted to be migrated and that fails due to an elevated reference count. It is quite possible that the reuse of the HWPoisoned page is some kind of fixed race condition but I am not really sure about that. With the upstream kernel the failure is slightly different. The page doesn't seem to have LRU bit set but isolate_movable_page simply fails and do_migrate_range simply puts all the isolated pages back to LRU and therefore no progress is made and scan_movable_pages finds same set of pages over and over again. Fix both cases by explicitly checking HWPoisoned pages before we even try to get reference on the page, try to unmap it if it is still mapped. As explained by Naoya: : Hwpoison code never unmapped those for no big reason because : Ksm pages never dominate memory, so we simply didn't have strong : motivation to save the pages. Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU HWPoison pages which shouldn't happen but I couldn't convince myself about that. Naoya has noted the following: : Theoretically no such gurantee, because try_to_unmap() doesn't have a : guarantee of success and then memory_failure() returns immediately : when hwpoison_user_mappings fails. : Or the following code (comes after hwpoison_user_mappings block) also impli= : es : that the target page can still have PageLRU flag. : : /* : * Torn down by someone else? : */ : if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) { : action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED); : res =3D -EBUSY; : goto out; : } : : So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in : current version of your patch. Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.com> Debugged-by: Oscar Salvador <osalvador@suse.com> Tested-by: Oscar Salvador <osalvador@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
David Herrmann	d447cf0cee	fork: record start_time late commit `7b55851367` upstream. This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space. Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar). By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes. Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination. Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Steffen Maier	d15d1677aa	scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown commit `60a161b7e5` upstream. Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING causing by design looping with exponential increase sleeps in the function performing exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up. Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel. As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification. Since v2.6.27 commit `d26ab06ede` ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item. Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe due to separate atomic_read() and atomic_dec(), in order to depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error(). An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit `18f87a67e6` ("zfcp: auto port scan resiliency"); v2.6.37 commit `d3e1088d68` ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit `fca55b6fb5` ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit `c57a39a45a` ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit `cc8c282963` ("[SCSI] zfcp: Automatically attach remote ports")] Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: `d26ab06ede` ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: <stable@vger.kernel.org> #2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-13 10:05:32 +01:00
Tony Lindgren	beb2654f07	Input: omap-keypad - fix idle configuration to not block SoC idle states [ Upstream commit `e2ca26ec4f` ] With PM enabled, I noticed that pressing a key on the droid4 keyboard will block deeper idle states for the SoC. Let's fix this by using IRQF_ONESHOT and stop constantly toggling the device OMAP4_KBD_IRQENABLE register as suggested by Dmitry Torokhov <dmitry.torokhov@gmail.com>. From the hardware point of view, looks like we need to manage the registers for OMAP4_KBD_IRQENABLE and OMAP4_KBD_WAKEUPENABLE together to avoid blocking deeper SoC idle states. And with toggling of OMAP4_KBD_IRQENABLE register now gone with IRQF_ONESHOT, also the SoC idle state problem is gone during runtime. We still also need to clear OMAP4_KBD_WAKEUPENABLE in omap4_keypad_close() though to pair it with omap4_keypad_open() to prevent blocking deeper SoC idle states after rmmod omap4-keypad. Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-13 10:05:32 +01:00
Dan Carpenter	ebcdd1195e	scsi: bnx2fc: Fix NULL dereference in error handling [ Upstream commit `9ae4f8420e` ] If "interface" is NULL then we can't release it and trying to will only lead to an Oops. Fixes: `aea71a0249` ("[SCSI] bnx2fc: Introduce interface structure for each vlan interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-13 10:05:31 +01:00

1 2 3 4 5 ...

576682 Commits