linux

mirror of https://github.com/hardkernel/linux.git synced 2026-06-11 13:27:06 +09:00

Author	SHA1	Message	Date
Cliff Chen	3f399ec8bb	f2fs: add a new limit for reserve root The reserved root blocks is not enough for booting Android due to the limit of 0.2% if the fs size too small. so we add a new minimum limit is 128MB. Change-Id: I5af3b182001d27e4d18b4090c5270bbb2ac6253b Signed-off-by: Cliff Chen <cliff.chen@rock-chips.com>	2019-02-25 16:39:12 +08:00
Cliff Chen	b992ad3197	f2fs: modify f_blocks for statfs The f_blocks of statfs include file system overhead, it is not normal usage of Posix. Change-Id: If481626b08c05290626938586e2dc721690f1a91 Signed-off-by: Cliff Chen <cliff.chen@rock-chips.com>	2019-02-25 16:31:50 +08:00
Greg Kroah-Hartman	c28f73fe42	Merge 4.19.20 into android-4.19 Changes in 4.19.20 Fix "net: ipv4: do not handle duplicate fragments as overlapping" drm/msm/gpu: fix building without debugfs ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation ipvlan, l3mdev: fix broken l3s mode wrt local routes l2tp: copy 4 more bytes to linear part if necessary l2tp: fix reading optional fields of L2TPv3 net: ip_gre: always reports o_key to userspace net: ip_gre: use erspan key field for tunnel lookup net/mlx4_core: Add masking for a few queries on HCA caps netrom: switch to sock timer API net/rose: fix NULL ax25_cb kernel panic net: set default network namespace in init_dummy_netdev() ravb: expand rx descriptor data to accommodate hw checksum sctp: improve the events for sctp stream reset tun: move the call to tun_set_real_num_queues ucc_geth: Reset BQL queue when stopping device vhost: fix OOB in get_rx_bufs() net: ip6_gre: always reports o_key to userspace sctp: improve the events for sctp stream adding net/mlx5e: Allow MAC invalidation while spoofchk is ON ip6mr: Fix notifiers call on mroute_clean_tables() Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager" sctp: set chunk transport correctly when it's a new asoc sctp: set flow sport from saddr only when it's 0 virtio_net: Don't enable NAPI when interface is down virtio_net: Don't call free_old_xmit_skbs for xdp_frames virtio_net: Fix not restoring real_num_rx_queues virtio_net: Fix out of bounds access of sq virtio_net: Don't process redirected XDP frames when XDP is disabled virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs virtio_net: Differentiate sk_buff and xdp_frame on freeing CIFS: Do not count -ENODATA as failure for query directory CIFS: Fix trace command logging for SMB2 reads and writes CIFS: Do not consider -ENODATA as stat failure for reads fs/dcache: Fix incorrect nr_dentry_unused accounting in 
shrink_dcache_sb() iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions() selftests/seccomp: Enhance per-arch ptrace syscall skip tests NFS: Fix up return value on fatal errors in nfs_page_async_flush() ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment arm64: kaslr: ensure randomized quantities are clean also when kaslr is off arm64: Do not issue IPIs for user executable ptes arm64: hyp-stub: Forbid kprobing of the hyp-stub arm64: hibernate: Clean the __hyp_text to PoC after resume gpio: altera-a10sr: Set proper output level for direction_output gpiolib: fix line event timestamps for nested irqs gpio: pcf857x: Fix interrupts on multiple instances gpio: sprd: Fix the incorrect data register gpio: sprd: Fix incorrect irq type setting for the async EIC gfs2: Revert "Fix loop in gfs2_rbm_find" mmc: bcm2835: Fix DMA channel leak on probe error mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay ALSA: usb-audio: Add Opus #3 to quirks for native DSD support ALSA: hda/realtek - Fixed hp_pin no value IB/hfi1: Remove overly conservative VM_EXEC flag check platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes mmc: sdhci-iproc: handle mmc_of_parse() errors during probe Btrfs: fix deadlock when allocating tree block during leaf/node split btrfs: On error always free subvol_name in btrfs_mount kernel/exit.c: release ptraced tasks before zap_pid_ns_processes mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT oom, oom_reaper: do not enqueue same task twice mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages mm, oom: fix use-after-free in oom_kill_process mm: hwpoison: use do_send_sig_info() instead of force_sig() mm: migrate: don't rely on __PageMovable() of newpage after unlocking it of: Convert to using %pOFn instead of device_node.name of: overlay: add tests to validate kfrees from overlay removal of: overlay: add missing of_node_get() in 
__of_attach_node_sysfs of: overlay: use prop add changeset entry for property in new nodes of: overlay: do not duplicate properties from overlay for new nodes md/raid5: fix 'out of memory' during raid cache recovery cifs: Always resolve hostname before reconnecting Linux 4.19.20 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-07 08:40:17 +01:00
Paulo Alcantara	c0be624777	cifs: Always resolve hostname before reconnecting commit `28eb24ff75` upstream. In case a hostname resolves to a different IP address (e.g. long running mounts), make sure to resolve it every time prior to calling generic_ip_connect() in reconnect. Suggested-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paulo Alcantara <palcantara@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:16 +01:00
Eric W. Biederman	9ee5987f31	btrfs: On error always free subvol_name in btrfs_mount commit `532b618bdf` upstream. The subvol_name is allocated in btrfs_parse_subvol_options and is consumed and freed in mount_subvol. Add a free to the error paths that don't call mount_subvol so that it is guaranteed that subvol_name is freed when an error happens. Fixes: `312c89fbca` ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:14 +01:00
Filipe Manana	5bce143671	Btrfs: fix deadlock when allocating tree block during leaf/node split commit `a627947076` upstream. When splitting a leaf or node from one of the trees that are modified when flushing pending block groups (extent, chunk, device and free space trees), we need to allocate a new tree block, which in turn can result in the need to allocate a new block group. After allocating the new block group we may need to flush new block groups that were previously allocated during the course of the current transaction, which is what may cause a deadlock due to attempts to write lock twice the same leaf or node, as when splitting a leaf or node we are holding a write lock on it and its parent node. The same type of deadlock can also happen when increasing the tree's height, since we are holding a lock on the existing root while allocating the tree block to use as the new root node. An example trace when the deadlock happens during the leaf split path is: [27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1 [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018 [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] (...) 
[27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246 [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001 [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48 [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000 [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000 [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18 [27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000 [27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0 [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [27175.307940] Call Trace: [27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs] [27175.309669] ? update_space_info+0xba/0xe0 [btrfs] [27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs] [27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs] [27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs] [27175.313984] find_free_extent+0x706/0x10d0 [btrfs] [27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs] [27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs] [27175.316548] split_leaf+0x130/0x610 [btrfs] [27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs] [27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs] [27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs] [27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs] [27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs] [27175.322491] normal_work_helper+0xd0/0x320 [btrfs] [27175.323328] ? 
move_linked_works+0x6e/0xa0 [27175.324160] process_one_work+0x191/0x370 [27175.324976] worker_thread+0x4f/0x3b0 [27175.325763] kthread+0xf8/0x130 [27175.326531] ? rescuer_thread+0x320/0x320 [27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50 [27175.328027] ret_from_fork+0x35/0x40 [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]--- Fix this by preventing the flushing of new blocks groups when splitting a leaf/node and when inserting a new root node for one of the trees modified by the flushing operation, similar to what is done when COWing a node/leaf from on of these trees. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383 Reported-by: Eli V <eliventer@gmail.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:14 +01:00
Andreas Gruenbacher	8b9be9db8a	gfs2: Revert "Fix loop in gfs2_rbm_find" commit `e74c98ca2d` upstream. This reverts commit `2d29f6b96d`. It turns out that the fix can lead to a ~20 percent performance regression in initial writes to the page cache according to iozone. Let's revert this for now to have more time for a proper fix. Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:13 +01:00
Trond Myklebust	0a3275d785	NFS: Fix up return value on fatal errors in nfs_page_async_flush() commit `8fc75bed96` upstream. Ensure that we return the fatal error value that caused us to exit nfs_page_async_flush(). Fixes: `c373fff7bd` ("NFSv4: Don't special case "launder"") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.12+ Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:11 +01:00
Waiman Long	bb4e1ff5a8	fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() commit `1dbd449c99` upstream. The nr_dentry_unused per-cpu counter tracks dentries in both the LRU lists and the shrink lists where the DCACHE_LRU_LIST bit is set. The shrink_dcache_sb() function moves dentries from the LRU list to a shrink list and subtracts the dentry count from nr_dentry_unused. This is incorrect as the nr_dentry_unused count will also be decremented in shrink_dentry_list() via d_shrink_del(). To fix this double decrement, the decrement in the shrink_dcache_sb() function is taken out. Fixes: `4e717f5c10` ("list_lru: remove special case function list_lru_dispose_all." Cc: stable@kernel.org Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:11 +01:00
Pavel Shilovsky	e9d56f920b	CIFS: Do not consider -ENODATA as stat failure for reads commit `082aaa8700` upstream. When doing reads beyound the end of a file the server returns error STATUS_END_OF_FILE error which is mapped to -ENODATA. Currently we report it as a failure which confuses read stats. Change it to not consider -ENODATA as failure for stat purposes. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:11 +01:00
Pavel Shilovsky	6e7045ec33	CIFS: Fix trace command logging for SMB2 reads and writes commit `7d42e72fe8` upstream. Currently we log success once we send an async IO request to the server. Instead we need to analyse a response and then log success or failure for a particular command. Also fix argument list for read logging. Cc: <stable@vger.kernel.org> # 4.18 Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:11 +01:00
Pavel Shilovsky	c6961288a5	CIFS: Do not count -ENODATA as failure for query directory commit `8e6e72aece` upstream. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:11 +01:00
Greg Kroah-Hartman	18ba00a34e	Merge 4.19.19 into android-4.19 Changes in 4.19.19 amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs net: bridge: Fix ethernet header pointer before check skb forwardable net: Fix usage of pskb_trim_rcsum net: phy: marvell: Errata for mv88e6390 internal PHYs net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling net/sched: act_tunnel_key: fix memory leak in case of action replace net_sched: refetch skb protocol for each filter openvswitch: Avoid OOB read when parsing flow nlattrs vhost: log dirty page correctly mlxsw: pci: Increase PCI SW reset timeout net: ipv4: Fix memory leak in network namespace dismantle mlxsw: spectrum_fid: Update dummy FID index mlxsw: pci: Ring CQ's doorbell before RDQ's net/sched: cls_flower: allocate mask dynamically in fl_change() udp: with udp_segment release on error path ip6_gre: fix tunnel list corruption for x-netns erspan: build the header with the right proto according to erspan_ver net: phy: marvell: Fix deadlock from wrong locking ip6_gre: update version related info when changing link tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state mei: me: mark LBG devices as having dma support mei: me: add denverton innovation engine device IDs USB: leds: fix regression in usbport led trigger USB: serial: simple: add Motorola Tetra TPG2200 device id USB: serial: pl2303: add new PID to support PL2303TB ceph: clear inode pointer when snap realm gets dropped by its inode ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages ASoC: rt5514-spi: Fix potential NULL pointer dereference ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode clk: socfpga: stratix10: fix rate calculation for pll clocks clk: socfpga: stratix10: fix naming convention for the fixed-clocks inotify: Fix fd refcount leak in inotify_add_watch(). 
ALSA: hda/realtek - Fix typo for ALC225 model ALSA: hda - Add mute LED support for HP ProBook 470 G5 ARCv2: lib: memeset: fix doing prefetchw outside of buffer ARC: adjust memblock_reserve of kernel memory ARC: perf: map generic branches to correct hardware condition s390/mm: always force a load of the primary ASCE on context switch s390/early: improve machine detection s390/smp: fix CPU hotplug deadlock with CPU rescan misc: ibmvsm: Fix potential NULL pointer dereference char/mwave: fix potential Spectre v1 vulnerability mmc: dw_mmc-bluefield: : Fix the license information mmc: meson-gx: Free irq in release() callback staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1 tty: Handle problem if line discipline does not have receive_buf uart: Fix crash in uart_write and uart_put_char tty/n_hdlc: fix __might_sleep warning hv_balloon: avoid touching uninitialized struct page during tail onlining Drivers: hv: vmbus: Check for ring when getting debug info vgacon: unconfuse vc_origin when using soft scrollback CIFS: Fix possible hang during async MTU reads and writes CIFS: Fix credits calculations for reads with errors CIFS: Fix credit calculation for encrypted reads with errors CIFS: Do not reconnect TCP session in add_credits() smb3: add credits we receive from oplock/break PDUs Input: xpad - add support for SteelSeries Stratus Duo Input: input_event - provide override for sparc64 Input: uinput - fix undefined behavior in uinput_validate_absinfo() acpi/nfit: Block function zero DSMs acpi/nfit: Fix command-supported detection scsi: ufs: Use explicit access size in ufshcd_dump_regs dm thin: fix passdown_double_checking_shared_status() dm crypt: fix parsing of extended IV arguments drm/amdgpu: Add APTX quirk for Lenovo laptop KVM: x86: Fix single-step debugging KVM: x86: Fix PV IPIs for 32-bit KVM host KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error kvm: x86/vmx: Use kzalloc for cached_vmcs12 KVM/nVMX: Do not validate that posted_intr_desc_addr is 
page aligned x86/pkeys: Properly copy pkey state at fork() x86/selftests/pkeys: Fork() to check for state being preserved x86/kaslr: Fix incorrect i8254 outb() parameters x86/entry/64/compat: Fix stack switching for XEN PV posix-cpu-timers: Unbreak timer rearming net: sun: cassini: Cleanup license conflict irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it can: bcm: check timer values before ktime conversion can: flexcan: fix NULL pointer exception during bringup vt: make vt_console_print() compatible with the unicode screen buffer vt: always call notifier with the console lock held vt: invoke notifier on screen size change drm/meson: Fix atomic mode switching regression bpf: improve verifier branch analysis bpf: add per-insn complexity limit bpf: move {prev_,}insn_idx into verifier env bpf: move tmp variable into ax register in interpreter bpf: enable access to ax register also from verifier rewrite bpf: restrict map value pointer arithmetic for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: fix check_map_access smin_value test when pointer contains offset bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix sanitation of alu op with pointer / scalar type from different paths bpf: fix inner map masking to prevent oob under speculation s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU nvmet-rdma: Add unlikely for response allocated check nvmet-rdma: fix null dereference under heavy load Revert "mm, memory_hotplug: initialize struct pages for the full memory section" usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup ide: fix a typo in the settings proc file name Input: input_event - fix the CONFIG_SPARC64 mixup Linux 4.19.19 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-31 08:29:40 +01:00
Ronnie Sahlberg	06d9f98720	smb3: add credits we receive from oplock/break PDUs commit `2e5700bdde` upstream. Otherwise we gradually leak credits leading to potential hung session. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:37 +01:00
Pavel Shilovsky	779c65bb77	CIFS: Do not reconnect TCP session in add_credits() commit `ef68e83184` upstream. When executing add_credits() we currently call cifs_reconnect() if the number of credits is zero and there are no requests in flight. In this case we may call cifs_reconnect() recursively twice and cause memory corruption given the following sequence of functions: mid1.callback() -> add_credits() -> cifs_reconnect() -> -> mid2.callback() -> add_credits() -> cifs_reconnect(). Fix this by avoiding to call cifs_reconnect() in add_credits() and checking for zero credits in the demultiplex thread. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:37 +01:00
Pavel Shilovsky	2ae6fedbd5	CIFS: Fix credit calculation for encrypted reads with errors commit `ec678eae74` upstream. We do need to account for credits received in error responses to read requests on encrypted sessions. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:37 +01:00
Pavel Shilovsky	0380ed9b1c	CIFS: Fix credits calculations for reads with errors commit `8004c78c68` upstream. Currently we mark MID as malformed if we get an error from server in a read response. This leads to not properly processing credits in the readv callback. Fix this by marking such a response as normal received response and process it appropriately. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:37 +01:00
Pavel Shilovsky	07b9e5e35e	CIFS: Fix possible hang during async MTU reads and writes commit `acc58d0bab` upstream. When doing MTU i/o we need to leave some credits for possible reopen requests and other operations happening in parallel. Currently we leave 1 credit which is not enough even for reopen only: we need at least 2 credits if durable handle reconnect fails. Also there may be other operations at the same time including compounding ones which require 3 credits at a time each. Fix this by leaving 8 credits which is big enough to cover most scenarios. Was able to reproduce this when server was configured to give out fewer credits than usual. The proper fix would be to reconnect a file handle first and then obtain credits for an MTU request but this leads to bigger code changes and should happen in other patches. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:37 +01:00
Tetsuo Handa	a719cbe078	inotify: Fix fd refcount leak in inotify_add_watch(). commit `125892edfe` upstream. Commit `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fdput() before bailing out. Fixes: `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") CC: stable@vger.kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:34 +01:00
Yan, Zheng	3e05ceedf1	ceph: clear inode pointer when snap realm gets dropped by its inode commit `d95e674c01` upstream. snap realm and corresponding inode have pointers to each other. The two pointer should get clear at the same time. Otherwise, snap realm's pointer may reference freed inode. Cc: stable@vger.kernel.org # 4.17+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-31 08:14:34 +01:00
Greg Kroah-Hartman	26bf816608	Merge 4.19.18 into android-4.19 Changes in 4.19.18 ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address mlxsw: spectrum: Disable lag port TX before removing it mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion net: dsa: mv88x6xxx: mv88e6390 errata net, skbuff: do not prefer skb allocation fails early qmi_wwan: add MTU default to qmap network interface r8169: Add support for new Realtek Ethernet ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses net: clear skb->tstamp in bridge forwarding path netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets gpio: pl061: Move irq_chip definition inside struct pl061 drm/amd/display: Guard against null stream_state in set_crc_source drm/amdkfd: fix interrupt spin lock ixgbe: allow IPsec Tx offload in VEPA mode platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey e1000e: allow non-monotonic SYSTIM readings usb: typec: tcpm: Do not disconnect link for self powered devices selftests/bpf: enable (uncomment) all tests in test_libbpf.sh of: overlay: add missing of_node_put() after add new node to changeset writeback: don't decrement wb->refcnt if !wb->bdi serial: set suppress_bind_attrs flag only if builtin bpf: Allow narrow loads with offset > 0 ALSA: oxfw: add support for APOGEE duet FireWire x86/mce: Fix -Wmissing-prototypes warnings MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur crypto: ecc - regularize scalar for scalar multiplication arm64: perf: set suppress_bind_attrs flag to true drm/atomic-helper: Complete fake_commit->flip_done potentially earlier clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table samples: bpf: fix: error handling regarding kprobe_events usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device fpga: altera-cvp: fix probing for multiple FPGAs on the bus selinux: always allow mounting submounts ASoC: pcm3168a: Don't 
disable pcm3168a when CONFIG_PM defined scsi: qedi: Check for session online before getting iSCSI TLV data. drm/amdgpu: Reorder uvd ring init before uvd resume rxe: IB_WR_REG_MR does not capture MR's iova field efi/libstub: Disable some warnings for x86{,_64} jffs2: Fix use of uninitialized delayed_work, lockdep breakage clk: imx: make mux parent strings const pstore/ram: Do not treat empty buffers as valid media: uvcvideo: Refactor teardown of uvc on USB disconnect powerpc/xmon: Fix invocation inside lock region powerpc/pseries/cpuidle: Fix preempt warning media: firewire: Fix app_info parameter type in avc_ca{,_app}_info ASoC: use dma_ops of parent device for acp_audio_dma media: venus: core: Set dma maximum segment size staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io' net: call sk_dst_reset when set SO_DONTROUTE scsi: target: use consistent left-aligned ASCII INQUIRY data scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough selftests: do not macro-expand failed assertion expressions arm64: kasan: Increase stack size for KASAN_EXTRA clk: imx6q: reset exclusive gates on init arm64: Fix minor issues with the dcache_by_line_op macro bpf: relax verifier restriction on BPF_MOV \| BPF_ALU kconfig: fix file name and line number of warn_ignored_character() kconfig: fix memory leak when EOF is encountered in quotation mmc: atmel-mci: do not assume idle after atmci_request_end btrfs: volumes: Make sure there is no overlap of dev extents at mount time btrfs: alloc_chunk: fix more DUP stripe size handling btrfs: fix use-after-free due to race between replace start and cancel btrfs: improve error handling of btrfs_add_link tty/serial: do not free trasnmit buffer page under port lock perf intel-pt: Fix error with config term "pt=0" perf tests ARM: Disable breakpoint tests 32-bit perf svghelper: Fix unchecked usage of strncpy() perf parse-events: Fix unchecked usage of strncpy() perf vendor events intel: Fix Load_Miss_Real_Latency 
on SKL/SKX netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine x86/topology: Use total_cpus for max logical packages calculation dm crypt: use u64 instead of sector_t to store iv_offset dm kcopyd: Fix bug causing workqueue stalls perf stat: Avoid segfaults caused by negated options tools lib subcmd: Don't add the kernel sources to the include path dm snapshot: Fix excessive memory usage and workqueue stalls perf cs-etm: Correct packets swapping in cs_etm__flush() perf tools: Add missing sigqueue() prototype for systems lacking it perf tools: Add missing open_memstream() prototype for systems lacking it quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. clocksource/drivers/integrator-ap: Add missing of_node_put() dm: Check for device sector overflow if CONFIG_LBDAF is not set Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029 ALSA: bebob: fix model-id of unit for Apogee Ensemble sysfs: Disable lockdep for driver bind/unbind files IB/usnic: Fix potential deadlock scsi: mpt3sas: fix memory ordering on 64bit writes scsi: smartpqi: correct lun reset issues ath10k: fix peer stats null pointer dereference scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() scsi: megaraid: fix out-of-bound array accesses iomap: don't search past page end in iomap_is_partially_uptodate ocfs2: fix panic due to unrecovered local alloc mm/page-writeback.c: don't break integrity writeback on ->writepage() error mm/swap: use nr_node_ids for avail_lists in swap_info_struct userfaultfd: clear flag if remap event not enabled mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps iwlwifi: mvm: Send LQ command as async when necessary Bluetooth: Fix unnecessary error message for HCI request completion ipmi: fix use-after-free of user->release_barrier.rda ipmi: msghandler: Fix potential 
Spectre v1 vulnerabilities ipmi: Prevent use-after-free in deliver_response ipmi:ssif: Fix handling of multi-part return messages ipmi: Don't initialize anything in the core until something uses it Linux 4.19.18 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-26 11:58:37 +01:00
Peter Xu	2011eb7418	userfaultfd: clear flag if remap event not enabled [ Upstream commit `3cfd22be0a` ] When the process being tracked does mremap() without UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle, we should not generate the remap event, and at the same time we should clear all the uffd flags on the new VMA. Without this patch, we can still have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even the fault handling process does not even know the existance of the VMA. Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Pravin Shedge <pravin.shedge4linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:43 +01:00
Junxiao Bi	5a404f39f8	ocfs2: fix panic due to unrecovered local alloc [ Upstream commit `532e1e54c8` ] mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic. This issue was reported at https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run. (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777 o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! 
invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP <ffff8800ea4db668> ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: 
http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:43 +01:00
Eric Sandeen	c9dcb871b1	iomap: don't search past page end in iomap_is_partially_uptodate [ Upstream commit `3cc31fa65d` ] iomap_is_partially_uptodate() is intended to check whether blocks within the selected range of a not-uptodate page are uptodate; if the range we care about is up to date, it's an optimization. However, the iomap implementation continues to check all blocks up to from+count, which is beyond the page, and can even be well beyond the iop->uptodate bitmap. I think the worst that will happen is that we may eventually find a zero bit and return "not partially uptodate" when it would have otherwise returned true, and skip the optimization. Still, it's clearly an invalid memory access that must be fixed. So: fix this by limiting the search to within the page as is done in the non-iomap variant, block_is_partially_uptodate(). Zorro noticed this when KASAN went off for 512 byte blocks on a 64k page system: BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0 Read of size 8 at addr ffff800120c3a318 by task fsstress/22337 Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:43 +01:00
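The clamping described in the iomap commit above can be illustrated in user space. This is a minimal sketch, not the kernel code: the `SKETCH_*` constants, `struct page_state`, and `range_is_uptodate()` are hypothetical names, and the 64k page with 512-byte blocks is taken from the KASAN report in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry matching the report: 64k pages, 512-byte blocks. */
#define SKETCH_PAGE_SIZE       65536u
#define SKETCH_BLOCK_BITS      9u
#define SKETCH_BLOCKS_PER_PAGE (SKETCH_PAGE_SIZE >> SKETCH_BLOCK_BITS)

/* One bit per block in the page; stands in for the per-page uptodate bitmap. */
struct page_state {
    uint64_t uptodate[SKETCH_BLOCKS_PER_PAGE / 64];
};

static bool block_uptodate(const struct page_state *ps, unsigned int blk)
{
    return (ps->uptodate[blk / 64] >> (blk % 64)) & 1u;
}

/*
 * Return true if every block touched by [from, from + count) is uptodate,
 * looking only at blocks that lie inside this page. The key step is the
 * clamp of the length to the end of the page before computing the last block.
 */
static bool range_is_uptodate(const struct page_state *ps,
                              unsigned int from, unsigned int count)
{
    unsigned int len, first, last, i;

    if (from >= SKETCH_PAGE_SIZE || count == 0)
        return false;

    /* Limit the search to the page: never look past its last block. */
    len = SKETCH_PAGE_SIZE - from;
    if (count < len)
        len = count;

    first = from >> SKETCH_BLOCK_BITS;
    last = (from + len - 1) >> SKETCH_BLOCK_BITS;

    for (i = first; i <= last; i++)
        if (!block_uptodate(ps, i))
            return false;
    return true;
}
```

Without the clamp, a request starting in the last block with `count` larger than one block would index past the end of the bitmap, which is the out-of-bounds read the message describes.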
Javier Barrio	876b79b973	quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. [ Upstream commit `41c4f85cda` ] Commit `1fa5efe362` (ext4: Use generic helpers for quotaon and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems with sysfile quota support. This leads to calling dquot_enable/disable without s_umount held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF. The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree): [ 117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota [...] [ 155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9 [ 155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod [ 155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9 [ 155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 155.036911] RIP: 0010:dquot_enable+0x34/0xb9 [ 155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f [ 155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202 [ 155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008 [ 155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870 [ 155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb [ 155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78 [ 
155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870 [ 155.036936] FS: 00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000 [ 155.036939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0 [ 155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 155.036955] Call Trace: [ 155.037004] dquot_quota_enable+0x8b/0xd0 [ 155.037011] kernel_quotactl+0x628/0x74e [ 155.037027] ? do_mprotect_pkey+0x2a6/0x2cd [ 155.037034] __x64_sys_quotactl+0x1a/0x1d [ 155.037041] do_syscall_64+0x55/0xe4 [ 155.037078] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 155.037105] RIP: 0033:0x7fd812fe1198 [ 155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89 [ 155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3 [ 155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198 [ 155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102 [ 155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000 [ 155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168 [ 155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000 [ 155.037131] ---[ end trace 210f864257175c51 ]--- and then the syscall proceeds without s_umount locking. This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF quotactls too in addition to Q_QUOTAON/OFF. AFAICT, other than ext4, only xfs and ocfs2 are affected by this change. The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case before. This looks good to me but I can not say for sure. 
Ext4 and ocfs2 were already being called with s_umount exclusive via quota_quotaon/off which is basically the same. Signed-off-by: Javier Barrio <javier.barrio.mart@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:42 +01:00
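The rule the quota commit above introduces is that all four on/off commands require the exclusive lock, not just the two generic ones. The sketch below is a user-space illustration only: the `SKETCH_Q_*` enumerators and `cmd_needs_exclusive_umount()` are hypothetical stand-ins and do not carry the real quotactl command values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the quotactl commands named in the message. */
enum sketch_quota_cmd {
    SKETCH_Q_QUOTAON,
    SKETCH_Q_QUOTAOFF,
    SKETCH_Q_XQUOTAON,
    SKETCH_Q_XQUOTAOFF,
    SKETCH_Q_GETQUOTA
};

/*
 * Decide whether a command toggles quota state and therefore needs
 * s_umount held in exclusive mode. Before the fix, only the first two
 * cases were treated as on/off commands.
 */
static bool cmd_needs_exclusive_umount(enum sketch_quota_cmd cmd)
{
    switch (cmd) {
    case SKETCH_Q_QUOTAON:
    case SKETCH_Q_QUOTAOFF:
    case SKETCH_Q_XQUOTAON:
    case SKETCH_Q_XQUOTAOFF:
        return true;
    default:
        return false;
    }
}
```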
Johannes Thumshirn	310f8296d6	btrfs: improve error handling of btrfs_add_link [ Upstream commit `1690dd41e0` ] In the error handling block, err holds the return value of either btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked since its introduction with commit `fe66a05a06` (Btrfs: improve error handling for btrfs_insert_dir_item callers) in 2012. If the cleanup inside the error handling block itself fails, there's not much left to do and the abort either happened earlier in the callees or is necessary here. So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort the transaction, but still return the original code of the failure stored in 'ret' as this will be reported to the user. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:39 +01:00
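The pattern in the btrfs_add_link commit above (abort when cleanup fails, but report the original error) can be sketched in isolation. Everything here is hypothetical scaffolding: `struct sketch_trans`, `sketch_abort_transaction()`, and `finish_failed_link()` are illustrative names, not btrfs functions.

```c
/* Hypothetical transaction handle that only records whether it was aborted. */
struct sketch_trans {
    int aborted;      /* non-zero once the transaction has been aborted */
    int abort_errno;  /* error code that caused the abort */
};

static void sketch_abort_transaction(struct sketch_trans *trans, int err)
{
    trans->aborted = 1;
    trans->abort_errno = err;
}

/*
 * ret is the original failure; cleanup_err is the result of undoing the
 * partially completed work. A failed cleanup aborts the transaction, yet
 * the caller still sees the original error code.
 */
static int finish_failed_link(struct sketch_trans *trans, int ret, int cleanup_err)
{
    if (cleanup_err)
        sketch_abort_transaction(trans, cleanup_err);
    return ret;
}
```

Returning `ret` rather than `cleanup_err` keeps the user-visible error tied to the operation that actually failed first.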
Anand Jain	38b17eee70	btrfs: fix use-after-free due to race between replace start and cancel [ Upstream commit `d189dd70e2` ] The device replace cancel thread can race with the replace start thread and if fs_info::scrubs_running is not yet set, btrfs_scrub_cancel() will fail to stop the scrub thread. The scrub thread continues with the scrub for replace which then will try to write to the target device and which is already freed by the cancel thread. scrub_setup_ctx() warns as tgtdev is NULL. struct scrub_ctx scrub_setup_ctx(struct btrfs_device dev, int is_dev_replace) { ... if (is_dev_replace) { WARN_ON(!fs_info->dev_replace.tgtdev); <=== sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO; sctx->wr_tgtdev = fs_info->dev_replace.tgtdev; sctx->flush_all_writes = false; } [ 6724.497655] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc started [ 6753.945017] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc canceled [ 6852.426700] WARNING: CPU: 0 PID: 4494 at fs/btrfs/scrub.c:622 scrub_setup_ctx.isra.19+0x220/0x230 [btrfs] ... [ 6852.428928] RIP: 0010:scrub_setup_ctx.isra.19+0x220/0x230 [btrfs] ... [ 6852.432970] Call Trace: [ 6852.433202] btrfs_scrub_dev+0x19b/0x5c0 [btrfs] [ 6852.433471] btrfs_dev_replace_start+0x48c/0x6a0 [btrfs] [ 6852.433800] btrfs_dev_replace_by_ioctl+0x3a/0x60 [btrfs] [ 6852.434097] btrfs_ioctl+0x2476/0x2d20 [btrfs] [ 6852.434365] ? do_sigaction+0x7d/0x1e0 [ 6852.434623] do_vfs_ioctl+0xa9/0x6c0 [ 6852.434865] ? syscall_trace_enter+0x1c8/0x310 [ 6852.435124] ? syscall_trace_enter+0x1c8/0x310 [ 6852.435387] ksys_ioctl+0x60/0x90 [ 6852.435663] __x64_sys_ioctl+0x16/0x20 [ 6852.435907] do_syscall_64+0x50/0x180 [ 6852.436150] entry_SYSCALL_64_after_hwframe+0x49/0xbe Further, as the replace thread enters scrub_write_page_to_dev_replace() without the target device it panics: static int scrub_add_page_to_wr_bio(struct scrub_ctx sctx, struct scrub_page spage) { ... 
bio_set_dev(bio, sbio->dev->bdev); <====== [ 6929.715145] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 .. [ 6929.717106] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs] [ 6929.717420] RIP: 0010:scrub_write_page_to_dev_replace+0xb4/0x260 [btrfs] .. [ 6929.721430] Call Trace: [ 6929.721663] scrub_write_block_to_dev_replace+0x3f/0x60 [btrfs] [ 6929.721975] scrub_bio_end_io_worker+0x1af/0x490 [btrfs] [ 6929.722277] normal_work_helper+0xf0/0x4c0 [btrfs] [ 6929.722552] process_one_work+0x1f4/0x520 [ 6929.722805] ? process_one_work+0x16e/0x520 [ 6929.723063] worker_thread+0x46/0x3d0 [ 6929.723313] kthread+0xf8/0x130 [ 6929.723544] ? process_one_work+0x520/0x520 [ 6929.723800] ? kthread_delayed_work_timer_fn+0x80/0x80 [ 6929.724081] ret_from_fork+0x3a/0x50 Fix this by letting btrfs_dev_replace_finishing() do the job of cleaning up after the cancel, including freeing the target device. btrfs_dev_replace_finishing() is called when btrfs_scrub_dev() returns, along with the scrub return status. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:39 +01:00
Hans van Kranenburg	720b86a53a	btrfs: alloc_chunk: fix more DUP stripe size handling [ Upstream commit `baf92114c7` ] Commit `92e222df7b` "btrfs: alloc_chunk: fix DUP stripe size handling" fixed calculating the stripe_size for a new DUP chunk. However, the same calculation reappears a bit later, and that one was not changed yet. The resulting bug that is exposed is that the newly allocated device extents ('stripes') can have a few MiB overlap with the next thing stored after them, which is another device extent or the end of the disk. The scenario in which this can happen is: * The block device for the filesystem is less than 10GiB in size. * The amount of contiguous free unallocated disk space chosen to use for chunk allocation is 20% of the total device size, or a few MiB more or less. An example: - The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB) - There's 1578MiB unallocated raw disk space left in one contiguous piece. In this case stripe_size is first calculated as 789MiB, (half of 1578MiB). Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we enter the if block. Now stripe_size value is immediately overwritten while calculating an adjusted value based on max_chunk_size, which ends up as 788MiB. Next, the value is rounded up to a 16MiB boundary, 800MiB, which is actually more than the value we had before. However, the last comparison fails to detect this, because it's comparing the value with the total amount of free space, which is about twice the size of stripe_size. In the example above, this means that the resulting raw disk space being allocated is 1600MiB, while only a gap of 1578MiB has been found. The second device extent object for this DUP chunk will overlap for 22MiB with whatever comes next. The underlying problem here is that the stripe_size is reused all the time for different things. So, when entering the code in the if block, stripe_size is immediately overwritten with something else. 
If later we decide we want to have the previous value back, then the logic to compute it was copy pasted in again. With this change, the value in stripe_size is not unnecessarily destroyed, so the duplicated calculation is not needed any more. Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:39 +01:00
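The arithmetic in the DUP example above (7880MiB device, 1578MiB contiguous gap) can be replayed with a small model. This is a sketch under stated assumptions: `buggy_stripe_size()` mirrors the sequence of steps the message describes, and `clamped_stripe_size()` shows one way to keep the per-stripe limit, which is an assumption about the shape of the fix rather than the actual patch. All function names are hypothetical.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Round x up to the next multiple of align (align must be non-zero). */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Model of the sequence described in the message. max_avail is the
 * contiguous free space found on the device, dev_stripes the number of
 * device extents placed on it (2 for DUP), data_stripes the number of
 * stripes that hold distinct data (1 for DUP).
 */
static uint64_t buggy_stripe_size(uint64_t max_avail, uint64_t max_chunk_size,
                                  unsigned int dev_stripes, unsigned int data_stripes)
{
    uint64_t stripe_size = max_avail / dev_stripes;

    if (stripe_size * data_stripes > max_chunk_size) {
        stripe_size = max_chunk_size / data_stripes;
        stripe_size = round_up_u64(stripe_size, 16 * MIB);
        /* Compares against the whole gap, not the per-stripe share. */
        stripe_size = min_u64(max_avail, stripe_size);
    }
    return stripe_size;
}

/* Same steps, but the final clamp uses the per-stripe share of the gap. */
static uint64_t clamped_stripe_size(uint64_t max_avail, uint64_t max_chunk_size,
                                    unsigned int dev_stripes, unsigned int data_stripes)
{
    uint64_t per_stripe = max_avail / dev_stripes;
    uint64_t stripe_size = per_stripe;

    if (stripe_size * data_stripes > max_chunk_size) {
        stripe_size = max_chunk_size / data_stripes;
        stripe_size = round_up_u64(stripe_size, 16 * MIB);
        stripe_size = min_u64(per_stripe, stripe_size);
    }
    return stripe_size;
}
```

With `max_avail = 1578 * MIB`, `max_chunk_size = 788 * MIB`, `dev_stripes = 2`, and `data_stripes = 1`, the first function yields 800MiB per device extent, so two extents need 1600MiB and overrun the gap by 22MiB; the second yields 789MiB, which fits exactly.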
Qu Wenruo	bb5717a4a1	btrfs: volumes: Make sure there is no overlap of dev extents at mount time [ Upstream commit `5eb193812a` ] Enhance btrfs_verify_dev_extents() to remember previous checked dev extents, so it can verify no dev extents can overlap. Analysis from Hans: "Imagine allocating a DATA\|DUP chunk. In the chunk allocator, we first set... max_stripe_size = SZ_1G; max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE ... which is 10GiB. Then... /* we don't want a chunk larger than 10% of writeable space / max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1), max_chunk_size); Imagine we only have one 7880MiB block device in this filesystem. Now max_chunk_size is down to 788MiB. The next step in the code is to search for max_stripe_size dev_stripes amount of free space on the device, which is in our example 1GiB * 2 = 2GiB. Imagine the device has exactly 1578MiB free in one contiguous piece. This amount of bytes will be put in devices_info[ndevs - 1].max_avail Next we recalculate the stripe_size (which is actually the device extent length), based on the actual maximum amount of available raw disk space: stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes); stripe_size is now 789MiB Next we do... data_stripes = num_stripes / ncopies ...where data_stripes ends up as 1, because num_stripes is 2 (the amount of device extents we're going to have), and DUP has ncopies 2. Next there's a check... if (stripe_size * data_stripes > max_chunk_size) ...which matches because 789MiB * 1 > 788MiB. We go into the if code, and next is... stripe_size = div_u64(max_chunk_size, data_stripes); ...which resets stripe_size to max_chunk_size: 788MiB Next is a fun one... /* bump the answer up to a 16MB boundary / stripe_size = round_up(stripe_size, SZ_16M); ...which changes stripe_size from 788MiB to 800MiB. We're not done changing stripe_size yet... 
/ But don't go higher than the limits we found while searching * for free extents */ stripe_size = min(devices_info[ndevs - 1].max_avail, stripe_size); This is bad. max_avail is twice the stripe_size (we need to fit 2 device extents on the same device for DUP). The result here is that 800MiB < 1578MiB, so it's unchanged. However, the resulting DUP chunk will need 1600MiB disk space, which isn't there, and the second dev_extent might extend into the next thing (next dev_extent? end of device?) for 22MiB. The last shown line of code relies on a situation where there's twice the value of stripe_size present as value for the variable stripe_size when it's DUP. This was actually the case before commit `92e222df7b` "btrfs: alloc_chunk: fix DUP stripe size handling", from which I quote: "[...] in the meantime there's a check to see if the stripe_size does not exceed max_chunk_size. Since during this check stripe_size is twice the amount as intended, the check will reduce the stripe_size to max_chunk_size if the actual correct to be used stripe_size is more than half the amount of max_chunk_size." In the previous version of the code, the 16MiB alignment (why is this done, by the way?) would result in a 50% chance that it would actually do an 8MiB alignment for the individual dev_extents, since it was operating on double the size. Does this matter? Does it matter that stripe_size can be set to anything which is not 16MiB aligned because of the amount of remaining available disk space which is just taken? What is the main purpose of this round_up? The most straightforward thing to do seems something like... stripe_size = min( div_u64(devices_info[ndevs - 1].max_avail, dev_stripes), stripe_size ) ..just putting half of the max_avail into stripe_size." 
Link: https://lore.kernel.org/linux-btrfs/b3461a38-e5f8-f41d-c67c-2efac8129054@mendix.com/ Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Qu Wenruo <wqu@suse.com> [ add analysis from report ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:39 +01:00
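The mount-time check described above, remembering the previously checked dev extent so overlaps can be detected, can be sketched as a single pass over sorted extents. This is an illustration only: `struct dev_extent` and `first_overlap()` are hypothetical, and the sketch assumes the extents are already sorted by device id and then by start offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device extent: a byte range on one device. */
struct dev_extent {
    uint64_t devid;
    uint64_t start;
    uint64_t len;
};

/*
 * Walk extents sorted by (devid, start) and remember where the previous
 * one ended. Returns the index of the first extent that begins before the
 * previous extent on the same device ends, or -1 if none overlap.
 */
static long first_overlap(const struct dev_extent *ext, size_t n)
{
    uint64_t prev_devid = 0;
    uint64_t prev_end = 0;
    bool have_prev = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (have_prev && ext[i].devid == prev_devid && ext[i].start < prev_end)
            return (long)i;
        prev_devid = ext[i].devid;
        prev_end = ext[i].start + ext[i].len;
        have_prev = true;
    }
    return -1;
}
```

Applied to the analysis above, two 800MiB extents placed in a 1578MiB gap end 22MiB past the start of whatever follows, so the following extent is the one reported.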
Joel Fernandes (Google)	265242d82a	pstore/ram: Do not treat empty buffers as valid [ Upstream commit `30696378f6` ] The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it does not seem like the right thing to do and could lead to future bugs, so let's avoid that. It also prevents misleading prints in the logs which claim the buffer is valid. I got something like: found existing buffer, size 0, start 0 When I was expecting: no valid data in buffer (sig = ...) This bails out early (and reports with pr_debug()), since it's an acceptable state. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:37 +01:00
Daniel Santos	c356972f27	jffs2: Fix use of uninitialized delayed_work, lockdep breakage [ Upstream commit `a788c52727` ] jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing an uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... 
[ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-01-26 09:32:37 +01:00
Greg Kroah-Hartman	73dc755ee0	Merge 4.19.17 into android-4.19 Changes in 4.19.17 tty/ldsem: Wake up readers after timed out down_write() tty: Hold tty_ldisc_lock() during tty_reopen() tty: Simplify tty->count math in tty_reopen() tty: Don't hold ldisc lock in tty_reopen() if ldisc present can: gw: ensure DLC boundaries after CAN frame modification netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS netfilter: nf_conncount: don't skip eviction when age is negative netfilter: nf_conncount: split gc in two phases netfilter: nf_conncount: restart search when nodes have been erased netfilter: nf_conncount: merge lookup and add functions netfilter: nf_conncount: move all list iterations under spinlock netfilter: nf_conncount: speculative garbage collection on empty lists netfilter: nf_conncount: fix argument order to find_next_bit mmc: sdhci-msm: Disable CDR function on TX Revert "scsi: target: iscsi: cxgbit: fix csk leak" scsi: target: iscsi: cxgbit: fix csk leak scsi: target: iscsi: cxgbit: fix csk leak arm64/kvm: consistently handle host HCR_EL2 flags arm64: Don't trap host pointer auth use to EL2 ipv6: fix kernel-infoleak in ipv6_local_error() net: bridge: fix a bug on using a neighbour cache entry without checking its state packet: Do not leak dev refcounts on error exit tcp: change txhash on SYN-data timeout tun: publish tfile after it's fully initialized lan743x: Remove phy_read from link status change function smc: move unhash as early as possible in smc_release() r8169: don't try to read counters if chip is in a PCI power-save state bonding: update nest level on unlink ip: on queued skb use skb_header_pointer instead of pskb_may_pull r8169: load Realtek PHY driver module before r8169 crypto: sm3 - fix undefined shift by >= width of value crypto: caam - fix zero-length buffer DMA mapping crypto: authencesn - Avoid twice completion call in decrypt path crypto: ccree - convert to use crypto_authenc_extractkeys() crypto: bcm - convert to use 
crypto_authenc_extractkeys() crypto: authenc - fix parsing key with misaligned rta_len crypto: talitos - reorder code in talitos_edesc_alloc() crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK xen: Fix x86 sched_clock() interface for xen Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" btrfs: wait on ordered extents on abort cleanup Yama: Check for pid death before checking ancestry scsi: core: Synchronize request queue PM status only on successful resume scsi: sd: Fix cache_type_store() mips: fix n32 compat_ipc_parse_version MIPS: BCM47XX: Setup struct device for the SoC MIPS: lantiq: Fix IPI interrupt handling drm/i915/gvt: Fix mmap range check OF: properties: add missing of_node_put mfd: tps6586x: Handle interrupts on suspend media: v4l: ioctl: Validate num_planes for debug messages RDMA/nldev: Don't expose unsafe global rkey to regular user RDMA/vmw_pvrdma: Return the correct opcode when creating WR kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7 net: dsa: realtek-smi: fix OF child-node lookup pstore/ram: Avoid allocation and leak of platform data arm64: kaslr: ensure randomized quantities are clean to the PoC arm64: dts: marvell: armada-ap806: reserve PSCI area Disable MSI also when pcie-octeon.pcie_disable on fix int_sqrt64() for very large numbers omap2fb: Fix stack memory disclosure media: vivid: fix error handling of kthread_run media: vivid: set min width/height to a value > 0 bpf: in __bpf_redirect_no_mac pull mac only if present ipv6: make icmp6_send() robust against null skb->dev LSM: Check for NULL cred-security on free media: vb2: vb2_mmap: move lock up sunrpc: handle ENOMEM in rpcb_getport_async netfilter: ebtables: account ebt_table_info to kmemcg block: use rcu_work instead of call_rcu to avoid sleep in softirq selinux: fix GPF on invalid policy blockdev: Fix livelocks on loop device sctp: allocate sctp_sockaddr_entry with kzalloc tipc: fix uninit-value in in tipc_conn_rcv_sub tipc: fix 
uninit-value in tipc_nl_compat_link_reset_stats tipc: fix uninit-value in tipc_nl_compat_bearer_enable tipc: fix uninit-value in tipc_nl_compat_link_set tipc: fix uninit-value in tipc_nl_compat_name_table_dump tipc: fix uninit-value in tipc_nl_compat_doit block/loop: Don't grab "struct file" for vfs_getattr() operation. block/loop: Use global lock for ioctl() operation. loop: Fold __loop_release into loop_release loop: Get rid of loop_index_mutex loop: Push lo_ctl_mutex down into individual ioctls loop: Split setting of lo_state from loop_clr_fd loop: Push loop_ctl_mutex down into loop_clr_fd() loop: Push loop_ctl_mutex down to loop_get_status() loop: Push loop_ctl_mutex down to loop_set_status() loop: Push loop_ctl_mutex down to loop_set_fd() loop: Push loop_ctl_mutex down to loop_change_fd() loop: Move special partition reread handling in loop_clr_fd() loop: Move loop_reread_partitions() out of loop_ctl_mutex loop: Fix deadlock when calling blkdev_reread_part() loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex loop: Get rid of 'nested' acquisition of loop_ctl_mutex loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl() loop: drop caches if offset or block_size are changed drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock selftests: Fix test errors related to lib.mk khdr target media: vb2: be sure to unlock mutex on errors nbd: Use set_blocksize() to set device blocksize Linux 4.19.17 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-23 08:46:58 +01:00
Jan Kara	1e11b1d630	blockdev: Fix livelocks on loop device commit `04906b2f54` upstream. bd_set_size() updates also block device's block size. This is somewhat unexpected from its name and at this point, only blkdev_open() uses this functionality. Furthermore, this can result in changing block size under a filesystem mounted on a loop device which leads to livelocks inside __getblk_gfp() like: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106 ... Call Trace: init_page_buffers+0x3e2/0x530 fs/buffer.c:904 grow_dev_page fs/buffer.c:947 [inline] grow_buffers fs/buffer.c:1009 [inline] __getblk_slow fs/buffer.c:1036 [inline] __getblk_gfp+0x906/0xb10 fs/buffer.c:1313 __bread_gfp+0x2d/0x310 fs/buffer.c:1347 sb_bread include/linux/buffer_head.h:307 [inline] fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75 fat_ent_read_block fs/fat/fatent.c:441 [inline] fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101 __fat_get_block fs/fat/inode.c:148 [inline] ... Trivial reproducer for the problem looks like: truncate -s 1G /tmp/image losetup /dev/loop0 /tmp/image mkfs.ext4 -b 1024 /dev/loop0 mount -t ext4 /dev/loop0 /mnt losetup -c /dev/loop0 l /mnt Fix the problem by moving initialization of a block device block size into a separate function and call it when needed. Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with debugging the problem. Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-22 21:40:36 +01:00
Kees Cook	483ac8e65a	pstore/ram: Avoid allocation and leak of platform data commit `5631e8576a` upstream. Yue Hu noticed that when parsing device tree the allocated platform data was never freed. Since it's not used beyond the function scope, this switches to using a stack variable instead. Reported-by: Yue Hu <huyue2@yulong.com> Fixes: `35da60941e` ("pstore/ram: add Device Tree bindings") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-22 21:40:34 +01:00
Josef Bacik	01634ac563	btrfs: wait on ordered extents on abort cleanup commit `74d5d229b1` upstream. If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transaction cleanup stuff. generic/475 can produce this warning: [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000 [ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000 [ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0 [ 8531.207516] Call Trace: [ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs] [ 8531.210209] ? 
wait_for_completion+0x5b/0x190 [ 8531.211303] close_ctree+0x157/0x350 [btrfs] [ 8531.212412] generic_shutdown_super+0x64/0x100 [ 8531.213485] kill_anon_super+0x14/0x30 [ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8531.215539] deactivate_locked_super+0x29/0x60 [ 8531.216633] cleanup_mnt+0x3b/0x70 [ 8531.217497] task_work_run+0x98/0xc0 [ 8531.218397] exit_to_usermode_loop+0x83/0x90 [ 8531.219324] do_syscall_64+0x15b/0x180 [ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 8531.221286] RIP: 0033:0x7fc68e5e4d07 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000 Leaving a tree in the rb-tree: 3853 void btrfs_free_fs_root(struct btrfs_root *root) 3854 { 3855 iput(root->ino_cache_inode); 3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); CC: stable@vger.kernel.org Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add stacktrace ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-22 21:40:32 +01:00
David Sterba	4675f90ef8	Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" commit `77b7aad195` upstream. This reverts commit `e73e81b6d0`. This patch causes a few problems: - adds latency to btrfs_finish_ordered_io - as btrfs_finish_ordered_io is used for free space cache, generating more work from btrfs_btree_balance_dirty_nodelay could end up in the same workqueue, effectively deadlocking 12260 kworker/u96:16+btrfs-freespace-write D [<0>] balance_dirty_pages+0x6e6/0x7ad [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90 [<0>] btrfs_finish_ordered_io+0x3da/0x770 [<0>] normal_work_helper+0x1c5/0x5a0 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x46/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff Transaction commit will wait on the freespace cache: 838 btrfs-transacti D [<0>] btrfs_start_ordered_extent+0x154/0x1e0 [<0>] btrfs_wait_ordered_range+0xbd/0x110 [<0>] __btrfs_wait_cache_io+0x49/0x1a0 [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0 [<0>] commit_cowonly_roots+0x215/0x2b0 [<0>] btrfs_commit_transaction+0x37e/0x910 [<0>] transaction_kthread+0x14d/0x180 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff And then writepages ends up waiting on transaction commit: 9520 kworker/u96:13+flush-btrfs-1 D [<0>] wait_current_trans+0xac/0xe0 [<0>] start_transaction+0x21b/0x4b0 [<0>] cow_file_range_inline+0x10b/0x6b0 [<0>] cow_file_range.isra.69+0x329/0x4a0 [<0>] run_delalloc_range+0x105/0x3c0 [<0>] writepage_delalloc+0x119/0x180 [<0>] __extent_writepage+0x10c/0x390 [<0>] extent_write_cache_pages+0x26f/0x3d0 [<0>] extent_writepages+0x4f/0x80 [<0>] do_writepages+0x17/0x60 [<0>] __writeback_single_inode+0x59/0x690 [<0>] writeback_sb_inodes+0x291/0x4e0 [<0>] __writeback_inodes_wb+0x87/0xb0 [<0>] wb_writeback+0x3bb/0x500 [<0>] wb_workfn+0x40d/0x610 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x1e0/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 
0xffffffffffffffff Eventually, we have every process in the system waiting on balance_dirty_pages(), and nobody is able to make progress on page writeback. The original patch tried to fix an OOM condition that happened on 4.4, but there was no success reproducing it on later kernels (4.19 and 4.20). This is more likely a problem in OOM itself. Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/ Reported-by: Chris Mason <clm@fb.com> CC: stable@vger.kernel.org # 4.18+ CC: ethanlien <ethanlien@synology.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-22 21:40:32 +01:00
Daniel Rosenberg	138993ea82	ANDROID: mnt: Propagate remount correctly This switches over to propagation_next to respect namespace semantics. Test: Remounting to change the options of a fs with mount based options should propagate to all shared copies of that mount, and the slaves/indirect slaves of those. Bug: 122428178 Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ic35cd2782a646435689f5bedfa1f218fe4ab8254	2019-01-19 01:25:07 +00:00
Greg Kroah-Hartman	976f78d572	Merge 4.19.16 into android-4.19 Changes in 4.19.16 Btrfs: fix deadlock when using free space tree due to block group creation staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption staging: rtl8188eu: Fix module loading from tasklet for WEP encryption cpufreq: scmi: Fix frequency invariance in slow path x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE ALSA: hda/realtek - Support Dell headset mode for New AIO platform ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225 ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225 CIFS: Fix adjustment of credits for MTU requests CIFS: Do not set credits to 1 if the server didn't grant anything CIFS: Do not hide EINTR after sending network packets CIFS: Fix credit computation for compounded requests cifs: Fix potential OOB access of lock element array usb: cdc-acm: send ZLP for Telit 3G Intel based modems USB: storage: don't insert sane sense for SPC3+ when bad sense specified USB: storage: add quirk for SMI SM3350 USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB slab: alien caches must not be initialized if the allocation of the alien cache failed mm/usercopy.c: no check page span for stack objects mm, memcg: fix reclaim deadlock with writeback ACPI: power: Skip duplicate power resource references in _PRx ACPI / PMIC: xpower: Fix TS-pin current-source handling ACPI/IORT: Fix rc_dma_get_range() i2c: dev: prevent adapter retries and timeout being set as minus value mtd: rawnand: qcom: fix memory corruption that causes panic vfio/type1: Fix unmap overflow off-by-one drm/amdgpu: Add new VegaM pci id PCI: dwc: Use interrupt masking instead of disabling PCI: dwc: Take lock when ACKing an interrupt PCI: dwc: Move interrupt acking into the proper callback drm/amd/display: Fix MST dp_blank REG_WAIT timeout drm/fb_helper: Allow leaking fbdev smem_start drm/fb-helper: Partially bring back workaround for 
bugs of SDL 1.2 drm/i915: Unwind failure on pinning the gen7 ppgtt drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume() drm/amdgpu: Don't fail resume process if resuming atomic state fails rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set ext4: make sure enough credits are reserved for dioread_nolock writes ext4: fix a potential fiemap/page fault deadlock w/ inline_data ext4: avoid kernel warning when writing the superblock to a dead device ext4: use ext4_write_inode() when fsyncing w/o a journal ext4: track writeback errors using the generic tracking infrastructure ext4: fix special inode number checks in __ext4_iget() mm: page_mapped: don't assume compound page is huge or THP sunrpc: use-after-free in svc_process_common() KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less arm64: compat: Don't pull syscall number from regs in arm_compat_syscall Btrfs: fix access to available allocation bits when starting balance Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation Btrfs: use nofs context when initializing security xattrs to avoid deadlock Linux 4.19.16 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-01-16 22:17:03 +01:00
Filipe Manana	7a1b9b76ba	Btrfs: use nofs context when initializing security xattrs to avoid deadlock commit `827aa18e7b` upstream. When initializing the security xattrs, we are holding a transaction handle therefore we need to use a GFP_NOFS context in order to avoid a deadlock with reclaim in case it's triggered. Fixes: `39a27ec100` ("btrfs: use GFP_KERNEL for xattr and acl allocations") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:37 +01:00
Filipe Manana	79aa5c0daa	Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation commit `9a6f209e36` upstream. If the quota enable and snapshot creation ioctls are called concurrently we can get into a deadlock where the task enabling quotas will deadlock on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it twice, or the task creating a snapshot tries to commit the transaction while the task enabling quota waits for the former task to commit the transaction while holding the mutex. The following time diagrams show how both cases happen. First scenario: CPU 0 CPU 1 btrfs_ioctl() btrfs_ioctl_quota_ctl() btrfs_quota_enable() mutex_lock(fs_info->qgroup_ioctl_lock) btrfs_start_transaction() btrfs_ioctl() btrfs_ioctl_snap_create_v2 create_snapshot() --> adds snapshot to the list pending_snapshots of the current transaction btrfs_commit_transaction() create_pending_snapshots() create_pending_snapshot() qgroup_account_snapshot() btrfs_qgroup_inherit() mutex_lock(fs_info->qgroup_ioctl_lock) --> deadlock, mutex already locked by this task at btrfs_quota_enable() Second scenario: CPU 0 CPU 1 btrfs_ioctl() btrfs_ioctl_quota_ctl() btrfs_quota_enable() mutex_lock(fs_info->qgroup_ioctl_lock) btrfs_start_transaction() btrfs_ioctl() btrfs_ioctl_snap_create_v2 create_snapshot() --> adds snapshot to the list pending_snapshots of the current transaction btrfs_commit_transaction() --> waits for task at CPU 0 to release its transaction handle btrfs_commit_transaction() --> sees another task started the transaction commit first --> releases its transaction handle --> waits for the transaction commit to be completed by the task at CPU 1 create_pending_snapshot() qgroup_account_snapshot() btrfs_qgroup_inherit() mutex_lock(fs_info->qgroup_ioctl_lock) --> deadlock, task at CPU 0 has the mutex locked but it is waiting for us to finish the transaction commit So fix this by setting the quota enabled flag in fs_info after committing the transaction at 
btrfs_quota_enable(). This ends up serializing quota enable and snapshot creation as if the snapshot creation happened just before the quota enable request. The quota rescan task, scheduled after committing the transaction in btrfs_quota_enable(), will do the accounting. Fixes: `6426c7ad69` ("btrfs: qgroup: Fix qgroup accounting when creating snapshot") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:37 +01:00
Filipe Manana	829431a2a5	Btrfs: fix access to available allocation bits when starting balance commit `5a8067c0d1` upstream. The available allocation bits members from struct btrfs_fs_info are protected by a sequence lock, and when starting balance we access them incorrectly in two different ways: 1) In the read sequence lock loop at btrfs_balance() we use the values we read from fs_info->avail_*_alloc_bits and we can immediately do actions that have side effects and can not be undone (printing a message and jumping to a label). This is wrong because a retry might be needed, so our actions must not have side effects and must be repeatable as long as read_seqretry() returns a non-zero value. In other words, we were essentially ignoring the sequence lock; 2) Right below the read sequence lock loop, we were reading the values from avail_metadata_alloc_bits and avail_data_alloc_bits without any protection from concurrent writers, that is, reading them outside of the read sequence lock critical section. So fix this by making sure we only read the available allocation bits while in a read sequence lock critical section and that what we do in the critical section is repeatable (has nothing that can not be undone) so that any eventual retry that is needed is handled properly. Fixes: `de98ced9e7` ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits") Fixes: `1450612797` ("btrfs: fix a bogus warning when converting only data or metadata") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:37 +01:00
Theodore Ts'o	5dc41af3d1	ext4: fix special inode number checks in __ext4_iget() commit `191ce17876` upstream. The check for special (reserved) inode numbers in __ext4_iget() was broken by commit `8a363970d1` ("ext4: avoid declaring fs inconsistent due to invalid file handles"). This was caused by a botched reversal of the sense of the flag now known as EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL). Fix the logic appropriately. Fixes: `8a363970d1` ("ext4: avoid declaring fs inconsistent...") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Theodore Ts'o	bb80ad0dc3	ext4: track writeback errors using the generic tracking infrastructure commit `95cb671387` upstream. We are already using mapping_set_error() in fs/ext4/page_io.c, so all we need to do is to use file_check_and_advance_wb_err() when handling fsync() requests in ext4_sync_file(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Theodore Ts'o	da38a1b47b	ext4: use ext4_write_inode() when fsyncing w/o a journal commit `ad211f3e94` upstream. In no-journal mode, we previously used __generic_file_fsync(). This triggers a lockdep warning, and in addition, it's not safe to depend on the inode writeback mechanism in the case of ext4. We can solve both problems by calling ext4_write_inode() directly. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Theodore Ts'o	01db6e5cf8	ext4: avoid kernel warning when writing the superblock to a dead device commit `e86807862e` upstream. The xfstests generic/475 test switches the underlying device with dm-error while running a stress test. This results in a large number of file system errors, and since we can't lock the buffer head when marking the superblock dirty in the ext4_grp_locked_error() case, it's possible for the superblock to be !buffer_uptodate() without buffer_write_io_error() being true. We need to set buffer_uptodate() before we call mark_buffer_dirty() or this will trigger a WARN_ON. It's safe to do this since the superblock must have been properly read into memory or the mount would not have been successful. So if buffer_uptodate() is not set, we can safely assume that this happened due to a failed attempt to write the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Theodore Ts'o	926cdac104	ext4: fix a potential fiemap/page fault deadlock w/ inline_data commit `2b08b1f12c` upstream. The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent() while still holding the xattr semaphore. This is not necessary and it triggers a circular lockdep warning. This is because fiemap_fill_next_extent() could trigger a page fault when it writes into a page. If that page is mmaped from the inline file in question, this could very well result in a deadlock. This problem can be reproduced using generic/519 with a file system configuration which has the inline_data feature enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Theodore Ts'o	7c2ea25e13	ext4: make sure enough credits are reserved for dioread_nolock writes commit `812c0cab2c` upstream. There are enough credits reserved for most dioread_nolock writes; however, if the extent tree is sufficiently deep, and/or quota is enabled, the code was not allowing for all eventualities when reserving journal credits for the unwritten extent conversion. This problem can be seen using xfstests ext4/034: WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180 Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180 ... EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28 EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:36 +01:00
Ross Lagerwall	2a71a47e03	cifs: Fix potential OOB access of lock element array commit `b9a74cde94` upstream. If maxBuf is small but non-zero, it could result in a zero sized lock element array which we would then try and access OOB. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:31 +01:00
Pavel Shilovsky	7dcc5b36ea	CIFS: Fix credit computation for compounded requests commit `8544f4aa9d` upstream. In the SMB3 protocol every part of the compound chain consumes credits individually, so we need to call wait_for_free_credits() for each of the PDUs in the chain. If an operation is interrupted, we must ensure we return all credits taken from the server structure back. Without this patch, the server can sometimes disconnect the session due to credit mismatches, especially when the first operation(s) are large writes. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:31 +01:00
Pavel Shilovsky	d2f76f6f9f	CIFS: Do not hide EINTR after sending network packets commit `ee13919c2e` upstream. Currently we hide the EINTR code returned from sock_sendmsg() and return 0 instead. This makes the caller think that we successfully completed the network operation, which is not true. Fix this by properly returning EINTR to callers. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-01-16 22:04:30 +01:00


56110 Commits