The reserved root blocks is not enough for booting Android due to
the limit of 0.2% if the fs size too small. so we add a new mini-
mum limit is 128MB.
Change-Id: I5af3b182001d27e4d18b4090c5270bbb2ac6253b
Signed-off-by: Cliff Chen <cliff.chen@rock-chips.com>
The f_blocks of statfs include file system overhead,it is not normal
usage of Posix.
Change-Id: If481626b08c05290626938586e2dc721690f1a91
Signed-off-by: Cliff Chen <cliff.chen@rock-chips.com>
Changes in 4.19.20
Fix "net: ipv4: do not handle duplicate fragments as overlapping"
drm/msm/gpu: fix building without debugfs
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
ipvlan, l3mdev: fix broken l3s mode wrt local routes
l2tp: copy 4 more bytes to linear part if necessary
l2tp: fix reading optional fields of L2TPv3
net: ip_gre: always reports o_key to userspace
net: ip_gre: use erspan key field for tunnel lookup
net/mlx4_core: Add masking for a few queries on HCA caps
netrom: switch to sock timer API
net/rose: fix NULL ax25_cb kernel panic
net: set default network namespace in init_dummy_netdev()
ravb: expand rx descriptor data to accommodate hw checksum
sctp: improve the events for sctp stream reset
tun: move the call to tun_set_real_num_queues
ucc_geth: Reset BQL queue when stopping device
vhost: fix OOB in get_rx_bufs()
net: ip6_gre: always reports o_key to userspace
sctp: improve the events for sctp stream adding
net/mlx5e: Allow MAC invalidation while spoofchk is ON
ip6mr: Fix notifiers call on mroute_clean_tables()
Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
sctp: set chunk transport correctly when it's a new asoc
sctp: set flow sport from saddr only when it's 0
virtio_net: Don't enable NAPI when interface is down
virtio_net: Don't call free_old_xmit_skbs for xdp_frames
virtio_net: Fix not restoring real_num_rx_queues
virtio_net: Fix out of bounds access of sq
virtio_net: Don't process redirected XDP frames when XDP is disabled
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
virtio_net: Differentiate sk_buff and xdp_frame on freeing
CIFS: Do not count -ENODATA as failure for query directory
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Do not consider -ENODATA as stat failure for reads
fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
selftests/seccomp: Enhance per-arch ptrace syscall skip tests
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
arm64: Do not issue IPIs for user executable ptes
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: hibernate: Clean the __hyp_text to PoC after resume
gpio: altera-a10sr: Set proper output level for direction_output
gpiolib: fix line event timestamps for nested irqs
gpio: pcf857x: Fix interrupts on multiple instances
gpio: sprd: Fix the incorrect data register
gpio: sprd: Fix incorrect irq type setting for the async EIC
gfs2: Revert "Fix loop in gfs2_rbm_find"
mmc: bcm2835: Fix DMA channel leak on probe error
mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
ALSA: hda/realtek - Fixed hp_pin no value
IB/hfi1: Remove overly conservative VM_EXEC flag check
platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
Btrfs: fix deadlock when allocating tree block during leaf/node split
btrfs: On error always free subvol_name in btrfs_mount
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
oom, oom_reaper: do not enqueue same task twice
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
mm, oom: fix use-after-free in oom_kill_process
mm: hwpoison: use do_send_sig_info() instead of force_sig()
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
of: Convert to using %pOFn instead of device_node.name
of: overlay: add tests to validate kfrees from overlay removal
of: overlay: add missing of_node_get() in __of_attach_node_sysfs
of: overlay: use prop add changeset entry for property in new nodes
of: overlay: do not duplicate properties from overlay for new nodes
md/raid5: fix 'out of memory' during raid cache recovery
cifs: Always resolve hostname before reconnecting
Linux 4.19.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 532b618bdf upstream.
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol. Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.
Fixes: 312c89fbca ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a627947076 upstream.
When splitting a leaf or node from one of the trees that are modified when
flushing pending block groups (extent, chunk, device and free space trees),
we need to allocate a new tree block, which in turn can result in the need
to allocate a new block group. After allocating the new block group we may
need to flush new block groups that were previously allocated during the
course of the current transaction, which is what may cause a deadlock due
to attempts to write lock twice the same leaf or node, as when splitting
a leaf or node we are holding a write lock on it and its parent node.
The same type of deadlock can also happen when increasing the tree's
height, since we are holding a lock on the existing root while allocating
the tree block to use as the new root node.
An example trace when the deadlock happens during the leaf split path is:
[27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1
[27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
[27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
(...)
[27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
[27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
[27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
[27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
[27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
[27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
[27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
[27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
[27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[27175.307940] Call Trace:
[27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs]
[27175.309669] ? update_space_info+0xba/0xe0 [btrfs]
[27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs]
[27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
[27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs]
[27175.313984] find_free_extent+0x706/0x10d0 [btrfs]
[27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
[27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
[27175.316548] split_leaf+0x130/0x610 [btrfs]
[27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs]
[27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
[27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
[27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
[27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
[27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs]
[27175.322491] normal_work_helper+0xd0/0x320 [btrfs]
[27175.323328] ? move_linked_works+0x6e/0xa0
[27175.324160] process_one_work+0x191/0x370
[27175.324976] worker_thread+0x4f/0x3b0
[27175.325763] kthread+0xf8/0x130
[27175.326531] ? rescuer_thread+0x320/0x320
[27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50
[27175.328027] ret_from_fork+0x35/0x40
[27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
Fix this by preventing the flushing of new blocks groups when splitting a
leaf/node and when inserting a new root node for one of the trees modified
by the flushing operation, similar to what is done when COWing a node/leaf
from on of these trees.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
Reported-by: Eli V <eliventer@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dbd449c99 upstream.
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
Fixes: 4e717f5c10 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 082aaa8700 upstream.
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d42e72fe8 upstream.
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.
Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.19
amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
net: bridge: Fix ethernet header pointer before check skb forwardable
net: Fix usage of pskb_trim_rcsum
net: phy: marvell: Errata for mv88e6390 internal PHYs
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
net/sched: act_tunnel_key: fix memory leak in case of action replace
net_sched: refetch skb protocol for each filter
openvswitch: Avoid OOB read when parsing flow nlattrs
vhost: log dirty page correctly
mlxsw: pci: Increase PCI SW reset timeout
net: ipv4: Fix memory leak in network namespace dismantle
mlxsw: spectrum_fid: Update dummy FID index
mlxsw: pci: Ring CQ's doorbell before RDQ's
net/sched: cls_flower: allocate mask dynamically in fl_change()
udp: with udp_segment release on error path
ip6_gre: fix tunnel list corruption for x-netns
erspan: build the header with the right proto according to erspan_ver
net: phy: marvell: Fix deadlock from wrong locking
ip6_gre: update version related info when changing link
tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
mei: me: mark LBG devices as having dma support
mei: me: add denverton innovation engine device IDs
USB: leds: fix regression in usbport led trigger
USB: serial: simple: add Motorola Tetra TPG2200 device id
USB: serial: pl2303: add new PID to support PL2303TB
ceph: clear inode pointer when snap realm gets dropped by its inode
ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
ASoC: rt5514-spi: Fix potential NULL pointer dereference
ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
clk: socfpga: stratix10: fix rate calculation for pll clocks
clk: socfpga: stratix10: fix naming convention for the fixed-clocks
inotify: Fix fd refcount leak in inotify_add_watch().
ALSA: hda/realtek - Fix typo for ALC225 model
ALSA: hda - Add mute LED support for HP ProBook 470 G5
ARCv2: lib: memeset: fix doing prefetchw outside of buffer
ARC: adjust memblock_reserve of kernel memory
ARC: perf: map generic branches to correct hardware condition
s390/mm: always force a load of the primary ASCE on context switch
s390/early: improve machine detection
s390/smp: fix CPU hotplug deadlock with CPU rescan
misc: ibmvsm: Fix potential NULL pointer dereference
char/mwave: fix potential Spectre v1 vulnerability
mmc: dw_mmc-bluefield: : Fix the license information
mmc: meson-gx: Free irq in release() callback
staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
tty: Handle problem if line discipline does not have receive_buf
uart: Fix crash in uart_write and uart_put_char
tty/n_hdlc: fix __might_sleep warning
hv_balloon: avoid touching uninitialized struct page during tail onlining
Drivers: hv: vmbus: Check for ring when getting debug info
vgacon: unconfuse vc_origin when using soft scrollback
CIFS: Fix possible hang during async MTU reads and writes
CIFS: Fix credits calculations for reads with errors
CIFS: Fix credit calculation for encrypted reads with errors
CIFS: Do not reconnect TCP session in add_credits()
smb3: add credits we receive from oplock/break PDUs
Input: xpad - add support for SteelSeries Stratus Duo
Input: input_event - provide override for sparc64
Input: uinput - fix undefined behavior in uinput_validate_absinfo()
acpi/nfit: Block function zero DSMs
acpi/nfit: Fix command-supported detection
scsi: ufs: Use explicit access size in ufshcd_dump_regs
dm thin: fix passdown_double_checking_shared_status()
dm crypt: fix parsing of extended IV arguments
drm/amdgpu: Add APTX quirk for Lenovo laptop
KVM: x86: Fix single-step debugging
KVM: x86: Fix PV IPIs for 32-bit KVM host
KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
kvm: x86/vmx: Use kzalloc for cached_vmcs12
KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
x86/pkeys: Properly copy pkey state at fork()
x86/selftests/pkeys: Fork() to check for state being preserved
x86/kaslr: Fix incorrect i8254 outb() parameters
x86/entry/64/compat: Fix stack switching for XEN PV
posix-cpu-timers: Unbreak timer rearming
net: sun: cassini: Cleanup license conflict
irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
can: bcm: check timer values before ktime conversion
can: flexcan: fix NULL pointer exception during bringup
vt: make vt_console_print() compatible with the unicode screen buffer
vt: always call notifier with the console lock held
vt: invoke notifier on screen size change
drm/meson: Fix atomic mode switching regression
bpf: improve verifier branch analysis
bpf: add per-insn complexity limit
bpf: move {prev_,}insn_idx into verifier env
bpf: move tmp variable into ax register in interpreter
bpf: enable access to ax register also from verifier rewrite
bpf: restrict map value pointer arithmetic for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix sanitation of alu op with pointer / scalar type from different paths
bpf: fix inner map masking to prevent oob under speculation
s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
nvmet-rdma: Add unlikely for response allocated check
nvmet-rdma: fix null dereference under heavy load
Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
ide: fix a typo in the settings proc file name
Input: input_event - fix the CONFIG_SPARC64 mixup
Linux 4.19.19
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit ef68e83184 upstream.
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:
mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().
Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8004c78c68 upstream.
Currently we mark MID as malformed if we get an error from server
in a read response. This leads to not properly processing credits
in the readv callback. Fix this by marking such a response as
normal received response and process it appropriately.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acc58d0bab upstream.
When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.
Was able to reproduce this when server was configured
to give out fewer credits than usual.
The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d95e674c01 upstream.
snap realm and corresponding inode have pointers to each other.
The two pointer should get clear at the same time. Otherwise,
snap realm's pointer may reference freed inode.
Cc: stable@vger.kernel.org # 4.17+
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.18
ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
mlxsw: spectrum: Disable lag port TX before removing it
mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
net: dsa: mv88x6xxx: mv88e6390 errata
net, skbuff: do not prefer skb allocation fails early
qmi_wwan: add MTU default to qmap network interface
r8169: Add support for new Realtek Ethernet
ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
net: clear skb->tstamp in bridge forwarding path
netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
gpio: pl061: Move irq_chip definition inside struct pl061
drm/amd/display: Guard against null stream_state in set_crc_source
drm/amdkfd: fix interrupt spin lock
ixgbe: allow IPsec Tx offload in VEPA mode
platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
e1000e: allow non-monotonic SYSTIM readings
usb: typec: tcpm: Do not disconnect link for self powered devices
selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
of: overlay: add missing of_node_put() after add new node to changeset
writeback: don't decrement wb->refcnt if !wb->bdi
serial: set suppress_bind_attrs flag only if builtin
bpf: Allow narrow loads with offset > 0
ALSA: oxfw: add support for APOGEE duet FireWire
x86/mce: Fix -Wmissing-prototypes warnings
MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
crypto: ecc - regularize scalar for scalar multiplication
arm64: perf: set suppress_bind_attrs flag to true
drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
samples: bpf: fix: error handling regarding kprobe_events
usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
fpga: altera-cvp: fix probing for multiple FPGAs on the bus
selinux: always allow mounting submounts
ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
scsi: qedi: Check for session online before getting iSCSI TLV data.
drm/amdgpu: Reorder uvd ring init before uvd resume
rxe: IB_WR_REG_MR does not capture MR's iova field
efi/libstub: Disable some warnings for x86{,_64}
jffs2: Fix use of uninitialized delayed_work, lockdep breakage
clk: imx: make mux parent strings const
pstore/ram: Do not treat empty buffers as valid
media: uvcvideo: Refactor teardown of uvc on USB disconnect
powerpc/xmon: Fix invocation inside lock region
powerpc/pseries/cpuidle: Fix preempt warning
media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
ASoC: use dma_ops of parent device for acp_audio_dma
media: venus: core: Set dma maximum segment size
staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
net: call sk_dst_reset when set SO_DONTROUTE
scsi: target: use consistent left-aligned ASCII INQUIRY data
scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
selftests: do not macro-expand failed assertion expressions
arm64: kasan: Increase stack size for KASAN_EXTRA
clk: imx6q: reset exclusive gates on init
arm64: Fix minor issues with the dcache_by_line_op macro
bpf: relax verifier restriction on BPF_MOV | BPF_ALU
kconfig: fix file name and line number of warn_ignored_character()
kconfig: fix memory leak when EOF is encountered in quotation
mmc: atmel-mci: do not assume idle after atmci_request_end
btrfs: volumes: Make sure there is no overlap of dev extents at mount time
btrfs: alloc_chunk: fix more DUP stripe size handling
btrfs: fix use-after-free due to race between replace start and cancel
btrfs: improve error handling of btrfs_add_link
tty/serial: do not free trasnmit buffer page under port lock
perf intel-pt: Fix error with config term "pt=0"
perf tests ARM: Disable breakpoint tests 32-bit
perf svghelper: Fix unchecked usage of strncpy()
perf parse-events: Fix unchecked usage of strncpy()
perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
x86/topology: Use total_cpus for max logical packages calculation
dm crypt: use u64 instead of sector_t to store iv_offset
dm kcopyd: Fix bug causing workqueue stalls
perf stat: Avoid segfaults caused by negated options
tools lib subcmd: Don't add the kernel sources to the include path
dm snapshot: Fix excessive memory usage and workqueue stalls
perf cs-etm: Correct packets swapping in cs_etm__flush()
perf tools: Add missing sigqueue() prototype for systems lacking it
perf tools: Add missing open_memstream() prototype for systems lacking it
quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
clocksource/drivers/integrator-ap: Add missing of_node_put()
dm: Check for device sector overflow if CONFIG_LBDAF is not set
Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
ALSA: bebob: fix model-id of unit for Apogee Ensemble
sysfs: Disable lockdep for driver bind/unbind files
IB/usnic: Fix potential deadlock
scsi: mpt3sas: fix memory ordering on 64bit writes
scsi: smartpqi: correct lun reset issues
ath10k: fix peer stats null pointer dereference
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
scsi: megaraid: fix out-of-bound array accesses
iomap: don't search past page end in iomap_is_partially_uptodate
ocfs2: fix panic due to unrecovered local alloc
mm/page-writeback.c: don't break integrity writeback on ->writepage() error
mm/swap: use nr_node_ids for avail_lists in swap_info_struct
userfaultfd: clear flag if remap event not enabled
mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
iwlwifi: mvm: Send LQ command as async when necessary
Bluetooth: Fix unnecessary error message for HCI request completion
ipmi: fix use-after-free of user->release_barrier.rda
ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
ipmi: Prevent use-after-free in deliver_response
ipmi:ssif: Fix handling of multi-part return messages
ipmi: Don't initialize anything in the core until something uses it
Linux 4.19.18
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 3cc31fa65d ]
iomap_is_partially_uptodate() is intended to check wither blocks within
the selected range of a not-uptodate page are uptodate; if the range we
care about is up to date, it's an optimization.
However, the iomap implementation continues to check all blocks up to
from+count, which is beyond the page, and can even be well beyond the
iop->uptodate bitmap.
I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization. Still, it's clearly an invalid
memory access that must be fixed.
So: fix this by limiting the search to within the page as is done in the
non-iomap variant, block_is_partially_uptodate().
Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
page system:
BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1690dd41e0 ]
In the error handling block, err holds the return value of either
btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked
since it's introduction with commit fe66a05a06 (Btrfs: improve error
handling for btrfs_insert_dir_item callers) in 2012.
If the error handling in the error handling fails, there's not much left
to do and the abort either happened earlier in the callees or is
necessary here.
So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort
the transaction, but still return the original code of the failure
stored in 'ret' as this will be reported to the user.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d189dd70e2 ]
The device replace cancel thread can race with the replace start thread
and if fs_info::scrubs_running is not yet set, btrfs_scrub_cancel() will
fail to stop the scrub thread.
The scrub thread continues with the scrub for replace which then will
try to write to the target device and which is already freed by the
cancel thread.
scrub_setup_ctx() warns as tgtdev is NULL.
struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
{
...
if (is_dev_replace) {
WARN_ON(!fs_info->dev_replace.tgtdev); <===
sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO;
sctx->wr_tgtdev = fs_info->dev_replace.tgtdev;
sctx->flush_all_writes = false;
}
[ 6724.497655] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc started
[ 6753.945017] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc canceled
[ 6852.426700] WARNING: CPU: 0 PID: 4494 at fs/btrfs/scrub.c:622 scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
...
[ 6852.428928] RIP: 0010:scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
...
[ 6852.432970] Call Trace:
[ 6852.433202] btrfs_scrub_dev+0x19b/0x5c0 [btrfs]
[ 6852.433471] btrfs_dev_replace_start+0x48c/0x6a0 [btrfs]
[ 6852.433800] btrfs_dev_replace_by_ioctl+0x3a/0x60 [btrfs]
[ 6852.434097] btrfs_ioctl+0x2476/0x2d20 [btrfs]
[ 6852.434365] ? do_sigaction+0x7d/0x1e0
[ 6852.434623] do_vfs_ioctl+0xa9/0x6c0
[ 6852.434865] ? syscall_trace_enter+0x1c8/0x310
[ 6852.435124] ? syscall_trace_enter+0x1c8/0x310
[ 6852.435387] ksys_ioctl+0x60/0x90
[ 6852.435663] __x64_sys_ioctl+0x16/0x20
[ 6852.435907] do_syscall_64+0x50/0x180
[ 6852.436150] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Further, as the replace thread enters scrub_write_page_to_dev_replace()
without the target device it panics:
static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
struct scrub_page *spage)
{
...
bio_set_dev(bio, sbio->dev->bdev); <======
[ 6929.715145] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
..
[ 6929.717106] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
[ 6929.717420] RIP: 0010:scrub_write_page_to_dev_replace+0xb4/0x260
[btrfs]
..
[ 6929.721430] Call Trace:
[ 6929.721663] scrub_write_block_to_dev_replace+0x3f/0x60 [btrfs]
[ 6929.721975] scrub_bio_end_io_worker+0x1af/0x490 [btrfs]
[ 6929.722277] normal_work_helper+0xf0/0x4c0 [btrfs]
[ 6929.722552] process_one_work+0x1f4/0x520
[ 6929.722805] ? process_one_work+0x16e/0x520
[ 6929.723063] worker_thread+0x46/0x3d0
[ 6929.723313] kthread+0xf8/0x130
[ 6929.723544] ? process_one_work+0x520/0x520
[ 6929.723800] ? kthread_delayed_work_timer_fn+0x80/0x80
[ 6929.724081] ret_from_fork+0x3a/0x50
Fix this by letting the btrfs_dev_replace_finishing() to do the job of
cleaning after the cancel, including freeing of the target device.
btrfs_dev_replace_finishing() is called when btrfs_scub_dev() returns
along with the scrub return status.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit baf92114c7 ]
Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
fixed calculating the stripe_size for a new DUP chunk.
However, the same calculation reappears a bit later, and that one was
not changed yet. The resulting bug that is exposed is that the newly
allocated device extents ('stripes') can have a few MiB overlap with the
next thing stored after them, which is another device extent or the end
of the disk.
The scenario in which this can happen is:
* The block device for the filesystem is less than 10GiB in size.
* The amount of contiguous free unallocated disk space chosen to use for
chunk allocation is 20% of the total device size, or a few MiB more or
less.
An example:
- The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
- There's 1578MiB unallocated raw disk space left in one contiguous
piece.
In this case stripe_size is first calculated as 789MiB, (half of
1578MiB).
Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we
enter the if block. Now stripe_size value is immediately overwritten
while calculating an adjusted value based on max_chunk_size, which ends
up as 788MiB.
Next, the value is rounded up to a 16MiB boundary, 800MiB, which is
actually more than the value we had before. However, the last comparison
fails to detect this, because it's comparing the value with the total
amount of free space, which is about twice the size of stripe_size.
In the example above, this means that the resulting raw disk space being
allocated is 1600MiB, while only a gap of 1578MiB has been found. The
second device extent object for this DUP chunk will overlap for 22MiB
with whatever comes next.
The underlying problem here is that the stripe_size is reused all the
time for different things. So, when entering the code in the if block,
stripe_size is immediately overwritten with something else. If later we
decide we want to have the previous value back, then the logic to
compute it was copy pasted in again.
With this change, the value in stripe_size is not unnecessarily
destroyed, so the duplicated calculation is not needed any more.
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5eb193812a ]
Enhance btrfs_verify_dev_extents() to remember previous checked dev
extents, so it can verify no dev extents can overlap.
Analysis from Hans:
"Imagine allocating a DATA|DUP chunk.
In the chunk allocator, we first set...
max_stripe_size = SZ_1G;
max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE
... which is 10GiB.
Then...
/* we don't want a chunk larger than 10% of writeable space */
max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
max_chunk_size);
Imagine we only have one 7880MiB block device in this filesystem. Now
max_chunk_size is down to 788MiB.
The next step in the code is to search for max_stripe_size * dev_stripes
amount of free space on the device, which is in our example 1GiB * 2 =
2GiB. Imagine the device has exactly 1578MiB free in one contiguous
piece. This amount of bytes will be put in devices_info[ndevs - 1].max_avail
Next we recalculate the stripe_size (which is actually the device extent
length), based on the actual maximum amount of available raw disk space:
stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
stripe_size is now 789MiB
Next we do...
data_stripes = num_stripes / ncopies
...where data_stripes ends up as 1, because num_stripes is 2 (the amount
of device extents we're going to have), and DUP has ncopies 2.
Next there's a check...
if (stripe_size * data_stripes > max_chunk_size)
...which matches because 789MiB * 1 > 788MiB.
We go into the if code, and next is...
stripe_size = div_u64(max_chunk_size, data_stripes);
...which resets stripe_size to max_chunk_size: 788MiB
Next is a fun one...
/* bump the answer up to a 16MB boundary */
stripe_size = round_up(stripe_size, SZ_16M);
...which changes stripe_size from 788MiB to 800MiB.
We're not done changing stripe_size yet...
/* But don't go higher than the limits we found while searching
* for free extents
*/
stripe_size = min(devices_info[ndevs - 1].max_avail,
stripe_size);
This is bad. max_avail is twice the stripe_size (we need to fit 2 device
extents on the same device for DUP).
The result here is that 800MiB < 1578MiB, so it's unchanged. However,
the resulting DUP chunk will need 1600MiB disk space, which isn't there,
and the second dev_extent might extend into the next thing (next
dev_extent? end of device?) for 22MiB.
The last shown line of code relies on a situation where there's twice
the value of stripe_size present as value for the variable stripe_size
when it's DUP. This was actually the case before commit 92e222df7b
"btrfs: alloc_chunk: fix DUP stripe size handling", from which I quote:
"[...] in the meantime there's a check to see if the stripe_size does
not exceed max_chunk_size. Since during this check stripe_size is twice
the amount as intended, the check will reduce the stripe_size to
max_chunk_size if the actual correct to be used stripe_size is more than
half the amount of max_chunk_size."
In the previous version of the code, the 16MiB alignment (why is this
done, by the way?) would result in a 50% chance that it would actually
do an 8MiB alignment for the individual dev_extents, since it was
operating on double the size. Does this matter?
Does it matter that stripe_size can be set to anything which is not
16MiB aligned because of the amount of remaining available disk space
which is just taken?
What is the main purpose of this round_up?
The most straightforward thing to do seems something like...
stripe_size = min(
div_u64(devices_info[ndevs - 1].max_avail, dev_stripes),
stripe_size
)
..just putting half of the max_avail into stripe_size."
Link: https://lore.kernel.org/linux-btrfs/b3461a38-e5f8-f41d-c67c-2efac8129054@mendix.com/
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ add analysis from report ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30696378f6 ]
The ramoops backend currently calls persistent_ram_save_old() even
if a buffer is empty. While this appears to work, it is does not seem
like the right thing to do and could lead to future bugs so lets avoid
that. It also prevents misleading prints in the logs which claim the
buffer is valid.
I got something like:
found existing buffer, size 0, start 0
When I was expecting:
no valid data in buffer (sig = ...)
This bails out early (and reports with pr_debug()), since it's an
acceptable state.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 4.19.17
tty/ldsem: Wake up readers after timed out down_write()
tty: Hold tty_ldisc_lock() during tty_reopen()
tty: Simplify tty->count math in tty_reopen()
tty: Don't hold ldisc lock in tty_reopen() if ldisc present
can: gw: ensure DLC boundaries after CAN frame modification
netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS
netfilter: nf_conncount: don't skip eviction when age is negative
netfilter: nf_conncount: split gc in two phases
netfilter: nf_conncount: restart search when nodes have been erased
netfilter: nf_conncount: merge lookup and add functions
netfilter: nf_conncount: move all list iterations under spinlock
netfilter: nf_conncount: speculative garbage collection on empty lists
netfilter: nf_conncount: fix argument order to find_next_bit
mmc: sdhci-msm: Disable CDR function on TX
Revert "scsi: target: iscsi: cxgbit: fix csk leak"
scsi: target: iscsi: cxgbit: fix csk leak
scsi: target: iscsi: cxgbit: fix csk leak
arm64/kvm: consistently handle host HCR_EL2 flags
arm64: Don't trap host pointer auth use to EL2
ipv6: fix kernel-infoleak in ipv6_local_error()
net: bridge: fix a bug on using a neighbour cache entry without checking its state
packet: Do not leak dev refcounts on error exit
tcp: change txhash on SYN-data timeout
tun: publish tfile after it's fully initialized
lan743x: Remove phy_read from link status change function
smc: move unhash as early as possible in smc_release()
r8169: don't try to read counters if chip is in a PCI power-save state
bonding: update nest level on unlink
ip: on queued skb use skb_header_pointer instead of pskb_may_pull
r8169: load Realtek PHY driver module before r8169
crypto: sm3 - fix undefined shift by >= width of value
crypto: caam - fix zero-length buffer DMA mapping
crypto: authencesn - Avoid twice completion call in decrypt path
crypto: ccree - convert to use crypto_authenc_extractkeys()
crypto: bcm - convert to use crypto_authenc_extractkeys()
crypto: authenc - fix parsing key with misaligned rta_len
crypto: talitos - reorder code in talitos_edesc_alloc()
crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
xen: Fix x86 sched_clock() interface for xen
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
btrfs: wait on ordered extents on abort cleanup
Yama: Check for pid death before checking ancestry
scsi: core: Synchronize request queue PM status only on successful resume
scsi: sd: Fix cache_type_store()
mips: fix n32 compat_ipc_parse_version
MIPS: BCM47XX: Setup struct device for the SoC
MIPS: lantiq: Fix IPI interrupt handling
drm/i915/gvt: Fix mmap range check
OF: properties: add missing of_node_put
mfd: tps6586x: Handle interrupts on suspend
media: v4l: ioctl: Validate num_planes for debug messages
RDMA/nldev: Don't expose unsafe global rkey to regular user
RDMA/vmw_pvrdma: Return the correct opcode when creating WR
kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7
net: dsa: realtek-smi: fix OF child-node lookup
pstore/ram: Avoid allocation and leak of platform data
arm64: kaslr: ensure randomized quantities are clean to the PoC
arm64: dts: marvell: armada-ap806: reserve PSCI area
Disable MSI also when pcie-octeon.pcie_disable on
fix int_sqrt64() for very large numbers
omap2fb: Fix stack memory disclosure
media: vivid: fix error handling of kthread_run
media: vivid: set min width/height to a value > 0
bpf: in __bpf_redirect_no_mac pull mac only if present
ipv6: make icmp6_send() robust against null skb->dev
LSM: Check for NULL cred-security on free
media: vb2: vb2_mmap: move lock up
sunrpc: handle ENOMEM in rpcb_getport_async
netfilter: ebtables: account ebt_table_info to kmemcg
block: use rcu_work instead of call_rcu to avoid sleep in softirq
selinux: fix GPF on invalid policy
blockdev: Fix livelocks on loop device
sctp: allocate sctp_sockaddr_entry with kzalloc
tipc: fix uninit-value in in tipc_conn_rcv_sub
tipc: fix uninit-value in tipc_nl_compat_link_reset_stats
tipc: fix uninit-value in tipc_nl_compat_bearer_enable
tipc: fix uninit-value in tipc_nl_compat_link_set
tipc: fix uninit-value in tipc_nl_compat_name_table_dump
tipc: fix uninit-value in tipc_nl_compat_doit
block/loop: Don't grab "struct file" for vfs_getattr() operation.
block/loop: Use global lock for ioctl() operation.
loop: Fold __loop_release into loop_release
loop: Get rid of loop_index_mutex
loop: Push lo_ctl_mutex down into individual ioctls
loop: Split setting of lo_state from loop_clr_fd
loop: Push loop_ctl_mutex down into loop_clr_fd()
loop: Push loop_ctl_mutex down to loop_get_status()
loop: Push loop_ctl_mutex down to loop_set_status()
loop: Push loop_ctl_mutex down to loop_set_fd()
loop: Push loop_ctl_mutex down to loop_change_fd()
loop: Move special partition reread handling in loop_clr_fd()
loop: Move loop_reread_partitions() out of loop_ctl_mutex
loop: Fix deadlock when calling blkdev_reread_part()
loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex
loop: Get rid of 'nested' acquisition of loop_ctl_mutex
loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()
loop: drop caches if offset or block_size are changed
drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock
selftests: Fix test errors related to lib.mk khdr target
media: vb2: be sure to unlock mutex on errors
nbd: Use set_blocksize() to set device blocksize
Linux 4.19.17
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 04906b2f54 upstream.
bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
init_page_buffers+0x3e2/0x530 fs/buffer.c:904
grow_dev_page fs/buffer.c:947 [inline]
grow_buffers fs/buffer.c:1009 [inline]
__getblk_slow fs/buffer.c:1036 [inline]
__getblk_gfp+0x906/0xb10 fs/buffer.c:1313
__bread_gfp+0x2d/0x310 fs/buffer.c:1347
sb_bread include/linux/buffer_head.h:307 [inline]
fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
fat_ent_read_block fs/fat/fatent.c:441 [inline]
fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
__fat_get_block fs/fat/inode.c:148 [inline]
...
Trivial reproducer for the problem looks like:
truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt
Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.
Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.
Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5631e8576a upstream.
Yue Hu noticed that when parsing device tree the allocated platform data
was never freed. Since it's not used beyond the function scope, this
switches to using a stack variable instead.
Reported-by: Yue Hu <huyue2@yulong.com>
Fixes: 35da60941e ("pstore/ram: add Device Tree bindings")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77b7aad195 upstream.
This reverts commit e73e81b6d0.
This patch causes a few problems:
- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
more work from btrfs_btree_balance_dirty_nodelay could end up in the
same workque, effectively deadlocking
12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Transaction commit will wait on the freespace cache:
838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
And then writepages ends up waiting on transaction commit:
9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff
Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.
Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This switches over to propagation_next to respect
namepsace semantics.
Test: Remounting to change the options of a fs with mount based
options should propagate to all shared copies of that mount,
and the slaves/indirect slaves of those.
Bug: 122428178
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ic35cd2782a646435689f5bedfa1f218fe4ab8254
Changes in 4.19.16
Btrfs: fix deadlock when using free space tree due to block group creation
staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
cpufreq: scmi: Fix frequency invariance in slow path
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
ALSA: hda/realtek - Support Dell headset mode for New AIO platform
ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
CIFS: Fix adjustment of credits for MTU requests
CIFS: Do not set credits to 1 if the server didn't grant anything
CIFS: Do not hide EINTR after sending network packets
CIFS: Fix credit computation for compounded requests
cifs: Fix potential OOB access of lock element array
usb: cdc-acm: send ZLP for Telit 3G Intel based modems
USB: storage: don't insert sane sense for SPC3+ when bad sense specified
USB: storage: add quirk for SMI SM3350
USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
slab: alien caches must not be initialized if the allocation of the alien cache failed
mm/usercopy.c: no check page span for stack objects
mm, memcg: fix reclaim deadlock with writeback
ACPI: power: Skip duplicate power resource references in _PRx
ACPI / PMIC: xpower: Fix TS-pin current-source handling
ACPI/IORT: Fix rc_dma_get_range()
i2c: dev: prevent adapter retries and timeout being set as minus value
mtd: rawnand: qcom: fix memory corruption that causes panic
vfio/type1: Fix unmap overflow off-by-one
drm/amdgpu: Add new VegaM pci id
PCI: dwc: Use interrupt masking instead of disabling
PCI: dwc: Take lock when ACKing an interrupt
PCI: dwc: Move interrupt acking into the proper callback
drm/amd/display: Fix MST dp_blank REG_WAIT timeout
drm/fb_helper: Allow leaking fbdev smem_start
drm/fb-helper: Partially bring back workaround for bugs of SDL 1.2
drm/i915: Unwind failure on pinning the gen7 ppgtt
drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
drm/amdgpu: Don't fail resume process if resuming atomic state fails
rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
ext4: make sure enough credits are reserved for dioread_nolock writes
ext4: fix a potential fiemap/page fault deadlock w/ inline_data
ext4: avoid kernel warning when writing the superblock to a dead device
ext4: use ext4_write_inode() when fsyncing w/o a journal
ext4: track writeback errors using the generic tracking infrastructure
ext4: fix special inode number checks in __ext4_iget()
mm: page_mapped: don't assume compound page is huge or THP
sunrpc: use-after-free in svc_process_common()
KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less
arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
Btrfs: fix access to available allocation bits when starting balance
Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation
Btrfs: use nofs context when initializing security xattrs to avoid deadlock
Linux 4.19.16
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 827aa18e7b upstream.
When initializing the security xattrs, we are holding a transaction handle
therefore we need to use a GFP_NOFS context in order to avoid a deadlock
with reclaim in case it's triggered.
Fixes: 39a27ec100 ("btrfs: use GFP_KERNEL for xattr and acl allocations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a6f209e36 upstream.
If the quota enable and snapshot creation ioctls are called concurrently
we can get into a deadlock where the task enabling quotas will deadlock
on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
twice, or the task creating a snapshot tries to commit the transaction
while the task enabling quota waits for the former task to commit the
transaction while holding the mutex. The following time diagrams show how
both cases happen.
First scenario:
CPU 0 CPU 1
btrfs_ioctl()
btrfs_ioctl_quota_ctl()
btrfs_quota_enable()
mutex_lock(fs_info->qgroup_ioctl_lock)
btrfs_start_transaction()
btrfs_ioctl()
btrfs_ioctl_snap_create_v2
create_snapshot()
--> adds snapshot to the
list pending_snapshots
of the current
transaction
btrfs_commit_transaction()
create_pending_snapshots()
create_pending_snapshot()
qgroup_account_snapshot()
btrfs_qgroup_inherit()
mutex_lock(fs_info->qgroup_ioctl_lock)
--> deadlock, mutex already locked
by this task at
btrfs_quota_enable()
Second scenario:
CPU 0 CPU 1
btrfs_ioctl()
btrfs_ioctl_quota_ctl()
btrfs_quota_enable()
mutex_lock(fs_info->qgroup_ioctl_lock)
btrfs_start_transaction()
btrfs_ioctl()
btrfs_ioctl_snap_create_v2
create_snapshot()
--> adds snapshot to the
list pending_snapshots
of the current
transaction
btrfs_commit_transaction()
--> waits for task at
CPU 0 to release
its transaction
handle
btrfs_commit_transaction()
--> sees another task started
the transaction commit first
--> releases its transaction
handle
--> waits for the transaction
commit to be completed by
the task at CPU 1
create_pending_snapshot()
qgroup_account_snapshot()
btrfs_qgroup_inherit()
mutex_lock(fs_info->qgroup_ioctl_lock)
--> deadlock, task at CPU 0
has the mutex locked but
it is waiting for us to
finish the transaction
commit
So fix this by setting the quota enabled flag in fs_info after committing
the transaction at btrfs_quota_enable(). This ends up serializing quota
enable and snapshot creation as if the snapshot creation happened just
before the quota enable request. The quota rescan task, scheduled after
committing the transaction in btrfs_quote_enable(), will do the accounting.
Fixes: 6426c7ad69 ("btrfs: qgroup: Fix qgroup accounting when creating snapshot")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a8067c0d1 upstream.
The available allocation bits members from struct btrfs_fs_info are
protected by a sequence lock, and when starting balance we access them
incorrectly in two different ways:
1) In the read sequence lock loop at btrfs_balance() we use the values we
read from fs_info->avail_*_alloc_bits and we can immediately do actions
that have side effects and can not be undone (printing a message and
jumping to a label). This is wrong because a retry might be needed, so
our actions must not have side effects and must be repeatable as long
as read_seqretry() returns a non-zero value. In other words, we were
essentially ignoring the sequence lock;
2) Right below the read sequence lock loop, we were reading the values
from avail_metadata_alloc_bits and avail_data_alloc_bits without any
protection from concurrent writers, that is, reading them outside of
the read sequence lock critical section.
So fix this by making sure we only read the available allocation bits
while in a read sequence lock critical section and that what we do in the
critical section is repeatable (has nothing that can not be undone) so
that any eventual retry that is needed is handled properly.
Fixes: de98ced9e7 ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits")
Fixes: 1450612797 ("btrfs: fix a bogus warning when converting only data or metadata")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 191ce17876 upstream.
The check for special (reserved) inode number checks in __ext4_iget()
was broken by commit 8a363970d1: ("ext4: avoid declaring fs
inconsistent due to invalid file handles"). This was caused by a
botched reversal of the sense of the flag now known as
EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
Fix the logic appropriately.
Fixes: 8a363970d1 ("ext4: avoid declaring fs inconsistent...")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95cb671387 upstream.
We already using mapping_set_error() in fs/ext4/page_io.c, so all we
need to do is to use file_check_and_advance_wb_err() when handling
fsync() requests in ext4_sync_file().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad211f3e94 upstream.
In no-journal mode, we previously used __generic_file_fsync() in
no-journal mode. This triggers a lockdep warning, and in addition,
it's not safe to depend on the inode writeback mechanism in the case
ext4. We can solve both problems by calling ext4_write_inode()
directly.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e86807862e upstream.
The xfstests generic/475 test switches the underlying device with
dm-error while running a stress test. This results in a large number
of file system errors, and since we can't lock the buffer head when
marking the superblock dirty in the ext4_grp_locked_error() case, it's
possible the superblock to be !buffer_uptodate() without
buffer_write_io_error() being true.
We need to set buffer_uptodate() before we call mark_buffer_dirty() or
this will trigger a WARN_ON. It's safe to do this since the
superblock must have been properly read into memory or the mount would
have been successful. So if buffer_uptodate() is not set, we can
safely assume that this happened due to a failed attempt to write the
superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b08b1f12c upstream.
The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
while still holding the xattr semaphore. This is not necessary and it
triggers a circular lockdep warning. This is because
fiemap_fill_next_extent() could trigger a page fault when it writes
into page which triggers a page fault. If that page is mmaped from
the inline file in question, this could very well result in a
deadlock.
This problem can be reproduced using generic/519 with a file system
configuration which has the inline_data feature enabled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 812c0cab2c upstream.
There are enough credits reserved for most dioread_nolock writes;
however, if the extent tree is sufficiently deep, and/or quota is
enabled, the code was not allowing for all eventualities when
reserving journal credits for the unwritten extent conversion.
This problem can be seen using xfstests ext4/034:
WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
...
EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8544f4aa9d upstream.
In SMB3 protocol every part of the compound chain consumes credits
individually, so we need to call wait_for_free_credits() for each
of the PDUs in the chain. If an operation is interrupted, we must
ensure we return all credits taken from the server structure back.
Without this patch server can sometimes disconnect the session
due to credit mismatches, especially when first operation(s)
are large writes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee13919c2e upstream.
Currently we hide EINTR code returned from sock_sendmsg()
and return 0 instead. This makes a caller think that we
successfully completed the network operation which is not
true. Fix this by properly returning EINTR to callers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>