Patch series "Fragmentation avoidance improvements", v5.
It has been noted before that fragmentation avoidance (aka
anti-fragmentation) is not perfect. Given sufficient time or an adverse
workload, memory gets fragmented and the long-term success of high-order
allocations degrades. This series defines an adverse workload, a definition
of external fragmentation events (including serious ones) and a set of
patches that reduces the level of those fragmentation events.
The workload and its consequences are described in more detail in the
changelogs. However, from patch 1, this is a high-level summary of the
adverse workload. The exact details are found in the mmtests
implementation.
The broad details of the workload are as follows:
1. Create an XFS filesystem (not specified in the configuration but done
as part of the testing for this patch)
2. Start 4 fio threads that write a number of 64K files inefficiently.
Inefficiently means that files are created on first access and not
created in advance (fio parameter create_on_open=1) and fallocate
is not used (fallocate=none). With multiple IO issuers this creates
a mix of slab and page cache allocations over time. The total size
of the files is 150% physical memory so that the slabs and page cache
pages get mixed.
3. Warm up a number of fio read-only threads accessing the same files
created in step 2. This part runs for the same length of time it
took to create the files. It'll fault back in old data and further
interleave slab and page cache allocations. As it's now low on
memory due to step 2, fragmentation occurs as pageblocks get
stolen.
4. While step 3 is still running, start a process that tries to allocate
75% of memory as huge pages with a number of threads. The number of
threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
threads contending with fio, any other threads or forcing cross-NUMA
scheduling. Note that the test has not been used on a machine with fewer
than 8 cores. The benchmark records whether huge pages were allocated
and what the fault latency was in microseconds.
5. Measure the number of events potentially causing external fragmentation,
the fault latency and the huge page allocation success rate.
6. Cleanup
Overall the series reduces external fragmentation causing events by over 94%
on 1 and 2 socket machines, which in turn impacts high-order allocation
success rates over the long term. There are differences in latencies and
high-order allocation success rates. Latencies are a mixed bag as they
are vulnerable to exact system state and whether allocations succeeded
so they are treated as a secondary metric.
Patch 1 uses lower zones if they are populated and have free memory
instead of fragmenting a higher zone. It's special cased to
handle a Normal->DMA32 fallback with the reasons explained
in the changelog.
Patch 2-4 boosts watermarks temporarily when an external fragmentation
event occurs. kswapd wakes to reclaim a small amount of old memory
and then wakes kcompactd on completion to recover the system
slightly. This introduces some overhead in the slowpath. The level
of boosting can be tuned or disabled depending on the tolerance
for fragmentation vs allocation latency.
Patch 5 stalls some movable allocation requests to let kswapd from patch 4
make some progress. The duration of the stalls is very low but it
is possible to tune the system to avoid fragmentation events if
larger stalls can be tolerated.
The bulk of the improvement in fragmentation avoidance is from patches
1-4 but patch 5 can deal with a rare corner case and provides the option
of tuning a system for THP allocation success rates in exchange for
some stalls to control fragmentation.
This patch (of 5):
The page allocator zone lists are iterated based on the watermarks of each
zone which does not take anti-fragmentation into account. On x86, node 0
may have multiple zones while other nodes have one zone. A consequence is
that tasks running on node 0 may fragment ZONE_NORMAL even though
ZONE_DMA32 has plenty of free memory. This patch special cases the
allocator fast path such that it'll try an allocation from a lower local
zone before fragmenting a higher zone. In this case, stealing of
pageblocks or orders larger than a pageblock is still allowed in the fast
path as they are uninteresting from a fragmentation point of view.
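As a rough illustration of the ordering described above, the following is a
toy userspace model and not the kernel implementation. The struct, field
and function names are hypothetical, and it ignores the exception for
stealing whole pageblocks or larger orders. It only shows the idea of
trying every local zone without fragmenting before allowing a steal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy zone for illustration only; this is not the kernel's struct zone. */
struct toy_zone {
	const char *name;
	long free_matching;	/* free pages in pageblocks of the requested type */
	long free_other;	/* free pages reachable only by stealing */
};

#define TOY_NO_MEMORY (-1)

/*
 * zones[] is ordered from the preferred (higher) zone to lower zones,
 * for example { Normal, DMA32 }. Returns the index of the zone used and
 * sets *fragmented when the page had to be stolen from another type.
 */
static int toy_alloc_page(struct toy_zone *zones, size_t nr, bool *fragmented)
{
	size_t i;

	*fragmented = false;

	/* Pass 1: only take pages that do not mix migrate types. */
	for (i = 0; i < nr; i++) {
		if (zones[i].free_matching > 0) {
			zones[i].free_matching--;
			return (int)i;
		}
	}

	/* Pass 2: nothing matched anywhere, so allow a fragmenting steal. */
	for (i = 0; i < nr; i++) {
		if (zones[i].free_other > 0) {
			zones[i].free_other--;
			*fragmented = true;
			return (int)i;
		}
	}

	return TOY_NO_MEMORY;
}
```

With a Normal zone that can only satisfy the request by stealing and a
DMA32 zone that still has matching free pages, the model picks DMA32 and
records no fragmentation event.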
This was evaluated using a benchmark designed to fragment memory before
attempting THP allocations. It's implemented in mmtests as the following
configurations:
configs/config-global-dhp__workload_thpfioscale
configs/config-global-dhp__workload_thpfioscale-defrag
configs/config-global-dhp__workload_thpfioscale-madvhugepage
e.g. from mmtests
./run-mmtests.sh --run-monitor --config configs/config-global-dhp__workload_thpfioscale test-run-1
The broad details of the workload are as follows:
1. Create an XFS filesystem (not specified in the configuration but done
as part of the testing for this patch).
2. Start 4 fio threads that write a number of 64K files inefficiently.
Inefficiently means that files are created on first access and not
created in advance (fio parameter create_on_open=1) and fallocate
is not used (fallocate=none). With multiple IO issuers this creates
a mix of slab and page cache allocations over time. The total size
of the files is 150% physical memory so that the slabs and page cache
pages get mixed.
3. Warm up a number of fio read-only processes accessing the same files
created in step 2. This part runs for the same length of time it
took to create the files. It'll refault old data and further
interleave slab and page cache allocations. As it's now low on
memory due to step 2, fragmentation occurs as pageblocks get
stolen.
4. While step 3 is still running, start a process that tries to allocate
75% of memory as huge pages with a number of threads. The number of
threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
threads contending with fio, any other threads or forcing cross-NUMA
scheduling. Note that the test has not been used on a machine with fewer
than 8 cores. The benchmark records whether huge pages were allocated
and what the fault latency was in microseconds.
5. Measure the number of events potentially causing external fragmentation,
the fault latency and the huge page allocation success rate.
6. Cleanup the test files.
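The thread count from step 4 can be sketched as below. This is only a
reading of the formula as written; the use of integer division and the
function name are assumptions, and the mmtests implementation may round
or clamp differently.

```c
/*
 * Number of THP allocating threads per the formula in step 4:
 * (NR_CPUS_SOCKET - NR_FIO_THREADS) / 4
 * Integer division is assumed here.
 */
static int thp_threads(int nr_cpus_socket, int nr_fio_threads)
{
	return (nr_cpus_socket - nr_fio_threads) / 4;
}
```

For example, a hypothetical socket with 8 CPUs and 4 fio threads gives
(8 - 4) / 4 = 1 THP allocating thread.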
Note that due to the use of IO and page cache, this benchmark is not
suitable for running on large machines where the time to fragment memory
may be excessive. Also note that while this is one mix that generates
fragmentation, it's not the only one. Workloads that are more
slab-intensive, or configurations where SLUB is used with high-order
pages, may yield different results.
When the page allocator fragments memory, it records the event using the
mm_page_alloc_extfrag ftrace event. If the fallback_order is smaller than
a pageblock order (order-9 on 64-bit x86) then it's considered to be an
"external fragmentation event" that may cause issues in the future.
Hence, the primary metric here is the number of external fragmentation
events that occur with order < 9. The secondary metrics are allocation
latency and huge page allocation success rate. Note that differences
in latencies and in the success rate can themselves affect the number of
external fragmentation events, which is why they are secondary metrics.
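The classification above can be sketched as follows. This is a minimal
userspace illustration and not the tracing code; the function names and
the idea of post-processing an array of fallback_order values are
assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* From the text: a pageblock is order-9 on 64-bit x86. */
#define TOY_PAGEBLOCK_ORDER 9

/* A fallback smaller than a pageblock may cause issues in the future. */
static bool is_extfrag_event(int fallback_order, int pageblock_order)
{
	return fallback_order < pageblock_order;
}

/* Count the primary metric over a hypothetical list of recorded orders. */
static size_t count_extfrag_events(const int *fallback_orders, size_t n,
				   int pageblock_order)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (is_extfrag_event(fallback_orders[i], pageblock_order))
			count++;
	}
	return count;
}
```

Fallbacks at order 9 or above are not counted, matching the note that
stealing whole pageblocks is uninteresting from a fragmentation point of
view.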
1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------
4.20-rc3 extfrag events < order 9: 804694
4.20-rc3+patch: 408912 (49% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-1 662.92 ( 0.00%) 653.58 * 1.41%*
Amean fault-huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault latencies are slightly reduced while allocation success rates remain
at zero as this configuration does not make any special effort to allocate
THP and fio is heavily active at the time and either filling memory or
keeping pages resident. However, a 49% reduction of serious fragmentation
events reduces the chances of external fragmentation being a problem in
the future.
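For reference, the reduction percentages quoted in these results follow
from the event counts. A small helper, with a hypothetical name, shows the
arithmetic:

```c
/*
 * Percentage reduction from "before" to "after".
 * Assumes before is non-zero.
 */
static double percent_reduction(unsigned long before, unsigned long after)
{
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

Applied to the counts above, percent_reduction(804694, 408912) is a little
over 49, which is the figure reported for this configuration.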
Vlastimil asked during review for a breakdown of the allocation types
that are falling back.
vanilla
3816 MIGRATE_UNMOVABLE
800845 MIGRATE_MOVABLE
33 MIGRATE_RECLAIMABLE
patch
735 MIGRATE_UNMOVABLE
408135 MIGRATE_MOVABLE
42 MIGRATE_RECLAIMABLE
The majority of the fallbacks are due to movable allocations and this is
consistent for the workload throughout the series, so the breakdown will
not be presented again.
Movable fallbacks are sometimes considered "ok" because they
can be migrated. The problem is that they can fill an
unmovable/reclaimable pageblock, causing those allocations to fall back
later and pollute pageblocks with pages that cannot move. If there is a
movable fallback, it is pretty much guaranteed to affect an
unmovable/reclaimable pageblock and, while it might not be enough to
actually cause an unmovable/reclaimable fallback in the future, we cannot
know that in advance so the patch takes the only option available to it.
Hence, it's important to control them. This point is also consistent
throughout the series and will not be repeated.
1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 291392
4.20-rc3+patch: 191187 (34% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-1 1495.14 ( 0.00%) 1467.55 ( 1.85%)
Amean fault-huge-1 1098.48 ( 0.00%) 1127.11 ( -2.61%)
thpfioscale Percentage Faults Huge
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-1 78.57 ( 0.00%) 77.64 ( -1.18%)
Fragmentation events were reduced quite a bit although this is known
to be a little variable. The latencies and allocation success rates
are similar but they were already quite high.
2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 215698
4.20-rc3+patch: 200210 (7% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-5 1350.05 ( 0.00%) 1346.45 ( 0.27%)
Amean fault-huge-5 4181.01 ( 0.00%) 3418.60 ( 18.24%)
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-5 1.15 ( 0.00%) 0.78 ( -31.88%)
The reduction of external fragmentation events is slight and this is
partially due to the removal of __GFP_THISNODE in commit ac5b2c1891
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") as THP
allocations can now spill over to remote nodes instead of fragmenting
local memory.
2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch: 147463 (11% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-5 6138.97 ( 0.00%) 6217.43 ( -1.28%)
Amean fault-huge-5 2294.28 ( 0.00%) 3163.33 * -37.88%*
thpfioscale Percentage Faults Huge
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-5 96.82 ( 0.00%) 95.14 ( -1.74%)
There was a slight reduction in external fragmentation events although the
latencies were higher. The allocation success rate is high enough that
the system is struggling and there is quite a lot of parallel reclaim and
compaction activity. There is also a certain degree of luck on whether
processes start on node 0 or not for this patch but the relevance is
reduced later in the series.
Overall, the patch reduces the number of external fragmentation causing
events so the success of THP over long periods of time would be improved
for this adverse workload.
Link: http://lkml.kernel.org/r/20181123114528.28802-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If804b49c46fe359ca6addacd4c3a8b36d8571ca6
Git-commit: 6bb154504f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit 6bb154504f)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 2 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct ufs_hba at ufshcd.h:545:1' changed:
type size changed from 14528 to 14592 (in bits)
1 data member insertion:
'size_t ufs_hba::sg_entry_size', at offset 1920 (in bits) at ufshcd.h:593:1
there are data member changes:
'unsigned int ufs_hba::irq' offset changed from 1920 to 1984 (in bits) (by +64 bits)
'bool ufs_hba::is_irq_enabled' offset changed from 1952 to 2016 (in bits) (by +64 bits)
'unsigned int ufs_hba::quirks' offset changed from 1984 to 2048 (in bits) (by +64 bits)
'unsigned int ufs_hba::dev_quirks' offset changed from 2016 to 2080 (in bits) (by +64 bits)
'wait_queue_head_t ufs_hba::tm_wq' offset changed from 2048 to 2112 (in bits) (by +64 bits)
'wait_queue_head_t ufs_hba::tm_tag_wq' offset changed from 2240 to 2304 (in bits) (by +64 bits)
'unsigned long int ufs_hba::tm_condition' offset changed from 2432 to 2496 (in bits) (by +64 bits)
'unsigned long int ufs_hba::tm_slots_in_use' offset changed from 2496 to 2560 (in bits) (by +64 bits)
'uic_command* ufs_hba::active_uic_cmd' offset changed from 2560 to 2624 (in bits) (by +64 bits)
'mutex ufs_hba::uic_cmd_mutex' offset changed from 2624 to 2688 (in bits) (by +64 bits)
'completion* ufs_hba::uic_async_done' offset changed from 2880 to 2944 (in bits) (by +64 bits)
'u32 ufs_hba::ufshcd_state' offset changed from 2944 to 3008 (in bits) (by +64 bits)
'u32 ufs_hba::eh_flags' offset changed from 2976 to 3040 (in bits) (by +64 bits)
'u32 ufs_hba::intr_mask' offset changed from 3008 to 3072 (in bits) (by +64 bits)
'u16 ufs_hba::ee_ctrl_mask' offset changed from 3040 to 3104 (in bits) (by +64 bits)
'bool ufs_hba::is_powered' offset changed from 3056 to 3120 (in bits) (by +64 bits)
'bool ufs_hba::is_init_prefetch' offset changed from 3064 to 3128 (in bits) (by +64 bits)
'ufs_init_prefetch ufs_hba::init_prefetch_data' offset changed from 3072 to 3136 (in bits) (by +64 bits)
'work_struct ufs_hba::eh_work' offset changed from 3136 to 3200 (in bits) (by +64 bits)
'work_struct ufs_hba::eeh_work' offset changed from 3392 to 3456 (in bits) (by +64 bits)
'u32 ufs_hba::errors' offset changed from 3648 to 3712 (in bits) (by +64 bits)
'u32 ufs_hba::uic_error' offset changed from 3680 to 3744 (in bits) (by +64 bits)
'u32 ufs_hba::saved_err' offset changed from 3712 to 3776 (in bits) (by +64 bits)
'u32 ufs_hba::saved_uic_err' offset changed from 3744 to 3808 (in bits) (by +64 bits)
'ufs_stats ufs_hba::ufs_stats' offset changed from 3776 to 3840 (in bits) (by +64 bits)
'bool ufs_hba::silence_err_logs' offset changed from 8064 to 8128 (in bits) (by +64 bits)
'ufs_dev_cmd ufs_hba::dev_cmd' offset changed from 8128 to 8192 (in bits) (by +64 bits)
'ktime_t ufs_hba::last_dme_cmd_tstamp' offset changed from 9152 to 9216 (in bits) (by +64 bits)
'ufs_dev_info ufs_hba::dev_info' offset changed from 9216 to 9280 (in bits) (by +64 bits)
'bool ufs_hba::auto_bkops_enabled' offset changed from 9232 to 9296 (in bits) (by +64 bits)
'ufs_vreg_info ufs_hba::vreg_info' offset changed from 9280 to 9344 (in bits) (by +64 bits)
'list_head ufs_hba::clk_list_head' offset changed from 9536 to 9600 (in bits) (by +64 bits)
'bool ufs_hba::wlun_dev_clr_ua' offset changed from 9664 to 9728 (in bits) (by +64 bits)
'int ufs_hba::req_abort_count' offset changed from 9696 to 9760 (in bits) (by +64 bits)
'u32 ufs_hba::lanes_per_direction' offset changed from 9728 to 9792 (in bits) (by +64 bits)
'ufs_pa_layer_attr ufs_hba::pwr_info' offset changed from 9760 to 9824 (in bits) (by +64 bits)
'ufs_pwr_mode_info ufs_hba::max_pwr_info' offset changed from 9984 to 10048 (in bits) (by +64 bits)
'ufs_clk_gating ufs_hba::clk_gating' offset changed from 10240 to 10304 (in bits) (by +64 bits)
'u32 ufs_hba::caps' offset changed from 12032 to 12096 (in bits) (by +64 bits)
'devfreq* ufs_hba::devfreq' offset changed from 12096 to 12160 (in bits) (by +64 bits)
'ufs_clk_scaling ufs_hba::clk_scaling' offset changed from 12160 to 12224 (in bits) (by +64 bits)
'bool ufs_hba::is_sys_suspended' offset changed from 13568 to 13632 (in bits) (by +64 bits)
'bkops_status ufs_hba::urgent_bkops_lvl' offset changed from 13600 to 13664 (in bits) (by +64 bits)
'bool ufs_hba::is_urgent_bkops_lvl_checked' offset changed from 13632 to 13696 (in bits) (by +64 bits)
'rw_semaphore ufs_hba::clk_scaling_lock' offset changed from 13696 to 13760 (in bits) (by +64 bits)
'ufs_desc_size ufs_hba::desc_size' offset changed from 14016 to 14080 (in bits) (by +64 bits)
'atomic_t ufs_hba::scsi_block_reqs_cnt' offset changed from 14240 to 14304 (in bits) (by +64 bits)
'ufs_crypto_capabilities ufs_hba::crypto_capabilities' offset changed from 14272 to 14336 (in bits) (by +64 bits)
'ufs_crypto_cap_entry* ufs_hba::crypto_cap_array' offset changed from 14336 to 14400 (in bits) (by +64 bits)
'u32 ufs_hba::crypto_cfg_register' offset changed from 14400 to 14464 (in bits) (by +64 bits)
'keyslot_manager* ufs_hba::ksm' offset changed from 14464 to 14528 (in bits) (by +64 bits)
7 impacted interfaces
'struct utp_transfer_cmd_desc at ufshci.h:451:1' changed:
type size changed from 24576 to 8192 (in bits)
there are data member changes:
type 'ufshcd_sg_entry[128]' of 'utp_transfer_cmd_desc::prd_table' changed:
type name changed from 'ufshcd_sg_entry[128]' to 'u8[]'
array type size changed from 16384 to infinity
array type subrange 1 changed length from 128 to infinity
array element type 'struct ufshcd_sg_entry' changed:
entity changed from 'struct ufshcd_sg_entry' to 'typedef u8' at int-ll64.h:17:1
type size changed from 128 to 8 (in bits)
and size changed from 16384 to 0 (in bits) (by -16384 bits)
7 impacted interfaces
Bug: 129991660
Change-Id: I239c2c3bf5de37f4522922a24d46e921e1e2cbd7
Signed-off-by: Alistair Delva <adelva@google.com>
Even if the bridge module is not enabled, we may need the tracepoints
downstream in products that enable bridge.ko, so avoid defining the
export of these symbols based on a config option.
Bug: 150625937
Change-Id: Ib961fd6e353fe3bdfde11a38488568f42f1dbe7a
Signed-off-by: Alistair Delva <adelva@google.com>
This option was accidentally skipped when it was added on arm64.
Bug: 144867487
Change-Id: Ifa87a894954ec9a26d5ab40e7d18e2f2f5e4f416
Signed-off-by: Alistair Delva <adelva@google.com>
Modify the UFSHCD core to allow 'struct ufshcd_sg_entry' to be
variable-length. The default is the standard length, but variants can
override ufs_hba::sg_entry_size with a larger value if there are
vendor-specific fields following the standard ones.
This is needed to support inline encryption with ufs-exynos (FMP).
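Since the PRD table becomes a byte array, entries can no longer be found by
plain array indexing. A minimal sketch of the lookup is shown below; the
struct layout and helper name are stand-ins for illustration and are not
the driver's definitions. Only the 16-byte standard size is taken from the
ABI report (128 bits per entry).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the standard 16-byte entry; field names are illustrative. */
struct toy_sg_entry {
	uint32_t base_addr;
	uint32_t upper_addr;
	uint32_t reserved;
	uint32_t size;
};

/*
 * With a variable entry size, entry i starts i * sg_entry_size bytes into
 * the table. sg_entry_size must be at least sizeof(struct toy_sg_entry);
 * any extra bytes are vendor-specific fields after the standard ones.
 */
static struct toy_sg_entry *toy_sg_entry_at(uint8_t *prd_table,
					    size_t sg_entry_size, size_t i)
{
	return (struct toy_sg_entry *)(void *)(prd_table + i * sg_entry_size);
}
```

With the default size the stride is 16 bytes, so the result matches
ordinary array indexing. A variant that sets a larger sg_entry_size simply
gets a larger stride.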
Bug: 129991660
Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from android-mainline
commit 8de80df7d7)
(resolved trivial merge conflict in ufshcd_alloc_host())
Change-Id: I6ab9458d5c23331013e6b736d6fea378a6b5b86c
Signed-off-by: Eric Biggers <ebiggers@google.com>
This is a partial revert of Change-Id
I6943cf6b1fedc2b82332c1dcf9a91281a3ca5627.
Bug: 139431025
Test: Treehugger
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
Change-Id: I2ebbe2f20e318cb01206ccbd8c6a5803f7f503b6
Cuttlefish and Goldfish both rely on the virtio console and
HVC_DRIVER is a binary config which is a dep for that driver.
Bug: 150620456
Test: Treehugger
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
Change-Id: I54e7d95da4fcddd534d0f0f48b5c546cd2f2718d
Goldfish and Cuttlefish already use software encryption drivers and
don't use this one.
Bug: 150620456
Test: Treehugger
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
Change-Id: I72b0155b5db9bc54bfca0ed99734b7c2c513ceac
This binary module gets enabled if BRIDGE, a tristate config, gets
enabled as either a module (m) or a builtin (y). This dependent config
should also be tristate but it seems like that hasn't been done upstream
yet.
Bug: 150620456
Test: Treehugger
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
Change-Id: I699b73bfac8a0c6cb5e14fefe56b6c013e2410a8
The check to ensure that the new written value into cpu.uclamp.{min,max}
is within range, [0:100], wasn't working because of the signed
comparison:
    if (req.percent > UCLAMP_PERCENT_SCALE) {
        req.ret = -ERANGE;
        return req;
    }
# echo -1 > cpu.uclamp.min
# cat cpu.uclamp.min
42949671.96
Cast req.percent into u64 to force the comparison to be unsigned and
work as intended in capacity_from_percent().
# echo -1 > cpu.uclamp.min
sh: write error: Numerical result out of range
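The effect of the cast can be reproduced in userspace. The sketch below
assumes a signed 64-bit percent value, as the cast in the fix suggests, and
a scale of 10000 (100.00% stored in hundredths); both the constant's value
and the helper names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for UCLAMP_PERCENT_SCALE: 100.00% in hundredths. */
#define TOY_PERCENT_SCALE 10000

/* Signed comparison: a negative value is never greater than the scale. */
static bool out_of_range_signed(int64_t percent)
{
	return percent > TOY_PERCENT_SCALE;
}

/* Unsigned comparison: a negative value wraps to a huge positive one. */
static bool out_of_range_unsigned(int64_t percent)
{
	return (uint64_t)percent > TOY_PERCENT_SCALE;
}
```

With percent set to -1, the signed check lets the value through while the
unsigned check rejects it, which is the behaviour change shown in the shell
output above.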
Bug: 120440300
Fixes: 2480c09313 ("sched/uclamp: Extend CPU's cgroup controller")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com
(cherry picked from commit b562d14064)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I17fc2b119dcbffb212e130ed2c37ae3a8d5bbb61
Changes in 4.19.107
iommu/qcom: Fix bogus detach logic
ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs
ALSA: hda/realtek - Apply quirk for MSI GP63, too
ALSA: hda/realtek - Apply quirk for yet another MSI laptop
ASoC: sun8i-codec: Fix setting DAI data format
ecryptfs: fix a memory leak bug in parse_tag_1_packet()
ecryptfs: fix a memory leak bug in ecryptfs_init_messaging()
thunderbolt: Prevent crash if non-active NVMem file is read
USB: misc: iowarrior: add support for 2 OEMed devices
USB: misc: iowarrior: add support for the 28 and 28L devices
USB: misc: iowarrior: add support for the 100 device
floppy: check FDC index for errors before assigning it
vt: fix scrollback flushing on background consoles
vt: selection, handle pending signals in paste_selection
vt: vt_ioctl: fix race in VT_RESIZEX
staging: android: ashmem: Disallow ashmem memory from being remapped
staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
xhci: Force Maximum Packet size for Full-speed bulk devices to valid range.
xhci: fix runtime pm enabling for quirky Intel hosts
xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
usb: host: xhci: update event ring dequeue pointer on purpose
USB: core: add endpoint-blacklist quirk
USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
usb: uas: fix a plug & unplug racing
USB: Fix novation SourceControl XL after suspend
USB: hub: Don't record a connect-change event during reset-resume
USB: hub: Fix the broken detection of USB3 device in SMSC hub
usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
staging: rtl8188eu: Fix potential security hole
staging: rtl8188eu: Fix potential overuse of kernel memory
staging: rtl8723bs: Fix potential security hole
staging: rtl8723bs: Fix potential overuse of kernel memory
powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
jbd2: fix ocfs2 corrupt when clearing block group bits
x86/mce/amd: Publish the bank pointer only after setup has succeeded
x86/mce/amd: Fix kobject lifetime
x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
serial: 8250: Check UPF_IRQ_SHARED in advance
tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
tty: serial: imx: setup the correct sg entry for tx dma
serdev: ttyport: restore client ops on deregistration
MAINTAINERS: Update drm/i915 bug filing URL
Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()
nvme-multipath: Fix memory leak with ana_log_buf
genirq/irqdomain: Make sure all irq domain flags are distinct
mm/vmscan.c: don't round up scan size for online memory cgroup
drm/amdgpu/soc15: fix xclk for raven
xhci: apply XHCI_PME_STUCK_QUIRK to Intel Comet Lake platforms
KVM: nVMX: Don't emulate instructions in guest mode
KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
tty: serial: qcom_geni_serial: Fix UART hang
tty: serial: qcom_geni_serial: Remove interrupt storm
tty: serial: qcom_geni_serial: Remove use of *_relaxed() and mb()
tty: serial: qcom_geni_serial: Remove set_rfr_wm() and related variables
tty: serial: qcom_geni_serial: Remove xfer_mode variable
tty: serial: qcom_geni_serial: Fix RX cancel command failure
lib/stackdepot.c: fix global out-of-bounds in stack_slabs
drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets
ext4: fix a data race in EXT4_I(inode)->i_disksize
ext4: add cond_resched() to __ext4_find_entry()
ext4: fix potential race between online resizing and write operations
ext4: fix potential race between s_group_info online resizing and access
ext4: fix potential race between s_flex_groups online resizing and access
ext4: fix mount failure with quota configured as module
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
KVM: nVMX: Refactor IO bitmap checks into helper function
KVM: nVMX: Check IO instruction VM-exit conditions
KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
KVM: apic: avoid calculating pending eoi from an uninitialized val
btrfs: fix bytes_may_use underflow in prealloc error condtition
btrfs: reset fs_root to NULL on error in open_ctree
btrfs: do not check delayed items are empty for single transaction cleanup
Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
Revert "dmaengine: imx-sdma: Fix memory leak"
scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
usb: dwc2: Fix in ISOC request length checking
staging: rtl8723bs: fix copy of overlapping memory
staging: greybus: use after free in gb_audio_manager_remove_all()
ecryptfs: replace BUG_ON with error handling code
iommu/vt-d: Fix compile warning from intel-svm.h
genirq/proc: Reject invalid affinity masks (again)
bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill
ALSA: rawmidi: Avoid bit fields for state flags
ALSA: seq: Avoid concurrent access to queue flags
ALSA: seq: Fix concurrent access to queue current tick/time
netfilter: xt_hashlimit: limit the max size of hashtable
rxrpc: Fix call RCU cleanup using non-bh-safe locks
ata: ahci: Add shutdown to freeze hardware resources of ahci
xen: Enable interrupts when calling _cond_resched()
s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
Revert "char/random: silence a lockdep splat with printk()"
Linux 4.19.107
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I74e3d49c54d4afcfa4049042163cb879c3de3100
Disable CONFIG_RT_GROUP_SCHED to control RT cpu allowance globally.
ABI update report:
ABI DIFFERENCES HAVE BEEN DETECTED! (RC=8)
========================================================
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 2 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct sched_rt_entity at sched.h:481:1' changed:
type size changed from 576 to 384 (in bits)
3 data member deletions:
'sched_rt_entity* sched_rt_entity::parent', at offset 384 (in bits) at sched.h:491:1
'rt_rq* sched_rt_entity::rt_rq', at offset 448 (in bits) at sched.h:493:1
'rt_rq* sched_rt_entity::my_q', at offset 512 (in bits) at sched.h:495:1
1033 impacted interfaces
========================================================
Bug: 149954332
Change-Id: I9487bd113502e52f19637e43109433cb13e97a23
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
CONFIG_BRIDGE is not needed at boot time and is tristate.
Any GKI device which requires this config can load the bridge module
during init.
Bug: 135666008
Test: Treehugger
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
Change-Id: If22ceac2982a0f6b7a922393fb1dd08c68f6bc70
This config will enable the Nintendo Switch Pro controller driver.
Change-Id: I50645a611566928e20a1afd4024f71803ed5fefa
Signed-off-by: Siarhei Vishniakou <svv@google.com>
Bug: 135136477
Test: tested via custom test app
Test: atest NintendoSwitchProTest
The description says 'If unsure, say N.' but
the module is built as M by default (once
the dependencies are satisfied).
When the module is selected (Y or M), it enables
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS
which alter kernel internal structures.
We (Android Studio Emulator) currently do not
use this module and think it is more consistent
to have it disabled by default as opposed to
disabling it explicitly to prevent enabling
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS.
Signed-off-by: Roman Kiryanov <rkir@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 98bda63e20)
[adelva: rediff against missing SKB_EXTENSIONS change]
Bug: 150463745
Change-Id: I664d752f504598d21747d046930e6d7257c31253
The backport of the UNUSED_KSYMS_WHITELIST feature to android-4.19
missed a dependency on $MODVERDIR (which is still there in 4.19), hence
causing it to un-export any symbol that is not on the whitelist, even if
it has an in-tree user.
I was _really_ close to calling that a 'feature', but it wasn't exactly
intended, so I'll call it a 'bug' instead.
Kill the bugger.
Bug: 148277666
Fixes: 3d0431a87a ("BACKPORT: FROMLIST: kbuild: allow symbol
whitelisting with TRIM_UNUSED_KSYMS")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I35449dc92437f2928a659c8ecd6fbf725e0e1b87
This reverts commit 15341b1dd4 which is
commit 1b710b1b10 upstream.
Lech writes:
After upgrading kernel on our boards from v4.19.105 to v4.19.106
we found out that syslog fails to read the messages after ones
read initially after opening /proc/kmsg just after booting.
I also found out, that output of 'dmesg --follow' also doesn't
react on new printks appearing for whatever reason - to read new
messages, reopening /proc/kmsg or /dev/kmsg was needed.
I bisected this down to commit
15341b1dd4 ("char/random: silence
a lockdep splat with printk()"), and reverting it on top of
v4.19.106 restored correct behaviour.
While people dig to find out how such an odd change causes a lockup,
let's just revert this for now as it's not all that big of a deal for
4.19.y.
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3803247349 upstream.
Clang warns:
In file included from ../arch/s390/purgatory/purgatory.c:10:
In file included from ../include/linux/kexec.h:18:
In file included from ../include/linux/crash_core.h:6:
In file included from ../include/linux/elfcore.h:5:
In file included from ../include/linux/user.h:1:
In file included from ../arch/s390/include/asm/user.h:11:
../arch/s390/include/asm/page.h:45:6: warning: converting the result of
'<<' to a boolean always evaluates to false
[-Wtautological-constant-compare]
if (PAGE_DEFAULT_KEY)
^
../arch/s390/include/asm/page.h:23:44: note: expanded from macro
'PAGE_DEFAULT_KEY'
#define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4)
^
1 warning generated.
Explicitly compare this against zero to silence the warning as it is
intended to be used in a boolean context.
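As a rough illustration of the explicit comparison described above, here is
a minimal userspace sketch. The two macros mirror the ones quoted in the
warning, but the value 0 for PAGE_DEFAULT_ACC and the helper name
page_default_key_is_set() are assumptions made for this example only.

```c
#include <stdbool.h>

/* Stand-ins for the s390 macros quoted in the warning (value assumed). */
#define PAGE_DEFAULT_ACC 0
#define PAGE_DEFAULT_KEY (PAGE_DEFAULT_ACC << 4)

/*
 * Hypothetical helper: comparing against zero states the boolean intent
 * explicitly instead of relying on an implicit conversion of the '<<'
 * result.
 */
static bool page_default_key_is_set(void)
{
	return PAGE_DEFAULT_KEY != 0;
}
```

The behaviour is unchanged; only the intent of the test becomes explicit.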
Fixes: de3fa841e4 ("s390/mm: fix compile for PAGE_DEFAULT_KEY != 0")
Link: https://github.com/ClangBuiltLinux/linux/issues/860
Link: https://lkml.kernel.org/r/20200214064207.10381-1-natechancellor@gmail.com
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8645e56a4a upstream.
xen_maybe_preempt_hcall() is called from the exception entry point
xen_do_hypervisor_callback with interrupts disabled.
_cond_resched() evades the might_sleep() check in cond_resched() which
would have caught that and schedule_debug() unfortunately lacks a check
for irqs_disabled().
Enable interrupts around the call and use cond_resched() to catch future
issues.
Fixes: fdfd811ddd ("x86/xen: allow privcmd hypercalls to be preempted")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/878skypjrh.fsf@nanos.tec.linutronix.de
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10a663a1b1 upstream.
device_shutdown(), called from reboot or power_shutdown, expects
all devices to be shut down. The same is true for the ahci pci driver.
As no ahci shutdown function is implemented, the ata subsystem
always remains alive with DMA & interrupt support. File system
related calls should not be honored after device_shutdown().
So define an ahci pci driver shutdown that freezes the hardware (masks
interrupts, stops the DMA engine and frees DMA resources).
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 963485d436 upstream.
rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
local->client_conns_lock. RCU callbacks, however, are normally called from
softirq context, which can cause lockdep to notice the locking
inconsistency.
To get lockdep to detect this, it's necessary to have the connection
cleaned up on the put at the end of the last of its calls, though normally
the clean up is deferred. This can be induced, however, by starting a call
on an AF_RXRPC socket and then closing the socket without reading the
reply.
Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
workqueue if in softirq-mode and defer the destruction to process context.
Note that another way to fix this could be to add a bunch of bh-disable
annotations to the spinlocks concerned - and there might be more than just
those two - but that means spending more time with BHs disabled.
Note also that some of these places were covered by bh-disable spinlocks
belonging to the rxrpc_transport object, but these got removed without the
_bh annotation being retained on the next lock in.
Fixes: 999b69f892 ("rxrpc: Kill the client connection bundle concept")
Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc7497795e upstream.
snd_seq_check_queue() passes the current tick and time of the given
queue as a pointer to snd_seq_prioq_cell_out(), but those might be
updated concurrently by the seq timer update.
Fix it by first retrieving the current tick and time via the proper
helper functions, and passing those values to snd_seq_prioq_cell_out()
later in the loops.
snd_seq_timer_get_cur_time() takes a new argument and adjusts with the
current system time only when requested; this update isn't
needed for snd_seq_check_queue(), as it's called either from the
interrupt handler or right after queuing.
Also, snd_seq_timer_get_cur_tick() is changed to read the value under
the spinlock, so that it is safe against concurrent updates, too.
Reported-by: syzbot+fd5e0eaa1a32999173b2@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20200214111316.26939-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfa9a5efe8 upstream.
The rawmidi state flags (opened, append, active_sensing) are stored in
bit fields that can be potentially racy when concurrently accessed
without any locks. Although the current code should be fine, there is
also no real benefit in keeping the bit fields for such a small
number of members.
This patch changes those bit-field flags to simple bool fields.
There should be no size increase of the snd_rawmidi_substream by this
change.
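A minimal sketch of the two layouts, for illustration only. The struct
and helper names are hypothetical and the real snd_rawmidi_substream has
many more members; only the three flag names come from the text above.

```c
#include <stdbool.h>

/* Before: the flags share one storage unit, so writing one flag is a
 * read-modify-write of the word that also holds the others. */
struct sketch_flags_bitfield {
	unsigned int opened:1;
	unsigned int append:1;
	unsigned int active_sensing:1;
};

/* After: each flag is a separate object and can be stored on its own. */
struct sketch_flags_bool {
	bool opened;
	bool append;
	bool active_sensing;
};

/* Hypothetical helper: state of a substream opened in append mode. */
static struct sketch_flags_bool sketch_open_for_append(void)
{
	struct sketch_flags_bool f = { .opened = true, .append = true };

	return f;	/* active_sensing stays false */
}
```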
Reported-by: syzbot+576cc007eb9f2c968200@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20200214111316.26939-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e20d3a055a upstream.
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. Erroneously doing a bitwise
AND instead of a logical AND on the user- and kernel-space buffer sizes
can lead to no data being copied to user-space, especially when the
user-space size is a power of two and bigger than the kernel-space buffer.
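The difference can be seen in a small userspace sketch. The function and
parameter names are hypothetical, not the ones used in the bpf code; the
example only contrasts the two operators on a pair of buffer sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy guard: bitwise AND is zero whenever the sizes share no set bit. */
static bool wants_copy_buggy(uint32_t kernel_len, uint32_t user_len)
{
	return kernel_len & user_len;
}

/* Fixed guard: logical AND only asks whether both sizes are non-zero. */
static bool wants_copy_fixed(uint32_t kernel_len, uint32_t user_len)
{
	return kernel_len && user_len;
}
```

With a 24 byte kernel buffer and a 64 byte user buffer, 24 & 64 is 0, so
the buggy guard skips the copy although both buffers are non-empty.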
Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cba6437a18 upstream.
Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.
It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.
If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in
eee45269b0 ("[PATCH] Alpha: convert to generic irq framework (generic part)")
and subsequently made available for all architectures in
1840475676 ("genirq: Expose default irq affinity mask (take 3)")
which introduced the circumvention of the affinity setting restrictions for
interrupts which cannot be moved in process context.
The whole exercise is bogus in various aspects:
1) If the interrupt is already started up then there is absolutely
no point to honour a bogus interrupt affinity setting from user
space. The interrupt is already assigned to an online CPU and it
does not make any sense to reassign it to some other randomly
chosen online CPU.
2) If the interrupt is not yet started up then there is no point
either. A subsequent startup of the interrupt will invoke
irq_setup_affinity() anyway which will choose a valid target CPU.
So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has its own magic sauce for this.
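The rule can be sketched in userspace with plain 64-bit masks standing in
for cpumasks. The function name and its parameters are assumptions made
for illustration; this is not the genirq code itself.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: reject a requested affinity mask that contains no
 * online CPU, and leave the current affinity untouched in that case.
 */
static int sketch_write_affinity(uint64_t requested, uint64_t online,
				 uint64_t *current)
{
	if ((requested & online) == 0)
		return -EINVAL;	/* bogus mask from user space */

	*current = requested;
	return 0;
}
```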
Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7598fac32 upstream.
The intel_svm_is_pasid_valid() needs to be marked inline, otherwise it
causes the compile warning below:
CC [M] drivers/dma/idxd/cdev.o
In file included from drivers/dma/idxd/cdev.c:9:0:
./include/linux/intel-svm.h:125:12: warning: ‘intel_svm_is_pasid_valid’ defined but not used [-Wunused-function]
static int intel_svm_is_pasid_valid(struct device *dev, int pasid)
^~~~~~~~~~~~~~~~~~~~~~~~
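For illustration, a header stub of this kind might look like the sketch
below. The names and the -EINVAL return value are assumptions for the
example; the point is only that GCC does not emit -Wunused-function for a
static inline function that a translation unit includes but never calls.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_device;	/* opaque stand-in for struct device */

/* Hypothetical stub as it would appear in a header when the feature is
 * compiled out: 'static inline' keeps unused copies silent. */
static inline int sketch_is_pasid_valid(struct sketch_device *dev, int pasid)
{
	(void)dev;
	(void)pasid;
	return -EINVAL;
}
```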
Reported-by: Borislav Petkov <bp@alien8.de>
Fixes: 15060aba71 ('iommu/vt-d: Helper function to query if a pasid has any active users')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c2a7552dd upstream.
In crypt_scatterlist, if the crypt_stat argument is not set up
correctly, the kernel crashes. Instead, by returning an error code
upstream, the error is handled safely.
The issue is detected via a static analysis tool written by us.
Fixes: 237fead619 (ecryptfs: fs/Makefile and fs/Kconfig)
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c724417baf upstream.
SuperSpeedPlus peripherals must report their bMaxPower of the
configuration descriptor in units of 8mA as per the USB 3.2
specification. The current switch statement in encode_bMaxPower()
only checks for USB_SPEED_SUPER but not USB_SPEED_SUPER_PLUS so
the latter falls back to USB 2.0 encoding which uses 2mA units.
Replace the switch with a simple if/else.
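A userspace sketch of the if/else form follows. The enum, the function
name and the round-up behaviour are assumptions for illustration; only the
2mA and 8mA units and the speed ordering are taken from the text above.

```c
/* Hypothetical speed enum, ordered like the USB speeds it stands in for. */
enum sketch_usb_speed {
	SKETCH_SPEED_FULL,
	SKETCH_SPEED_HIGH,
	SKETCH_SPEED_SUPER,
	SKETCH_SPEED_SUPER_PLUS,
};

/* Encode a current in mA into bMaxPower units, rounding up (assumed). */
static unsigned int sketch_encode_max_power(enum sketch_usb_speed speed,
					    unsigned int milliamps)
{
	if (speed < SKETCH_SPEED_SUPER)
		return (milliamps + 1) / 2;	/* USB 2.0: 2mA units */
	else
		return (milliamps + 7) / 8;	/* SuperSpeed and above: 8mA units */
}
```

Because the test is an ordered comparison rather than a list of cases,
any speed at or above SuperSpeed gets the 8mA encoding.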
Fixes: eae5820b85 ("usb: gadget: composite: Write SuperSpeedPlus config descriptors")
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e75fd33b3f upstream.
In btrfs_wait_ordered_range() once we find an ordered extent that has
finished with an error we exit the loop and don't wait for any other
ordered extents that might be still in progress.
All the users of btrfs_wait_ordered_range() expect that there are no more
ordered extents in progress after that function returns. So past fixes
such as the ones from the two following commits:
ff612ba784 ("btrfs: fix panic during relocation after ENOSPC before
writeback happens")
28aeeac1dd ("Btrfs: fix panic when starting bg cache writeout after
IO error")
don't work when there are multiple ordered extents in the range.
Fix that by making btrfs_wait_ordered_range() wait for all ordered extents
even after it finds one that had an error.
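The loop shape can be sketched in userspace as below. An array of status
codes stands in for the ordered extents in the range, and all names are
hypothetical; the sketch only shows remembering the first error while
still visiting every entry.

```c
#include <stddef.h>

struct sketch_wait_result {
	int first_error;	/* 0 if every entry completed cleanly */
	size_t waited;		/* number of entries actually waited on */
};

/* Wait on every entry; keep the first error instead of stopping at it. */
static struct sketch_wait_result sketch_wait_all(const int *status, size_t n)
{
	struct sketch_wait_result res = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		res.waited++;	/* stands in for waiting on entry i */
		if (status[i] != 0 && res.first_error == 0)
			res.first_error = status[i];
	}
	return res;
}
```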
Link: https://github.com/kdave/btrfs-progs/issues/228#issuecomment-569777554
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e90315149 upstream.
btrfs_assert_delayed_root_empty() will check if the delayed root is
completely empty, but this is a filesystem-wide check. On cleanup we
may have allowed other transactions to begin, for whatever reason, and
thus the delayed root is not empty.
So remove this check from cleanup_one_transaction(). This however can
stay in btrfs_cleanup_transaction(), because it checks only after all of
the transactions have been properly cleaned up, and thus is valid.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 315bf8ef91 upstream.
While running my error injection script I hit a panic when we tried to
clean up the fs_root when freeing the fs_root. This is because
fs_info->fs_root == ERR_PTR(-EIO), which isn't great. Fix this by
setting fs_info->fs_root = NULL; if we fail to read the root.
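A userspace sketch of the pattern, with small stand-ins for the kernel's
error-pointer helpers. Every name here is hypothetical and the structure
is reduced to the single field discussed above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_fs_info { void *fs_root; };

/* Simulated root read that fails with -EIO (-5). */
static void *sketch_read_root_failing(void) { return sketch_err_ptr(-5); }

/* On failure, clear the field so cleanup never sees an error pointer. */
static long sketch_load_fs_root(struct sketch_fs_info *info,
				void *(*read_root)(void))
{
	info->fs_root = read_root();
	if (sketch_is_err(info->fs_root)) {
		long err = sketch_ptr_err(info->fs_root);

		info->fs_root = NULL;
		return err;
	}
	return 0;
}
```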
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b778cf962d upstream.
I hit the following warning while running my error injection stress
testing:
WARNING: CPU: 3 PID: 1453 at fs/btrfs/space-info.h:108 btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
RIP: 0010:btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
Call Trace:
btrfs_free_reserved_data_space+0x4f/0x70 [btrfs]
__btrfs_prealloc_file_range+0x378/0x470 [btrfs]
elfcorehdr_read+0x40/0x40
? elfcorehdr_read+0x40/0x40
? btrfs_commit_transaction+0xca/0xa50 [btrfs]
? dput+0xb4/0x2a0
? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
? btrfs_sync_file+0x30e/0x420 [btrfs]
? do_fsync+0x38/0x70
? __x64_sys_fdatasync+0x13/0x20
? do_syscall_64+0x5b/0x1b0
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens if we fail to insert our reserved file extent. At this
point we've already converted our reservation from ->bytes_may_use to
->bytes_reserved. However once we break we will attempt to free
everything from [cur_offset, end] from ->bytes_may_use, but our extent
reservation will overlap part of this.
Fix this problem by adding ins.offset (our extent allocation size) to
cur_offset so we remove the actual remaining part from ->bytes_may_use.
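The arithmetic can be illustrated with a small sketch. The function and
parameter names are hypothetical, and 'end' is treated as inclusive as in
the [cur_offset, end] range above.

```c
#include <stdint.h>

/*
 * Bytes still accounted in ->bytes_may_use when the loop bails out.
 * reserved_len is the size of the extent that was already converted to
 * ->bytes_reserved (ins.offset in the text above), so it is skipped.
 */
static uint64_t sketch_bytes_to_release(uint64_t cur_offset, uint64_t end,
					uint64_t reserved_len)
{
	uint64_t clear_offset = cur_offset + reserved_len;

	if (clear_offset > end)
		return 0;
	return end - clear_offset + 1;
}
```

For a 1MiB range starting at offset 0 with a 256KiB extent already
reserved, this releases 768KiB rather than the full 1MiB.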
I validated this fix using my inject-error.py script
python inject-error.py -o should_fail_bio -t cache_save_setup -t \
__btrfs_prealloc_file_range \
-t insert_reserved_file_extent.constprop.0 \
-r "-5" ./run-fsstress.sh
where run-fsstress.sh simply mounts and runs fsstress on a disk.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91a5f413af upstream.
Even when APICv is disabled for L1 it can be (and, actually, is) still
available for L2. This means we always need to call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>