This reverts commit 5c5bb51465.
It breaks the abi and is nothing that is needed for any Android devices.
Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6f14268eae3250b096ed9e12c5d66bec8935dc11
Changes in 4.19.178
HID: make arrays usage and value to be the same
USB: quirks: sort quirk entries
usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable
ntfs: check for valid standard information attribute
arm64: tegra: Add power-domain for Tegra210 HDA
scripts: use pkg-config to locate libcrypto
scripts: set proper OpenSSL include dir also for sign-file
block: add helper for checking if queue is registered
block: split .sysfs_lock into two locks
block: fix race between switching elevator and removing queues
block: don't release queue's sysfs lock during switching elevator
NET: usb: qmi_wwan: Adding support for Cinterion MV31
cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.
scripts/recordmcount.pl: support big endian for ARCH sh
jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations
locking/static_key: Fix false positive warnings on concurrent dec/inc
vmlinux.lds.h: add DWARF v5 sections
kdb: Make memory allocations more robust
PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
bfq: Avoid false bfq queue merging
ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
random: fix the RNDRESEEDCRNG ioctl
ath10k: Fix error handling in case of CE pipe init failure
Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
Bluetooth: Fix initializing response id after clearing struct
ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
arm64: dts: allwinner: A64: properly connect USB PHY to port 0
arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
ACPICA: Fix exception code class checks
usb: gadget: u_audio: Free requests only after callback
Bluetooth: drop HCI device reference before return
Bluetooth: Put HCI device if inquiry procedure interrupts
memory: ti-aemif: Drop child node when jumping out loop
ARM: dts: Configure missing thermal interrupt for 4430
usb: dwc2: Do not update data length if it is 0 on inbound transfers
usb: dwc2: Abort transaction after errors with unknown reason
usb: dwc2: Make "trimming xfer length" a debug message
staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
ARM: dts: armada388-helios4: assign pinctrl to LEDs
ARM: dts: armada388-helios4: assign pinctrl to each fan
arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
ARM: s3c: fix fiq for clang IAS
soc: aspeed: snoop: Add clock control logic
bpf_lru_list: Read double-checked variable once without lock
ath9k: fix data bus crash when setting nf_override via debugfs
ibmvnic: Set to CLOSED state even on error
bnxt_en: reverse order of TX disable and carrier off
xen/netback: fix spurious event detection for common event case
mac80211: fix potential overflow when multiplying to u32 integers
bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
tcp: fix SO_RCVLOWAT related hangs under mem pressure
cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
ibmvnic: add memory barrier to protect long term buffer
ibmvnic: skip send_request_unmap for timeout reset
net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
net: amd-xgbe: Reset link when the link never comes back
net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
net: mvneta: Remove per-cpu queue mapping for Armada 3700
fbdev: aty: SPARC64 requires FB_ATY_CT
drm/gma500: Fix error return code in psb_driver_load()
gma500: clean up error handling in init
crypto: sun4i-ss - fix kmap usage
drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
media: i2c: ov5670: Fix PIXEL_RATE minimum value
media: camss: missing error code in msm_video_register()
media: vsp1: Fix an error handling path in the probe function
media: em28xx: Fix use-after-free in em28xx_alloc_urbs
media: media/pci: Fix memleak in empress_init
media: tm6000: Fix memleak in tm6000_start_stream
ASoC: cs42l56: fix up error handling in probe
crypto: bcm - Rename struct device_private to bcm_device_private
drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
media: lmedm04: Fix misuse of comma
media: qm1d1c0042: fix error return code in qm1d1c0042_init()
media: cx25821: Fix a bug when reallocating some dma memory
media: pxa_camera: declare variable when DEBUG is defined
media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
ata: ahci_brcm: Add back regulators management
ASoC: cpcap: fix microphone timeslot mask
f2fs: fix to avoid inconsistent quota data
drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
btrfs: clarify error returns values in __load_free_space_cache
hwrng: timeriomem - Fix cooldown period calculation
crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
ima: Free IMA measurement buffer on error
ima: Free IMA measurement buffer after kexec syscall
fs/jfs: fix potential integer overflow on shift of a int
jffs2: fix use after free in jffs2_sum_write_data()
capabilities: Don't allow writing ambiguous v3 file capabilities
clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
quota: Fix memory leak when handling corrupted quota file
spi: cadence-quadspi: Abort read if dummy cycles required are too many
clk: sunxi-ng: h6: Fix CEC clock
HID: core: detect and skip invalid inputs to snto32()
dmaengine: fsldma: Fix a resource leak in the remove function
dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
dmaengine: owl-dma: Fix a resource leak in the remove function
dmaengine: hsu: disable spurious interrupt
mfd: bd9571mwv: Use devm_mfd_add_devices()
fdt: Properly handle "no-map" field in the memory region
of/fdt: Make sure no-map does not remove already reserved regions
power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
rtc: s5m: select REGMAP_I2C
clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
clk: sunxi-ng: h6: Fix clock divider range on some clocks
regulator: axp20x: Fix reference cout leak
certs: Fix blacklist flag type confusion
spi: atmel: Put allocated master before return
regulator: s5m8767: Drop regulators OF node reference
isofs: release buffer head before return
auxdisplay: ht16k33: Fix refresh rate handling
IB/umad: Return EIO in case of when device disassociated
IB/umad: Return EPOLLERR in case of when device disassociated
KVM: PPC: Make the VMX instruction emulation routines static
powerpc/47x: Disable 256k page size
mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
amba: Fix resource leak for drivers without .remove
tracepoint: Do not fail unregistering a probe due to memory failure
perf tools: Fix DSO filtering when not finding a map for a sampled address
RDMA/rxe: Fix coding error in rxe_recv.c
RDMA/rxe: Correct skb on loopback path
spi: stm32: properly handle 0 byte transfer
mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
powerpc/pseries/dlpar: handle ibm, configure-connector delay status
powerpc/8xx: Fix software emulation interrupt
clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
spi: pxa2xx: Fix the controller numbering for Wildcat Point
Input: sur40 - fix an error code in sur40_probe()
perf intel-pt: Fix missing CYC processing in PSB
perf test: Fix unaligned access in sample parsing test
Input: elo - fix an error code in elo_connect()
sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
misc: eeprom_93xx46: Fix module alias to enable module autoprobe
misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
VMCI: Use set_page_dirty_lock() when unregistering guest memory
PCI: Align checking of syscall user config accessors
drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
ext4: fix potential htree index checksum corruption
regmap: sdw: use _no_pm functions in regmap_read/write
i40e: Fix flow for IPv6 next header (extension header)
i40e: Add zero-initialization of AQ command structures
i40e: Fix overwriting flow control settings during driver loading
i40e: Fix VFs not created
i40e: Fix add TC filter for IPv6
net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
vxlan: move debug check after netdev unregister
ocfs2: fix a use after free on error
mm/memory.c: fix potential pte_unmap_unlock pte error
mm/hugetlb: fix potential double free in hugetlb_register_node() error path
r8169: fix jumbo packet handling on RTL8168e
arm64: Add missing ISB after invalidating TLB in __primary_switch
i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
mm/rmap: fix potential pte_unmap on an not mapped pte
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
blk-settings: align max_sectors on "logical_block_size" boundary
ACPI: property: Fix fwnode string properties matching
ACPI: configfs: add missing check after configfs_register_default_group()
HID: wacom: Ignore attempts to overwrite the touch_max value from HID
Input: raydium_ts_i2c - do not send zero length
Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
Input: joydev - prevent potential read overflow in ioctl
Input: i8042 - add ASUS Zenbook Flip to noselftest list
USB: serial: option: update interface mapping for ZTE P685M
usb: musb: Fix runtime PM race in musb_queue_resume_work
usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
USB: serial: ftdi_sio: fix FTX sub-integer prescaler
USB: serial: mos7840: fix error code in mos7840_write()
USB: serial: mos7720: fix error code in mos7720_write()
ALSA: hda/realtek: modify EAPD in the ALC886
tpm_tis: Fix check_locality for correct locality acquisition
tpm_tis: Clean up locality release
KEYS: trusted: Fix migratable=1 failing
btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
btrfs: fix reloc root leak with 0 ref reloc roots on recovery
btrfs: fix extent buffer leak on failure to copy root
crypto: arm64/sha - add missing module aliases
crypto: sun4i-ss - checking sg length is not sufficient
crypto: sun4i-ss - handle BigEndian for cipher
seccomp: Add missing return in non-void function
misc: rtsx: init of rts522a add OCP power off when no card is present
drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
pstore: Fix typo in compression option name
dts64: mt7622: fix slow sd card access
staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
staging: gdm724x: Fix DMA from stack
staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
x86/reboot: Force all cpus to exit VMX root if VMX is supported
floppy: reintroduce O_NDELAY fix
arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
watchdog: mei_wdt: request stop on unregister
mtd: spi-nor: hisi-sfc: Put child node np on error path
fs/affs: release old buffer head on error path
seq_file: document how per-entry resources are managed.
x86: fix seq_file iteration for pat/memtype.c
hugetlb: fix copy_huge_page_from_user contig page struct assumption
libnvdimm/dimm: Avoid race between probe and available_slots_show()
arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
mmc: sdhci-esdhc-imx: fix kernel panic when remove module
gpio: pcf857x: Fix missing first interrupt
printk: fix deadlock when kernel panic
cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
f2fs: fix out-of-repair __setattr_copy()
sparc32: fix a user-triggerable oops in clear_user()
gfs2: Don't skip dlm unlock if glock has an lvb
dm: fix deadlock when swapping to encrypted device
dm era: Recover committed writeset after crash
dm era: Verify the data block size hasn't changed
dm era: Fix bitset memory leaks
dm era: Use correct value size in equality function of writeset tree
dm era: Reinitialize bitset cache before digesting a new writeset
dm era: only resize metadata in preresume
icmp: introduce helper for nat'd source address in network device context
icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=n
gtp: use icmp_ndo_send helper
sunvnet: use icmp_ndo_send helper
xfrm: interface: use icmp_ndo_send helper
ipv6: icmp6: avoid indirect call for icmpv6_send()
ipv6: silence compilation warning for non-IPV6 builds
net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
dm era: Update in-core bitset after committing the metadata
net: qrtr: Fix memory leak in qrtr_tun_open
ARM: dts: aspeed: Add LCLK to lpc-snoop
Linux 4.19.178
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8c07c10dd29a1233f238b533622d7b32bd22bdb0
commit 2099b145d7 upstream.
In case of a system crash, dm-era might fail to mark blocks as written
in its metadata, although the corresponding writes to these blocks were
passed down to the origin device and completed successfully.
Consider the following sequence of events:
1. We write to a block that has not been yet written in the current era
2. era_map() checks the in-core bitmap for the current era and sees
that the block is not marked as written.
3. The write is deferred for submission after the metadata have been
updated and committed.
4. The worker thread processes the deferred write
(process_deferred_bios()) and marks the block as written in the
in-core bitmap, **before** committing the metadata.
5. The worker thread starts committing the metadata.
6. We do more writes that map to the same block as the write of step (1)
7. era_map() checks the in-core bitmap and sees that the block is marked
as written, **although the metadata have not been committed yet**.
8. These writes are passed down to the origin device immediately and the
device reports them as completed.
9. The system crashes, e.g., power failure, before the commit from step
(5) finishes.
When the system recovers and we query the dm-era target for the list of
written blocks it doesn't report the aforementioned block as written,
although the writes of step (6) completed successfully.
The issue is that era_map() decides whether to defer or not a write
based on non committed information. The root cause of the bug is that we
update the in-core bitmap, **before** committing the metadata.
Fix this by updating the in-core bitmap **after** successfully
committing the metadata.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee576c47db upstream.
The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
it with IPCB or IP6CB, assuming the skb to have come directly from the
inet layer. But when the packet comes from the ndo layer, especially
when forwarded, there's no telling what might be in skb->cb at that
point. As a result, the icmp sending code risks reading bogus memory
contents, which can result in nasty stack overflows such as this one
reported by a user:
panic+0x108/0x2ea
__stack_chk_fail+0x14/0x20
__icmp_send+0x5bd/0x5c0
icmp_ndo_send+0x148/0x160
In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
from it. The optlen parameter there is of particular note, as it can
induce writes beyond bounds. There are quite a few ways that can happen
in __ip_options_echo. For example:
// sptr/skb are attacker-controlled skb bytes
sptr = skb_network_header(skb);
// dptr/dopt points to stack memory allocated by __icmp_send
dptr = dopt->__data;
// sopt is the corrupt skb->cb in question
if (sopt->rr) {
optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
// this now writes potentially attacker-controlled data, over
// flowing the stack:
memcpy(dptr, sptr+sopt->rr, optlen);
}
In the icmpv6_send case, the story is similar, but not as dire, as only
IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
worse than the iif case, but it is passed to ipv6_find_tlv, which does
a bit of bounds checking on the value.
This is easy to simulate by doing a `memset(skb->cb, 0x41,
sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
good fortune and the rarity of icmp sending from that context that we've
avoided reports like this until now. For example, in KASAN:
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
Write of size 38 at addr ffff888006f1f80e by task ping/89
CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
Call Trace:
dump_stack+0x9a/0xcc
print_address_description.constprop.0+0x1a/0x160
__kasan_report.cold+0x20/0x38
kasan_report+0x32/0x40
check_memory_region+0x145/0x1a0
memcpy+0x39/0x60
__ip_options_echo+0xa0e/0x12b0
__icmp_send+0x744/0x1700
Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
the v4 case, while the rest did not. So this commit actually removes the
gtp-specific zeroing, while putting the code where it belongs in the
shared infrastructure of icmp{,v6}_ndo_send.
This commit fixes the issue by passing an empty IPCB or IP6CB along to
the functions that actually do the work. For the icmp_send, this was
already trivial, thanks to __icmp_send providing the plumbing function.
For icmpv6_send, this required a tiny bit of refactoring to make it
behave like the v4 case, after which it was straight forward.
Fixes: a2b78e9b2c ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
Reported-by: SinYu <liuxyon@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1faba27f11 upstream.
The W=1 compilation of allmodconfig generates the following warning:
net/ipv6/icmp.c:448:6: warning: no previous prototype for 'icmp6_send' [-Wmissing-prototypes]
448 | void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
| ^~~~~~~~~~
Fix it by providing function declaration for builds with ipv6 as a module.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc7a21b6fb upstream.
If IPv6 is builtin, we do not need an expensive indirect call
to reach icmp6_send().
v2: put inline keyword before the type to avoid sparse warnings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67c9a7e1e3 upstream.
Because sunvnet is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly. While we're
at it, doing the additional route lookup before calling icmp_ndo_send is
superfluous, since this is the job of the icmp code in the first place.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0fce6f945 upstream.
Because gtp is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8e41f6033 upstream.
The icmpv6_send function has long had a static inline implementation
with an empty body for CONFIG_IPV6=n, so that code calling it doesn't
need to be ifdef'd. The new icmpv6_ndo_send function, which is intended
for drivers as a drop-in replacement with an identical function
signature, should follow the same pattern. Without this patch, drivers
that used to work with CONFIG_IPV6=n now result in a linker error.
Cc: Chen Zhou <chenzhou10@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 0b41713b60 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b41713b60 upstream.
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cca2c6aebe upstream.
Metadata resize shouldn't happen in the ctr. The ctr loads a temporary
(inactive) table that will only become active upon resume. That is why
resize should always be done in terms of resume. Otherwise a load (ctr)
whose inactive table never becomes active will incorrectly resize the
metadata.
Also, perform the resize directly in preresume, instead of using the
worker to do it.
The worker might run other metadata operations, e.g., it could start
digestion, before resizing the metadata. These operations will end up
using the old size.
This could lead to errors, like:
device-mapper: era: metadata_digest_transcribe_writeset: dm_array_set_value failed
device-mapper: era: process_old_eras: digest step failed, stopping digestion
The reason of the above error is that the worker started the digestion
of the archived writeset using the old, larger size.
As a result, metadata_digest_transcribe_writeset tried to write beyond
the end of the era array.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2524933307 upstream.
In case of devices with at most 64 blocks, the digestion of consecutive
eras uses the writeset of the first era as the writeset of all eras to
digest, leading to lost writes. That is, we lose the information about
what blocks were written during the affected eras.
The digestion code uses a dm_disk_bitset object to access the archived
writesets. This structure includes a one word (64-bit) cache to reduce
the number of array lookups.
This structure is initialized only once, in metadata_digest_start(),
when we kick off digestion.
But, when we insert a new writeset into the writeset tree, before the
digestion of the previous writeset is done, or equivalently when there
are multiple writesets in the writeset tree to digest, then all these
writesets are digested using the same cache and the cache is not
re-initialized when moving from one writeset to the next.
For devices with more than 64 blocks, i.e., the size of the cache, the
cache is indirectly invalidated when we move to a next set of blocks, so
we avoid the bug.
But for devices with at most 64 blocks we end up using the same cached
data for digesting all archived writesets, i.e., the cache is loaded
when digesting the first writeset and it never gets reloaded, until the
digestion is done.
As a result, the writeset of the first era to digest is used as the
writeset of all the following archived eras, leading to lost writes.
Fix this by reinitializing the dm_disk_bitset structure, and thus
invalidating the cache, every time the digestion code starts digesting a
new writeset.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8e846ff93 upstream.
dm-era doesn't support changing the data block size of existing devices,
so check explicitly that the requested block size for a new target
matches the one stored in the metadata.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de89afc1e4 upstream.
Following a system crash, dm-era fails to recover the committed writeset
for the current era, leading to lost writes. That is, we lose the
information about what blocks were written during the affected era.
dm-era assumes that the writeset of the current era is archived when the
device is suspended. So, when resuming the device, it just moves on to
the next era, ignoring the committed writeset.
This assumption holds when the device is properly shut down. But, when
the system crashes, the code that suspends the target never runs, so the
writeset for the current era is not archived.
There are three issues that cause the committed writeset to get lost:
1. dm-era doesn't load the committed writeset when opening the metadata
2. The code that resizes the metadata wipes the information about the
committed writeset (assuming it was loaded at step 1)
3. era_preresume() starts a new era, without taking into account that
the current era might not have been archived, due to a system crash.
To fix this:
1. Load the committed writeset when opening the metadata
2. Fix the code that resizes the metadata to make sure it doesn't wipe
the loaded writeset
3. Fix era_preresume() to check for a loaded writeset and archive it,
before starting a new era.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a666e5c05e upstream.
The system would deadlock when swapping to a dm-crypt device. The reason
is that for each incoming write bio, dm-crypt allocates memory that holds
encrypted data. These excessive allocations exhaust all the memory and the
result is either deadlock or OOM trigger.
This patch limits the number of in-flight swap bios, so that the memory
consumed by dm-crypt is limited. The limit is enforced if the target set
the "limit_swap_bios" variable and if the bio has REQ_SWAP set.
Non-swap bios are not affected becuase taking the semaphore would cause
performance degradation.
This is similar to request-based drivers - they will also block when the
number of requests is over the limit.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78178ca844 upstream.
Patch fb6791d100 was designed to allow gfs2 to unmount quicker by
skipping the step where it tells dlm to unlock glocks in EX with lvbs.
This was done because when gfs2 unmounts a file system, it destroys the
dlm lockspace shortly after it destroys the glocks so it doesn't need to
unlock them all: the unlock is implied when the lockspace is destroyed
by dlm.
However, that patch introduced a use-after-free in dlm: as part of its
normal dlm_recoverd process, it can call ls_recovery to recover dead
locks. In so doing, it can call recover_rsbs which calls recover_lvb for
any mastered rsbs. Func recover_lvb runs through the list of lkbs queued
to the given rsb (if the glock is cached but unlocked, it will still be
queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb,
copies it to the rsb, thus trying to preserve the lkb. However, when
gfs2 skips the dlm unlock step, it frees the glock and its lvb, which
means dlm's function recover_lvb references the now freed lvb pointer,
copying the freed lvb memory to the rsb.
This patch changes the check in gdlm_put_lock so that it calls
dlm_unlock for all glocks that contain an lvb pointer.
Fixes: fb6791d100 ("GFS2: skip dlm_unlock calls in unmount")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7780918b36 upstream.
Back in 2.1.29 the clear_user() guts (__bzero()) had been merged
with memset(). Unfortunately, while all exception handlers had been
copied, one of the exception table entries got lost. As the result,
clear_user() starting at 128*n bytes before the end of page and
spanning between 8 and 127 bytes into the next page would oops when
the second page is unmapped. It's trivial to reproduce - all
it takes is
main()
{
int fd = open("/dev/zero", O_RDONLY);
char *p = mmap(NULL, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANON, -1, 0);
munmap(p + 8192, 8192);
read(fd, p + 8192 - 128, 192);
}
which had been oopsing since March 1997. Says something about
the quality of test coverage... ;-/ And while today sparc32 port
is nearly dead, back in '97 it had been very much alive; in fact,
sparc64 had only been in mainline for 3 months by that point...
Cc: stable@kernel.org
Fixes: v2.1.29
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f67e06008 upstream.
Currently, when turbo is disabled (either by BIOS or by the user),
the intel_pstate driver reads the max non-turbo frequency from the
package-wide MSR_PLATFORM_INFO(0xce) register.
However, on asymmetric platforms it is possible in theory that small
and big core with HWP enabled might have different max non-turbo CPU
frequency, because MSR_HWP_CAPABILITIES is per-CPU scope according
to Intel Software Developer Manual.
The turbo max freq is already per-CPU in current code, so make
similar change to the max non-turbo frequency as well.
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject and changelog edits ]
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e1: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a8109f303 upstream.
printk_safe_flush_on_panic() caused the following deadlock on our
server:
CPU0: CPU1:
panic rcu_dump_cpu_stacks
kdump_nmi_shootdown_cpus nmi_trigger_cpumask_backtrace
register_nmi_handler(crash_nmi_callback) printk_safe_flush
__printk_safe_flush
raw_spin_lock_irqsave(&read_lock)
// send NMI to other processors
apic_send_IPI_allbutself(NMI_VECTOR)
// NMI interrupt, dead loop
crash_nmi_callback
printk_safe_flush_on_panic
printk_safe_flush
__printk_safe_flush
// deadlock
raw_spin_lock_irqsave(&read_lock)
DEADLOCK: read_lock is taken on CPU1 and will never get released.
It happens when panic() stops a CPU by NMI while it has been in
the middle of printk_safe_flush().
Handle the lock the same way as logbuf_lock. The printk_safe buffers
are flushed only when both locks can be safely taken. It can avoid
the deadlock _in this particular case_ at expense of losing contents
of printk_safe buffers.
Note: It would actually be safe to re-init the locks when all CPUs were
stopped by NMI. But it would require passing this information
from arch-specific code. It is not worth the complexity.
Especially because logbuf_lock and printk_safe buffers have been
obsoleted by the lockless ring buffer.
Fixes: cf9b1106c8 ("printk/nmi: flush NMI messages on the system panic")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: <stable@vger.kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210210034823.64867-1-songmuchun@bytedance.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8002a3593 upstream.
If no n_latch value will be provided at driver probe then all pins will
be used as an input:
gpio->out = ~n_latch;
In that case initial state for all pins is "one":
gpio->status = gpio->out;
So if pcf857x IRQ happens with change pin value from "zero" to "one"
then we miss it, because of "one" from IRQ and "one" from initial state
leaves corresponding pin unchanged:
change = (gpio->status ^ status) & gpio->irq_enabled;
The right solution will be to read actual state at driver probe.
Cc: stable@vger.kernel.org
Fixes: 6e20a0a429 ("gpio: pcf857x: enable gpio_to_irq() support")
Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebfac7b778 upstream.
clang-12 -fno-pic (since
a084c0388e)
can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail`
on x86. The two forms should have identical behaviors on x86-64 but the
former causes GNU as<2.37 to produce an unreferenced undefined symbol
_GLOBAL_OFFSET_TABLE_.
(On x86-32, there is an R_386_PC32 vs R_386_PLT32 difference but the
linker behavior is identical as far as Linux kernel is concerned.)
Simply ignore _GLOBAL_OFFSET_TABLE_ for now, like what
scripts/mod/modpost.c:ignore_undef_symbol does. This also fixes the
problem for gcc/clang -fpie and -fpic, which may emit `call foo@PLT` for
external function calls on x86.
Note: ld -z defs and dynamic loaders do not error for unreferenced
undefined symbols so the module loader is reading too much. If we ever
need to ignore more symbols, the code should be refactored to ignore
unreferenced symbols.
Cc: <stable@vger.kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1250
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27178
Reported-by: Marco Elver <elver@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Marco Elver <elver@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7018c897c2 upstream
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
Fixes: 4d88a97aa9 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Coly Li <colyli@suse.com>
Reported-by: Richard Palethorpe <rpalethorpe@suse.com>
Acked-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[sudip: use device_lock()]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3272cfc252 upstream.
page structs are not guaranteed to be contiguous for gigantic pages. The
routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.
Since page structs for the target gigantic page are not contiguous, the
data copied from user space could overwrite other pages not associated
with the gigantic page and cause data corruption.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated.
Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
Fixes: 8fb5debc5f ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3656d8227 upstream.
Patch series "Fix some seq_file users that were recently broken".
A recent change to seq_file broke some users which were using seq_file
in a non-"standard" way ... though the "standard" isn't documented, so
they can be excused. The result is a possible leak - of memory in one
case, of references to a 'transport' in the other.
These three patches:
1/ document and explain the problem
2/ fix the problem user in x86
3/ fix the problem user in net/sctp
This patch (of 3):
Users of seq_file will sometimes find it convenient to take a resource,
such as a lock or memory allocation, in the ->start or ->next operations.
These are per-entry resources, distinct from per-session resources which
are taken in ->start and released in ->stop.
The preferred management of these is release the resource on the
subsequent call to ->next or ->stop.
However prior to Commit 1f4aace60b ("fs/seq_file.c: simplify seq_file
iteration code and interface") it happened that ->show would always be
called after ->start or ->next, and a few users chose to release the
resource in ->show.
This is no longer reliable. Since the mentioned commit, ->next will
always come after a successful ->show (to ensure m->index is updated
correctly), so the original ordering cannot be maintained.
This patch updates the documentation to clearly state the required
behaviour. Other patches will fix the few problematic users.
[akpm@linux-foundation.org: fix typo, per Willy]
Link: https://lkml.kernel.org/r/161248518659.21478.2484341937387294998.stgit@noble1
Link: https://lkml.kernel.org/r/161248539020.21478.3147971477400875336.stgit@noble1
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 740c0a57b8 upstream.
The MEI bus has a special behavior on suspend it destroys
all the attached devices, this is due to the fact that also
firmware context is not persistent across power flows.
If watchdog on MEI bus is ticking before suspending the firmware
times out and reports that the OS is missing watchdog tick.
Send the stop command to the firmware on watchdog unregistered
to eliminate the false event on suspend.
This does not make the things worse from the user-space perspective
as a user-space should re-open watchdog device after
suspending before this patch.
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210124114938.373885-1-tomas.winkler@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a0c014cd2 upstream.
This issue was originally fixed in 09954bad4 ("floppy: refactor open()
flags handling").
The fix as a side-effect, however, introduce issue for open(O_ACCMODE)
that is being used for ioctl-only open. I wrote a fix for that, but
instead of it being merged, full revert of 09954bad4 was performed,
re-introducing the O_NDELAY / O_NONBLOCK issue, and it strikes again.
This is a forward-port of the original fix to current codebase; the
original submission had the changelog below:
====
Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().
Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2101221209060.5622@cbobk.fhfr.pm
Cc: stable@vger.kernel.org
Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Reported-and-tested-by: Kurt Garloff <kurt@garloff.de>
Fixes: 09954bad4 ("floppy: refactor open() flags handling")
Fixes: f2791e7ead ("Revert "floppy: refactor open() flags handling"")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed72736183 upstream.
Force all CPUs to do VMXOFF (via NMI shootdown) during an emergency
reboot if VMX is _supported_, as VMX being off on the current CPU does
not prevent other CPUs from being in VMX root (post-VMXON). This fixes
a bug where a crash/panic reboot could leave other CPUs in VMX root and
prevent them from being woken via INIT-SIPI-SIPI in the new kernel.
Fixes: d176720d34 ("x86: disable VMX on all CPUs on reboot")
Cc: stable@vger.kernel.org
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David P. Reed <dpreed@deepplum.com>
[sean: reworked changelog and further tweaked comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201231002702.2223707-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>