Changes in 4.9.104
MIPS: c-r4k: Fix data corruption related to cache coherence
MIPS: ptrace: Expose FIR register through FP regset
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
affs_lookup(): close a race with affs_remove_link()
aio: fix io_destroy(2) vs. lookup_ioctx() race
ALSA: timer: Fix pause event notification
do d_instantiate/unlock_new_inode combinations safely
mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
libata: Blacklist some Sandisk SSDs for NCQ
libata: blacklist Micron 500IT SSD with MU01 firmware
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
IB/hfi1: Use after free race condition in send context error path
Revert "ipc/shm: Fix shmat mmap nil-page protection"
ipc/shm: fix shmat() nil address after round-down when remapping
kasan: fix memory hotplug during boot
kernel/sys.c: fix potential Spectre v1 issue
kernel/signal.c: avoid undefined behaviour in kill_something_info
KVM/VMX: Expose SSBD properly to guests
KVM: s390: vsie: fix < 8k check for the itdba
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
firewire-ohci: work around oversized DMA reads on JMicron controllers
x86/tsc: Allow TSC calibration without PIT
NFSv4: always set NFS_LOCK_LOST when a lock is lost.
ALSA: hda - Use IS_REACHABLE() for dependency on input
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
PCI: Add function 1 DMA alias quirk for Marvell 9128
Input: psmouse - fix Synaptics detection when protocol is disabled
i40iw: Zero-out consumer key on allocate stag for FMR
tools lib traceevent: Simplify pointer print logic and fix %pF
perf callchain: Fix attr.sample_max_stack setting
tools lib traceevent: Fix get_field_str() for dynamic strings
perf record: Fix failed memory allocation for get_cpuid_str
iommu/vt-d: Use domain instead of cache fetching
dm thin: fix documentation relative to low water mark threshold
net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b
net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
nfs: Do not convert nfs_idmap_cache_timeout to jiffies
watchdog: sp5100_tco: Fix watchdog disable bit
kconfig: Don't leak main menus during parsing
kconfig: Fix automatic menu creation mem leak
kconfig: Fix expr_free() E_NOT leak
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
ipmi/powernv: Fix error return code in ipmi_powernv_probe()
Btrfs: set plug for fsync
btrfs: Fix out of bounds access in btrfs_search_slot
Btrfs: fix scrub to repair raid6 corruption
btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP
HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
fm10k: fix "failed to kill vid" message for VF
device property: Define type of PROPERTY_ENRTY_*() macros
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/numa: Ensure nodes initialized for hotplug
RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
ntb_transport: Fix bug with max_mw_size parameter
gianfar: prevent integer wrapping in the rx handler
tcp_nv: fix potential integer overflow in tcpnv_acked
kvm: Map PFN-type memory regions as writable (if possible)
ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
ocfs2: return error when we attempt to access a dirty bh in jbd2
mm/mempolicy: fix the check of nodemask from user
mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
asm-generic: provide generic_pmdp_establish()
sparc64: update pmdp_invalidate() to return old pmd value
mm: thp: use down_read_trylock() in khugepaged to avoid long block
mm: pin address_space before dereferencing it while isolating an LRU page
mm/fadvise: discard partial page if endbyte is also EOF
openvswitch: Remove padding from packet before L3+ conntrack processing
IB/ipoib: Fix for potential no-carrier state
drm/nouveau/pmu/fuc: don't use movw directly anymore
netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure
x86/power: Fix swsusp_arch_resume prototype
firmware: dmi_scan: Fix handling of empty DMI strings
ACPI: processor_perflib: Do not send _PPC change notification if not ready
ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
MIPS: generic: Fix machine compatible matching
MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
xen-netfront: Fix race between device setup and open
xen/grant-table: Use put_page instead of free_page
RDS: IB: Fix null pointer issue
arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
proc: fix /proc/*/map_files lookup
cifs: silence compiler warnings showing up with gcc-8.0.0
bcache: properly set task state in bch_writeback_thread()
bcache: fix for allocator and register thread race
bcache: fix for data collapse after re-attaching an attached device
bcache: return attach error when no cache set exist
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
bpf: fix rlimit in reuseport net selftest
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
locking/qspinlock: Ensure node->count is updated before initialising node
irqchip/gic-v3: Ignore disabled ITS nodes
cpumask: Make for_each_cpu_wrap() available on UP as well
irqchip/gic-v3: Change pr_debug message to pr_devel
ARC: Fix malformed ARC_EMUL_UNALIGNED default
ptr_ring: prevent integer overflow when calculating size
libata: Fix compile warning with ATA_DEBUG enabled
selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
selftests: memfd: add config fragment for fuse
ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
ARM: OMAP3: Fix prm wake interrupt for resume
ARM: OMAP1: clock: Fix debugfs_create_*() usage
ibmvnic: Free RX socket buffer in case of adapter error
iwlwifi: mvm: fix security bug in PN checking
iwlwifi: mvm: always init rs with 20mhz bandwidth rates
NFC: llcp: Limit size of SDP URI
rxrpc: Work around usercopy check
mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
mac80211: fix a possible leak of station stats
mac80211: fix calling sleeping function in atomic context
mac80211: Do not disconnect on invalid operating class
md raid10: fix NULL deference in handle_write_completed()
drm/exynos: g2d: use monotonic timestamps
drm/exynos: fix comparison to bitshift when dealing with a mask
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
md: raid5: avoid string overflow warning
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
s390/cio: fix ccw_device_start_timeout API
s390/cio: fix return code after missing interrupt
s390/cio: clear timer when terminating driver I/O
PKCS#7: fix direct verification of SignerInfo signature
ARM: OMAP: Fix dmtimer init for omap1
smsc75xx: fix smsc75xx_set_features()
regulatory: add NUL to request alpha2
integrity/security: fix digsig.c build error with header file
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
mac80211: drop frames with unexpected DS bits from fast-rx to slow path
arm64: fix unwind_frame() for filtered out fn for function graph tracing
macvlan: fix use-after-free in macvlan_common_newlink()
kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
fs: dcache: Use READ_ONCE when accessing i_dir_seq
md: fix a potential deadlock of raid5/raid10 reshape
md/raid1: fix NULL pointer dereference
batman-adv: fix packet checksum in receive path
batman-adv: invalidate checksum on fragment reassembly
netfilter: ebtables: convert BUG_ONs to WARN_ONs
batman-adv: Ignore invalid batadv_iv_gw during netlink send
batman-adv: Ignore invalid batadv_v_gw during netlink send
batman-adv: Fix netlink dumping of BLA claims
batman-adv: Fix netlink dumping of BLA backbones
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
clocksource/drivers/fsl_ftm_timer: Fix error return checking
ceph: fix dentry leak when failing to init debugfs
ARM: orion5x: Revert commit 4904dbda41.
qrtr: add MODULE_ALIAS macro to smd
r8152: fix tx packets accounting
virtio-gpu: fix ioctl and expose the fixed status to userspace.
dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
bcache: fix kcrashes with fio in RAID5 backend dev
ip6_tunnel: fix IFLA_MTU ignored on NEWLINK
sit: fix IFLA_MTU ignored on NEWLINK
ARM: dts: NSP: Fix amount of RAM on BCM958625HR
powerpc/boot: Fix random libfdt related build errors
gianfar: Fix Rx byte accounting for ndev stats
net/tcp/illinois: replace broken algorithm reference link
nvmet: fix PSDT field check in command format
xen/pirq: fix error path cleanup when binding MSIs
drm/sun4i: Fix dclk_set_phase
Btrfs: send, fix issuing write op when processing hole in no data mode
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
iwlwifi: mvm: fix TX of CCMP 256
watchdog: f71808e_wdt: Fix magic close handling
watchdog: sbsa: use 32-bit read for WCV
batman-adv: Fix multicast packet loss with a single WANT_ALL_IPV4/6 flag
e1000e: Fix check_for_link return value with autoneg off
e1000e: allocate ring descriptors with dma_zalloc_coherent
ia64/err-inject: Use get_user_pages_fast()
RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
RDMA/qedr: Fix iWARP write and send with immediate
IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
fsl/fman: avoid sleeping in atomic context while adding an address
net: qcom/emac: Use proper free methods during TX
net: smsc911x: Fix unload crash when link is up
IB/core: Fix possible crash to access NULL netdev
xen: xenbus: use put_device() instead of kfree()
arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
netfilter: ebtables: fix erroneous reject of last rule
bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
workqueue: use put_device() instead of kfree()
ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
sunvnet: does not support GSO for sctp
drm/imx: move arming of the vblank event to atomic_flush
net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
batman-adv: fix header size check in batadv_dbg_arp()
batman-adv: Fix skbuff rcsum on packet reroute
vti4: Don't count header length twice on tunnel setup
vti4: Don't override MTU passed on link creation via IFLA_MTU
perf/cgroup: Fix child event counting bug
brcmfmac: Fix check for ISO3166 code
kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
RDMA/ucma: Correct option size check using optlen
RDMA/qedr: fix QP's ack timeout configuration
RDMA/qedr: Fix rc initialization on CNQ allocation failure
mm/mempolicy.c: avoid use uninitialized preferred_node
mm, thp: do not cause memcg oom for thp
selftests: ftrace: Add probe event argument syntax testcase
selftests: ftrace: Add a testcase for string type with kprobe_event
selftests: ftrace: Add a testcase for probepoint
batman-adv: fix multicast-via-unicast transmission with AP isolation
batman-adv: fix packet loss for broadcasted DHCP packets to a server
ARM: 8748/1: mm: Define vdso_start, vdso_end as array
net: qmi_wwan: add BroadMobi BM806U 2020:2033
perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
llc: properly handle dev_queue_xmit() return value
builddeb: Fix header package regarding dtc source links
mm/kmemleak.c: wait for scan completion before disabling free
net: Fix untag for vlan packets without ethernet header
net: mvneta: fix enable of all initialized RXQs
sh: fix debug trap failure to process signals before return to user
nvme: don't send keep-alives to the discovery controller
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
swap: divide-by-zero when zero length swap file on ssd
sr: get/drop reference to device in revalidate and check_events
Force log to disk before reading the AGF during a fstrim
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
dp83640: Ensure against premature access to PHY registers after reset
mm/ksm: fix interaction with THP
mm: fix races between address_space dereference and free in page_evicatable
Btrfs: bail out on error during replay_dir_deletes
Btrfs: fix NULL pointer dereference in log_dir_items
btrfs: Fix possible softlock on single core machines
ocfs2/dlm: don't handle migrate lockres if already in shutdown
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
KVM: VMX: raise internal error for exception during invalid protected mode state
fscache: Fix hanging wait on page discarded by writeback
sparc64: Make atomic_xchg() an inline function rather than a macro.
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
btrfs: tests/qgroup: Fix wrong tree backref level
Btrfs: fix copy_items() return value when logging an inode
btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
rxrpc: Fix Tx ring annotation after initial Tx failure
rxrpc: Don't treat call aborts as conn aborts
xen/acpi: off by one in read_acpi_id()
drivers: macintosh: rack-meter: really fix bogus memsets
ACPI: acpi_pad: Fix memory leak in power saving threads
powerpc/mpic: Check if cpu_possible() in mpic_physmask()
m68k: set dma and coherent masks for platform FEC ethernets
parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
hwmon: (nct6775) Fix writing pwmX_mode
powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
powerpc/perf: Fix kernel address leak via sampling registers
tools/thermal: tmon: fix for segfault
selftests: Print the test we're running to /dev/kmsg
net/mlx5: Protect from command bit overflow
ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
cxgb4: Setup FW queues before registering netdev
ima: Fallback to the builtin hash algorithm
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
arm: dts: socfpga: fix GIC PPI warning
cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
zorro: Set up z->dev.dma_mask for the DMA API
bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
ACPICA: Events: add a return on failure from acpi_hw_register_read
ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
cxgb4: Fix queue free path of ULD drivers
i2c: mv64xxx: Apply errata delay only in standard mode
KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
perf top: Fix top.call-graph config option reading
perf stat: Fix core dump when flag T is used
IB/core: Honor port_num while resolving GID for IB link layer
regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
spi: bcm-qspi: fIX some error handling paths
MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
PCI: Restore config space on runtime resume despite being unbound
ipmi_ssif: Fix kernel panic at msg_done_handler
powerpc: Add missing prototype for arch_irq_work_raise()
f2fs: fix to check extent cache in f2fs_drop_extent_tree
perf/core: Fix perf_output_read_group()
drm/panel: simple: Fix the bus format for the Ontat panel
hwmon: (pmbus/max8688) Accept negative page register values
hwmon: (pmbus/adm1275) Accept negative page register values
perf/x86/intel: Properly save/restore the PMU state in the NMI handler
cdrom: do not call check_disk_change() inside cdrom_open()
perf/x86/intel: Fix large period handling on Broadwell CPUs
perf/x86/intel: Fix event update for auto-reload
arm64: dts: qcom: Fix SPI5 config on MSM8996
soc: qcom: wcnss_ctrl: Fix increment in NV upload
gfs2: Fix fallocate chunk size
x86/devicetree: Initialize device tree before using it
x86/devicetree: Fix device IRQ settings in DT
ALSA: vmaster: Propagate slave error
dmaengine: pl330: fix a race condition in case of threaded irqs
dmaengine: rcar-dmac: Check the done lists in rcar_dmac_chan_get_residue()
enic: enable rq before updating rq descriptors
hwrng: stm32 - add reset during probe
dmaengine: qcom: bam_dma: get num-channels and num-ees from dt
net: stmmac: ensure that the device has released ownership before reading data
net: stmmac: ensure that the MSS desc is the last desc to set the own bit
cpufreq: Reorder cpufreq_online() error code path
PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
udf: Provide saner default for invalid uid / gid
ARM: dts: bcm283x: Fix probing of bcm2835-i2s
audit: return on memory error to avoid null pointer dereference
rcu: Call touch_nmi_watchdog() while printing stall warnings
pinctrl: sh-pfc: r8a7796: Fix MOD_SEL register pin assignment for SSI pins group
MIPS: Octeon: Fix logging messages with spurious periods after newlines
drm/rockchip: Respect page offset for PRIME mmap calls
x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
perf tests: Use arch__compare_symbol_names to compare symbols
perf report: Fix memory corruption in --branch-history mode --branch-history
selftests/net: fixes psock_fanout eBPF test case
netlabel: If PF_INET6, check sk_buff ip header version
regmap: Correct comparison in regmap_cached
ARM: dts: imx7d: cl-som-imx7: fix pinctrl_enet
ARM: dts: porter: Fix HDMI output routing
regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
pinctrl: msm: Use dynamic GPIO numbering
kdb: make "mdr" command repeat
Linux 4.9.104
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 9b28a1102e ]
Fixes:
1. The use of "exceeds" when the opposite of exceeds, falls below,
was meant.
2. Properly speaking, a table can not exceed a threshold.
It emphasizes the important point, which is that it is the userspace
daemon's responsibility to check for low free space when a device
is resumed, since it won't get a special event indicating low free
space in that situation.
Signed-off-by: mulhern <amulhern@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows platforms that are CPU/memory contrained to verify data
blocks only the first time they are read from the data device, rather
than every time. As such, it provides a reduced level of security
because only offline tampering of the data device's content will be
detected, not online tampering.
Hash blocks are still verified each time they are read from the hash
device, since verification of hash blocks is less performance critical
than data blocks, and a hash block will not be verified any more after
all the data blocks it covers have been verified anyway.
This option introduces a bitset that is used to check if a block has
been validated before or not. A block can be validated more than once
as there is no thread protection for the bitset.
These changes were developed and tested on entry-level Android Go
devices.
Change-Id: I4a0be2552acb04f7c2ce6665031199d9dcbc2430
Bug: 72664474
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 843f38d382)
(changed alloc to vzalloc/vfree)
Signed-off-by: Patrik Torstensson <totte@google.com>
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.
1. dm: allow a dm-fs-style device to be shared via dm-ioctl
Integrates feedback from Alisdair, Mike, and Kiyoshi.
Two main changes occur here:
- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table. This binds
the device to a name and, optional, uuid which is needed by udev and
allows for userspace management of the mapped device.
- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.
2. init: boot to device-mapper targets without an initr*
Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md. It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise). It also replaces /dev/XXX calls with major:minor opportunistically.
The format is dm="name uuid ro,table line 1,table line 2,...". The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space. Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).
A mapped device created during boot will be assigned a minor of 0 and
may be access via /dev/dm-0.
An example dm-linear root with no uuid may look like:
root=/dev/dm-0 dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0"
Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.
Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2http://marc.info/?l=dm-devel&m=127429499422096&w=2http://marc.info/?l=dm-devel&m=127429493922000&w=2
Latest upstream threads:
https://patchwork.kernel.org/patch/104859/https://patchwork.kernel.org/patch/104860/https://patchwork.kernel.org/patch/104861/
Bug: 27175947
Signed-off-by: Will Drewry <wad@chromium.org>
Review URL: http://codereview.chromium.org/2020011
Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
dm-raid 1.9.0 fails to activate existing RAID4/10 devices that have the
old superblock format (which does not have takeover/reshaping support
that was added via commit 33e53f0685).
Fix validation path for old superblocks by reverting to the old raid4
layout and basing checks on mddev->new_{level,layout,...} members in
super_init_validation().
Cc: stable@vger.kernel.org # 4.8
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
smq seems to be performing better than the old mq policy in all
situations, as well as using a quarter of the memory.
Make 'mq' an alias for 'smq' when choosing a cache policy. The tunables
that were present for the old mq are faked, and have no effect. mq
should be considered deprecated now.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
matching a zero hash without validating the content.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add support for correcting corrupted blocks using Reed-Solomon.
This code uses RS(255, N) interleaved across data and hash
blocks. Each error-correcting block covers N bytes evenly
distributed across the combined total data, so that each byte is a
maximum distance away from the others. This makes it possible to
recover from several consecutive corrupted blocks with relatively
small space overhead.
In addition, using verity hashes to locate erasures nearly doubles
the effectiveness of error correction. Being able to detect
corrupted blocks also improves performance, because only corrupted
blocks need to corrected.
For a 2 GiB partition, RS(255, 253) (two parity bytes for each
253-byte block) can correct up to 16 MiB of consecutive corrupted
blocks if erasures can be located, and 8 MiB if they cannot, with
16 MiB space overhead.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull device mapper updates from Mike Snitzer:
"Smaller set of DM changes for this merge. I've based these changes on
Jens' for-4.4/reservations branch because the associated DM changes
required it.
- Revert a dm-multipath change that caused a regression for
unprivledged users (e.g. kvm guests) that issued ioctls when a
multipath device had no available paths.
- Include Christoph's refactoring of DM's ioctl handling and add
support for passing through persistent reservations with DM
multipath.
- All other changes are very simple cleanups"
* tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm switch: simplify conditional in alloc_region_table()
dm delay: document that offsets are specified in sectors
dm delay: capitalize the start of an delay_ctr() error message
dm delay: Use DM_MAPIO macros instead of open-coded equivalents
dm linear: remove redundant target name from error messages
dm persistent data: eliminate unnecessary return values
dm: eliminate unused "bioset" process for each bio-based DM device
dm: convert ffs to __ffs
dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
dm: add support for passing through persistent reservations
dm: refactor ioctl handling
Revert "dm mpath: fix stalls when handling invalid ioctls"
dm: initialize non-blk-mq queue data before queue is used
Only delay params are mentioned in delay.txt.
Mention offsets just like documents for linear and flakey do.
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 76c44f6d80 introduced the possibly for "Overflow" to be reported
by the snapshot device's status. Older userspace (e.g. lvm2) does not
handle the "Overflow" status response.
Fix this incompatibility by requiring newer userspace code, that can
cope with "Overflow", request the persistent store with overflow support
by using "PO" (Persistent with Overflow) for the snapshot store type.
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Fixes: 76c44f6d80 ("dm snapshot: don't invalidate on-disk image on snapshot write overflow")
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For RAID 4/5/6 data integrity reasons 'discard_zeroes_data' must work
properly.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the user selected the precise_timestamps or histogram options, report
it in the @stats_list message output.
If the user didn't select these options, no extra tokens are reported,
thus it is backward compatible with old software that doesn't know about
precise timestamps and histogram.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.2
There is currently no way to see that the needs_check flag has been set
in the metadata. Display 'needs_check' in the cache status if it is set
in the cache metadata.
Also, update cache documentation.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is currently no way to see that the needs_check flag has been set
in the metadata. Display 'needs_check' in the thin-pool status if it is
set in the thinp metadata.
Also, update thinp documentation.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add an option to dm statistics to collect and report a histogram of
IO latencies.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Make it possible to use precise timestamps with nanosecond granularity
in dm statistics.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The Stochastic multiqueue (SMQ) policy (vs MQ) offers the promise of
less memory utilization, improved performance and increased adaptability
in the face of changing workloads. SMQ also does not have any
cumbersome tuning knobs.
Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target. Doing so will cause all of the
mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
In the future the "mq" policy will just silently make use of "smq" and
the mq code will be removed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
If a cache metadata operation fails (e.g. transaction commit) the
cache's metadata device will abort the current transaction, set a new
needs_check flag, and the cache will transition to "read-only" mode. If
aborting the transaction or setting the needs_check flag fails the cache
will transition to "fail-io" mode.
Once needs_check is set the cache device will not be allowed to
activate. Activation requires write access to metadata. Future work is
needed to add proper support for running the cache in read-only mode.
Once in fail-io mode the cache will report a status of "Fail".
Also, add commit() wrapper that will disallow commits if in read_only or
fail mode.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add dm-raid access to the MD RAID0 personality to enable single zone
striping.
The following changes enable that access:
- add type definition to raid_types array
- make bitmap creation conditonal in super_validate(), because
bitmaps are not allowed in raid0
- set rdev->sectors to the data image size in super_validate()
to allow the raid0 personality to calculate the MD array
size properly
- use mdddev(un)lock() functions instead of direct mutex_(un)lock()
(wrapped in here because it's a trivial change)
- enhance raid_status() to always report full sync for raid0
so that userspace checks for 100% sync will succeed and allow
for resize (and takeover/reshape once added in future paches)
- enhance raid_resume() to not load bitmap in case of raid0
- add merge function to avoid data corruption (seen with readahead)
that resulted from bio payloads that grew too large. This problem
did not occur with the other raid levels because it either did not
apply without striping (raid1) or was avoided via stripe caching.
- raise version to 1.7.0 because of the raid0 API change
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Remove comment above parse_raid_params() that claims
"devices_handle_discard_safely" is a table line argument when it is
actually is a module parameter.
Also, backfill dm-raid target version 1.6.0 documentation.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cryptsetup home page moved to GitLab.
Also remove link to abandonded Truecrypt page.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Introduce a new target that is meant for file system developers to test file
system integrity at particular points in the life of a file system. We capture
all write requests and associated data and log them to a separate device
for later replay. There is a userspace utility to do this replay. The
idea behind this is to give file system developers a tool to verify that
the file system is always consistent.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add device specific modes to dm-verity to specify how corrupted
blocks should be handled. The following modes are defined:
- DM_VERITY_MODE_EIO is the default behavior, where reading a
corrupted block results in -EIO.
- DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does
not block the read.
- DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted
block is discovered.
In addition, each mode sends a uevent to notify userspace of
corruption and to allow further recovery actions.
The driver defaults to previous behavior (DM_VERITY_MODE_EIO)
and other modes can be enabled with an additional parameter to
the verity table.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Make it possible to disable offloading writes by setting the optional
'submit_from_crypt_cpus' table argument.
There are some situations where offloading write bios from the
encryption threads to a single thread degrades performance
significantly.
The default is to offload write bios to the same thread because it
benefits CFQ to have writes submitted using the same IO context.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Use unbound workqueue by default so that work is automatically balanced
between available CPUs. The original behavior of encrypting using the
same cpu that IO was submitted on can still be enabled by setting the
optional 'same_cpu_crypt' table argument.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Before, if the user wanted sequential IO to be promoted to the cache
they'd have to set sequential_threshold to some nebulous large value.
Now, the user may easily disable sequential IO detection (and sequential
IO's implicit bypass of the cache) by setting sequential_threshold to 0.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Rather than maintaining a separate promote_threshold variable that we
periodically update we now use the hit count of the oldest clean
block. Also add a fudge factor to discourage demoting dirty blocks.
With some tests this has a sizeable difference, because the old code
was too eager to demote blocks. For example, device-mapper-test-suite's
git_extract_cache_quick test goes from taking 190 seconds, to 142
(linear on spindle takes 250).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add support for quickly loading a repetitive pattern into the
dm-switch target.
In the "set_regions_mappings" message, the user may now use "Rn,m" as
one of the arguments. "n" and "m" are hexadecimal numbers. The "Rn,m"
argument repeats the last "n" arguments in the following "m" slots.
For example:
dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
is equivalent to
dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
Requested-by: Jay Wang <jwang@nimblestorage.com>
Tested-by: Jay Wang <jwang@nimblestorage.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode
holding IO forever") introduced a fixed 60 second timeout. Users may
want to either disable or modify this timeout.
Allow the out-of-data-space timeout to be configured using the
'no_space_timeout' dm-thin-pool module param. Setting it to 0 will
disable the timeout, resulting in IO being queued until more data space
is added to the thin-pool.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
dm-era is a target that behaves similar to the linear target. In
addition it keeps track of which blocks were written within a user
defined period of time called an 'era'. Each era target instance
maintains the current era as a monotonically increasing 32-bit
counter.
Use cases include tracking changed blocks for backup software, and
partially invalidating the contents of a cache to restore cache
coherency after rolling back a vendor snapshot.
dm-era is primarily expected to be paired with the dm-cache target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The Documentation for the thin provisioning target's held metadata root
feature was incorrect. It is now available and the value for the held
metadata root is in block units (not 512b sectors).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a thin metadata operation fails the current transaction will abort,
whereby causing potential for IO layers up the stack (e.g. filesystems)
to have data loss. As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
thin metadata's superblock which:
1) requires the user verify the thin metadata is consistent (e.g. use
thin_check, etc)
2) suggests the user verify the thin data is consistent (e.g. use fsck)
The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
to run thin_repair.
On metadata operation failure: abort current metadata transaction, set
pool in read-only mode, and now set the needs_check flag.
As part of this change, constraints are introduced or relaxed:
* don't allow a pool to transition to write mode if needs_check is set
* don't allow data or metadata space to be resized if needs_check is set
* if a thin pool's metadata space is exhausted: the kernel will now
force the user to take the pool offline for repair before the kernel
will allow the metadata space to be extended.
Also, update Documentation to include information about when the thin
provisioning target commits metadata, how it handles metadata failures
and running out of space.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
The cache's policy may have been established using the "default" alias,
which is currently the "mq" policy but the default policy may change in
the future. It is useful to know exactly which policy is being used.
Add a 'real' member to the dm_cache_policy_type structure and have the
"default" dm_cache_policy_type point to the real "mq"
dm_cache_policy_type. Update dm_cache_policy_get_name() to check if
real is set, if so report the name of the real policy (not the alias).
Requested-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Improve cache_status to emit:
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
...
Adding the block sizes allows for easier calculation of the overall size
of both the metadata and cache devices. Adding <#total cache blocks>
provides useful context for how much of the cache is used.
Unfortunately these additions to the status will require updates to
users' scripts that monitor the cache status. But these changes help
provide more comprehensive information about the cache device and will
simplify tools that are being developed to manage dm-cache devices --
because they won't need to issue 3 operations to cobble together the
information that we can easily provide via a single status ioctl.
While updating the status documentation in cache.txt spaces were
tabify'd.
Requested-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Internally the mq policy maintains a promotion threshold variable. If
the hit count of a block not in the cache goes above this threshold it
gets promoted to the cache.
This patch introduces three new tunables that allow you to tweak the
promotion threshold by adding a small value. These adjustments depend
on the io type:
read_promote_adjustment: READ io, default 4
write_promote_adjustment: WRITE io, default 8
discard_promote_adjustment: READ/WRITE io to a discarded block, default 1
If you're trying to quickly warm a new cache device you may wish to
reduce these to encourage promotion. Remember to switch them back to
their defaults after the cache fills though.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the pool runs out of data or metadata space, the pool can either
queue or error the IO destined to the data device. The default is to
queue the IO until more space is added.
An admin may now configure the pool to error IO when no space is
available by setting the 'error_if_no_space' feature when loading the
thin-pool table.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
The cache target's invalidate_cblocks message allows cache block
(cblock) ranges to be expressed with: <cblock start>-<cblock end>
The range's <cblock end> value is "one past the end", so the range
includes <cblock start> through <cblock end>-1.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Document passthrough mode, cache shrinking, and cache invalidation.
Also, use strcasecmp() and hlist_unhashed().
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cache block invalidation is removing an entry from the cache without
writing it back. Cache blocks can be invalidated via the
'invalidate_cblocks' message, which takes an arbitrary number of cblock
ranges:
invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
E.g.
dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
"Passthrough" is a dm-cache operating mode (like writethrough or
writeback) which is intended to be used when the cache contents are not
known to be coherent with the origin device. It behaves as follows:
* All reads are served from the origin device (all reads miss the cache)
* All writes are forwarded to the origin device; additionally, write
hits cause cache block invalidates
This mode decouples cache coherency checks from cache device creation,
largely to avoid having to perform coherency checks while booting. Boot
scripts can create cache devices in passthrough mode and put them into
service (mount cached filesystems, for example) without having to worry
about coherency. Coherency that exists is maintained, although the
cache will gradually cool as writes take place.
Later, applications can perform coherency checks, the nature of which
will depend on the type of the underlying storage. If coherency can be
verified, the cache device can be transitioned to writethrough or
writeback mode while still warm; otherwise, the cache contents can be
discarded prior to transitioning to the desired operating mode.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There are now two multiqueues for in cache blocks. A clean one and a
dirty one.
writeback_work comes from the dirty one. Demotions come from the clean
one.
There are two benefits:
- Performance improvement, since demoting a clean block is a noop.
- The cache cleans itself when io load is light.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
in LRW or XTS block encryption mode.
TCRYPT containers prior to version 4.1 use CBC mode with some additional
tweaks, this patch adds support for these containers.
This new mode is implemented using special IV generator named TCW
(TrueCrypt IV with whitening). TCW IV only supports containers that are
encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
TripleDES).
While this mode is legacy and is known to be vulnerable to some
watermarking attacks (e.g. revealing of hidden disk existence) it can
still be useful to activate old containers without using 3rd party
software or for independent forensic analysis of such containers.
(Both the userspace and kernel code is an independent implementation
based on the format documentation and it completely avoids use of
original source code.)
The TCW IV generator uses two additional keys: Kw (whitening seed, size
is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
always the IV size of the selected cipher). These keys are concatenated
at the end of the main encryption key provided in mapping table.
While whitening is completely independent from IV, it is implemented
inside IV generator for simplification.
The whitening value is always 16 bytes long and is calculated per sector
from provided Kw as initial seed, xored with sector number and mixed
with CRC32 algorithm. Resulting value is xored with ciphertext sector
content.
IV is calculated from the provided Kiv as initial IV seed and xored with
sector number.
Detailed calculation can be found in the Truecrypt documentation for
version < 4.1 and will also be described on dm-crypt site, see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt
The experimental support for activation of these containers is already
present in git devel brach of cryptsetup.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>