Commit Graph

1067029 Commits

Author SHA1 Message Date
Helge Deller
68d247ad38 parisc: Default to 16 CPUs on 32-bit kernel
Qemu currently supports up to 16 CPUs, so increase the default from 4 to 16.
Bload-o-meter shows only an increase of 800 bytes with this change.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-11 11:57:38 +01:00
Helge Deller
16f035d9e2 sections: Fix __is_kernel() to include init ranges
With CONFIG_KALLSYMS_ALL=y, the function is_ksym_addr() is used to
determine if a symbol is from inside the kernel range. For that the
given symbol address is checked if it's inside the _stext to _end range.

Although this is correct, some architectures (e.g. parisc) may have the
init area before the _stext address and as such the check in
is_ksym_addr() fails.  By extending the range check to include the init
section, __is_kernel() will now detect symbols in this range as well.

This fixes an issue on parisc where addresses of kernel functions in
init sections aren't resolved to their symbol names.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-11 11:57:38 +01:00
Helge Deller
e486288d11 parisc: Re-use toc_stack as hpmc_stack
No need to have an own hpmc_stack. Just re-use the toc_stack of the
monarch CPU as either a TOC or a HPMC will happen at the same time.
This reduces the kernel memory footprint by 16k.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-11 11:57:37 +01:00
Helge Deller
d6ab9fc745 parisc: Enable TOC (transfer of contents) feature unconditionally
Before this patch, the TOC code used a pre-allocated stack of 16kb for
each possible CPU. That space overhead was the reason why the TOC
feature wasn't enabled by default for 32-bit kernels.

This patch rewrites the TOC code to use a per-cpu stack. That way we use
much less memory now and as such we enable the TOC feature by default on
all kernels.

Additionally the dump of the registers and the stacktrace wasn't
serialized, which led to multiple CPUs printing the stack backtrace at
once which rendered the output unreadable.
Now the backtraces are nicely serialized by a lock.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-01-11 11:57:37 +01:00
Linus Torvalds
fe8152b38d Merge tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
 "These update the handling of software nodes and graph properties, and
  the MAINTAINERS entry for the former.

  Specifics:

   - Remove device_add_properties() which does not work correctly if
     software nodes holding additional device properties are shared or
     reused (Heikki Krogerus).

   - Fix nargs_prop property handling for software nodes (Clément
     Léger).

   - Update documentation of ACPI device properties (Sakari Ailus).

   - Update the handling of graph properties in the generic framework to
     match the DT case (Sakari Ailus).

   - Update software nodes entry in MAINTAINERS (Andy Shevchenko)"

* tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  software node: Update MAINTAINERS data base
  software node: fix wrong node passed to find nargs_prop
  device property: Drop fwnode_graph_get_remote_node()
  device property: Use fwnode_graph_for_each_endpoint() macro
  device property: Implement fwnode_graph_get_endpoint_count()
  Documentation: ACPI: Update references
  Documentation: ACPI: Fix data node reference documentation
  device property: Fix documentation for FWNODE_GRAPH_DEVICE_DISABLED
  device property: Fix fwnode_graph_devcon_match() fwnode leak
  device property: Remove device_add_properties() API
  driver core: Don't call device_remove_properties() from device_del()
  PCI: Convert to device_create_managed_software_node()
2022-01-10 20:48:19 -08:00
Linus Torvalds
fe2437ccbd Merge tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
 "These add a new driver for Renesas RZ/G2L TSU, update a few existing
  thermal control drivers and clean up the tmon utility.

  Specifics:

   - Add new TSU driver and DT bindings for the Renesas RZ/G2L platform
     (Biju Das).

   - Fix missing check when calling reset_control_deassert() in the
     rz2gl thermal driver (Biju Das).

   - In preparation for FORTIFY_SOURCE performing compile-time and
     run-time field bounds checking for memcpy(), avoid intentionally
     writing across neighboring fields in the int340x thermal control
     driver (Kees Cook).

   - Fix RFIM mailbox write commands handling in the int340x thermal
     control driver (Sumeet Pawnikar).

   - Fix PM issue occurring in the iMX thermal control driver during
     suspend/resume by implementing PM runtime support in it (Oleksij
     Rempel).

   - Add 'const' annotation to thermal_cooling_ops in the Intel
     powerclamp driver (Rikard Falkeborn).

   - Fix missing ADC bit set in the iMX8MP thermal driver to enable the
     sensor (Paul Gerber).

   - Drop unused local variable definition from tmon (ran jianping)"

* tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/drivers/int340x: Fix RFIM mailbox write commands
  thermal/drivers/rz2gl: Add error check for reset_control_deassert()
  thermal/drivers/imx8mm: Enable ADC when enabling monitor
  thermal/drivers: Add TSU driver for RZ/G2L
  dt-bindings: thermal: Document Renesas RZ/G2L TSU
  thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
  thermal/drivers/imx: Implement runtime PM support
  thermal: tools: tmon: remove unneeded local variable
  thermal: int340x: Use struct_group() for memcpy() region
2022-01-10 20:43:54 -08:00
Linus Torvalds
b35b6d4d71 Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "The most signigicant change here is the addition of a new cpufreq
  'P-state' driver for AMD processors as a better replacement for the
  venerable acpi-cpufreq driver.

  There are also other cpufreq updates (in the core, intel_pstate, ARM
  drivers), PM core updates (mostly related to adding new macros for
  declaring PM operations which should make the lives of driver
  developers somewhat easier), and a bunch of assorted fixes and
  cleanups.

  Summary:

   - Add new P-state driver for AMD processors (Huang Rui).

   - Fix initialization of min and max frequency QoS requests in the
     cpufreq core (Rafael Wysocki).

   - Fix EPP handling on Alder Lake in intel_pstate (Srinivas
     Pandruvada).

   - Make intel_pstate update cpuinfo.max_freq when notified of HWP
     capabilities changes and drop a redundant function call from that
     driver (Rafael Wysocki).

   - Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
     Stephen Boyd, Vladimir Zapolskiy).

   - Fix double devm_remap() in the Mediatek cpufreq driver (Hector
     Yuan).

   - Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
     Luba).

   - Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).

   - Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).

   - Fix two comments in cpuidle code (Jason Wang, Yang Li).

   - Allow model-specific normal EPB value to be used in the intel_epb
     sysfs attribute handling code (Srinivas Pandruvada).

   - Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).

   - Add safety net to supplier device release in the runtime PM core
     code (Rafael Wysocki).

   - Capture device status before disabling runtime PM for it (Rafael
     Wysocki).

   - Add new macros for declaring PM operations to allow drivers to
     avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
     update some drivers to use these macros (Paul Cercueil).

   - Allow ACPI hardware signature to be honoured during restore from
     hibernation (David Woodhouse).

   - Update outdated operating performance points (OPP) documentation
     (Tang Yizhou).

   - Reduce log severity for informative message regarding frequency
     transition failures in devfreq (Tzung-Bi Shih).

   - Add DRAM frequency controller devfreq driver for Allwinner sunXi
     SoCs (Samuel Holland).

   - Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
     Bergmann).

   - Add support for new layout of Psys PowerLimit Register on SPR to
     the Intel RAPL power capping driver (Zhang Rui).

   - Fix typo in a comment in idle_inject.c (Jason Wang).

   - Remove unused function definition from the DTPM (Dynamit Thermal
     Power Management) power capping framework (Daniel Lezcano).

   - Reduce DTPM trace verbosity (Daniel Lezcano)"

* tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
  cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
  cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
  cpuidle: use default_groups in kobj_type
  x86: intel_epb: Allow model specific normal EPB value
  MAINTAINERS: Add AMD P-State driver maintainer entry
  Documentation: amd-pstate: Add AMD P-State driver introduction
  cpufreq: amd-pstate: Add AMD P-State performance attributes
  cpufreq: amd-pstate: Add AMD P-State frequencies attributes
  cpufreq: amd-pstate: Add boost mode support for AMD P-State
  cpufreq: amd-pstate: Add trace for AMD P-State module
  cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
  cpufreq: amd-pstate: Add fast switch function for AMD P-State
  cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
  ACPI: CPPC: Add CPPC enable register function
  ACPI: CPPC: Check present CPUs for determining _CPC is valid
  ACPI: CPPC: Implement support for SystemIO registers
  x86/msr: Add AMD CPPC MSR definitions
  x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
  cpufreq: use default_groups in kobj_type
  ...
2022-01-10 20:34:00 -08:00
Linus Torvalds
bca21755b9 Merge tag 'acpi-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
 "These are usual ACPICA code updates (although there are more of them
  than in the last few releases), a noticeable EC driver update (which
  mostly consists of cleanups, though), the device enumeration quirks
  handling rework from Hans, some updates eliminating unnecessary CPU
  cache flushing in some places (processor idle and system-wide PM code)
  and a bunch of assorted cleanups and fixes.

  Specifics:

   - Update ACPICA code in the kernel to the 20211217 upstream release
     including the following changes:

      - iASL/Disassembler: Additional support for NHLT table (Bob
        Moore).
      - Change a return_ACPI_STATUS (AE_BAD_PARAMETER) (Bob Moore).
      - Fix a couple of warnings under MSVC (Bob Moore).
      - iASL: Add TDEL table to both compiler/disassembler (Bob Moore).
      - iASL/NHLT table: "Specific Data" field support (Bob Moore).
      - Use original data_table_region pointer for accesses (Jessica
        Clarke).
      - Use original pointer for virtual origin tables (Jessica Clarke).
      - Macros: Remove ACPI_PHYSADDR_TO_PTR (Jessica Clarke).
      - Avoid subobject buffer overflow when validating RSDP signature
        (Jessica Clarke).
      - iASL: Add suppport for AGDI table (Ilkka Koskinen).
      - Hardware: Do not flush CPU cache when entering S4 and S5 (Kirill
        A. Shutemov).
      - Expand the ACPI_ACCESS_ definitions (Mark Langsdorf).
      - Utilities: Avoid deleting the same object twice in a row (Rafael
        Wysocki).
      - Executer: Fix REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
        (Rafael Wysocki).
      - Fix AEST Processor generic resource substructure data field byte
        length (Shuuichirou Ishii).
      - Fix wrong interpretation of PCC address (Sudeep Holla).
      - Add support for PCC Opregion special context data (Sudeep
        Holla).

   - Implement OperationRegion handler for PCC Type 3 subtype (Sudeep
     Holla).

   - Introduce acpi_fetch_acpi_dev() as a replacement for
     acpi_bus_get_device() and use it in the ACPI subsystem (Rafael
     Wysocki).

   - Avoid using _CID for device enumaration if _HID is missing or
     invalid (Rafael Wysocki).

   - Rework quirk handling during ACPI device enumeration and add some
     new quirks for known broken platforms (Hans de Goede).

   - Avoid unnecessary or redundant CPU cache flushing during system PM
     transitions (Kirill A. Shutemov).

   - Add PM debug messages related to power resources (Rafael Wysocki).

   - Fix kernel-doc comment in the PCI host bridge ACPI driver (Yang
     Li).

   - Rework flushing of EC work while suspended to idle and clean up the
     handling of events in the ACPI EC driver (Rafael Wysocki).

   - Prohibit ec_sys module parameter write_support from being used when
     the system is locked down (Hans de Goede).

   - Make the ACPI processor thermal driver use cpufreq_cpu_get() to
     check for presence of cpufreq policy (Manfred Spraul).

   - Avoid unnecessary CPU cache flushing in the ACPI processor idle
     driver (Kirill A. Shutemov).

   - Replace kernel.h with the necessary inclusions in the ACPI
     processor driver (Andy Shevchenko).

   - Use swap() instead of open coding it in the ACPI processor idle
     driver (Guo Zhengkui).

   - Fix the handling of defective LPAT in the ACPI xpower PMIC driver
     and clean up some definitions of PMIC data structures (Hans de
     Goede).

   - Fix outdated comment in the ACPI DPTF driver (Sumeet Pawnikar).

   - Add AEST to the list of known ACPI table signatures (Shuuichirou
     Ishii).

   - Make ACPI NUMA code take hotpluggable memblocks into account when
     CONFIG_MEMORY_HOTPLUG is not set (Vitaly Kuznetsov).

   - Use default_groups in kobj_type in the ACPI sysfs code (Greg
     Kroah-Hartman).

   - Rearrange _CPC structure documentation (Andy Shevchenko).

   - Drop an always true check from the ACPI thermal driver (Adam
     Borowski).

   - Add new "not charging" quirk for Lenovo ThinkPads to the ACPI
     battery driver (Thomas Weißschuh)"

* tag 'acpi-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
  ACPI: PCC: Implement OperationRegion handler for the PCC Type 3 subtype
  ACPI / x86: Skip AC and battery devices on x86 Android tablets with broken DSDTs
  ACPI / x86: Introduce an acpi_quirk_skip_acpi_ac_and_battery() helper
  ACPI: processor: thermal: avoid cpufreq_get_policy()
  serdev: Do not instantiate serdevs on boards with known bogus DSDT entries
  i2c: acpi: Do not instantiate I2C-clients on boards with known bogus DSDT entries
  ACPI / x86: Add acpi_quirk_skip_[i2c_client|serdev]_enumeration() helpers
  ACPI: scan: Create platform device for BCM4752 and LNV4752 ACPI nodes
  PCI/ACPI: Fix acpi_pci_osc_control_set() kernel-doc comment
  ACPI: battery: Add the ThinkPad "Not Charging" quirk
  ACPI: sysfs: use default_groups in kobj_type
  ACPICA: Update version to 20211217
  ACPICA: iASL/NHLT table: "Specific Data" field support
  ACPICA: iASL: Add suppport for AGDI table
  ACPICA: iASL: Add TDEL table to both compiler/disassembler
  ACPICA: Fixed a couple of warnings under MSVC
  ACPICA: Change a return_ACPI_STATUS (AE_BAD_PARAMETER)
  ACPICA: Hardware: Do not flush CPU cache when entering S4 and S5
  ACPICA: Add support for PCC Opregion special context data
  ACPICA: Fix wrong interpretation of PCC address
  ...
2022-01-10 20:26:56 -08:00
Linus Torvalds
63045bfd3c netfilter: nf_tables: don't use 'data_size' uninitialized
Commit 2c865a8a28 ("netfilter: nf_tables: add rule blob layout") never
initialized the new 'data_size' variable.

I'm not sure how it ever worked, but it might have worked almost by
accident - gcc seems to occasionally miss these kinds of 'variable used
uninitialized' situations, but I've seen it do so because it ended up
zero-initializing them due to some other simplification.

But clang is very unhappy about it all, and correctly reports

    net/netfilter/nf_tables_api.c:8278:4: error: variable 'data_size' is uninitialized when used here [-Werror,-Wuninitialized]
                            data_size += sizeof(*prule) + rule->dlen;
                            ^~~~~~~~~
    net/netfilter/nf_tables_api.c:8263:30: note: initialize the variable 'data_size' to silence this warning
            unsigned int size, data_size;
                                        ^
                                         = 0
    1 error generated.

and this fix just initializes 'data_size' to zero before the loop.

Fixes: 2c865a8a28 ("netfilter: nf_tables: add rule blob layout")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-10 19:33:36 -08:00
Linus Torvalds
8efd0d9c31 Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Defer freeing TCP skbs to the BH handler, whenever possible, or at
     least perform the freeing outside of the socket lock section to
     decrease cross-CPU allocator work and improve latency.

   - Add netdevice refcount tracking to locate sources of netdevice and
     net namespace refcount leaks.

   - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
     all queues from a single CPU removing latency spikes.

   - Various small optimizations throughout the stack from Eric Dumazet.

   - Make netdev->dev_addr[] constant, force modifications to go via
     appropriate helpers to allow us to keep addresses in ordered data
     structures.

   - Replace unix_table_lock with per-hash locks, improving performance
     of bind() calls.

   - Extend skb drop tracepoint with a drop reason.

   - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.

  BPF
  ---

   - New helpers:
      - bpf_find_vma(), find and inspect VMAs for profiling use cases
      - bpf_loop(), runtime-bounded loop helper trading some execution
        time for much faster (if at all converging) verification
      - bpf_strncmp(), improve performance, avoid compiler flakiness
      - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
        for tracing programs, all inlined by the verifier

   - Support BPF relocations (CO-RE) in the kernel loader.

   - Further the support for BTF_TYPE_TAG annotations.

   - Allow access to local storage in sleepable helpers.

   - Convert verifier argument types to a composable form with different
     attributes which can be shared across types (ro, maybe-null).

   - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
     creating new, extensible ones where missing and deprecating those
     to be removed.

  Protocols
  ---------

   - WiFi (mac80211/cfg80211):
      - notify user space about long "come back in N" AP responses,
        allow it to react to such temporary rejections
      - allow non-standard VHT MCS 10/11 rates
      - use coarse time in airtime fairness code to save CPU cycles

   - Bluetooth:
      - rework of HCI command execution serialization to use a common
        queue and work struct, and improve handling errors reported in
        the middle of a batch of commands
      - rework HCI event handling to use skb_pull_data, avoiding packet
        parsing pitfalls
      - support AOSP Bluetooth Quality Report

   - SMC:
      - support net namespaces, following the RDMA model
      - improve connection establishment latency by pre-clearing buffers
      - introduce TCP ULP for automatic redirection to SMC

   - Multi-Path TCP:
      - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
      - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
        IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
      - support cmsgs: TCP_INQ
      - improvements in the data scheduler (assigning data to subflows)
      - support fastclose option (quick shutdown of the full MPTCP
        connection, similar to TCP RST in regular TCP)

   - MCTP (Management Component Transport) over serial, as defined by
     DMTF spec DSP0253 - "MCTP Serial Transport Binding".

  Driver API
  ----------

   - Support timestamping on bond interfaces in active/passive mode.

   - Introduce generic phylink link mode validation for drivers which
     don't have any quirks and where MAC capability bits fully express
     what's supported. Allow PCS layer to participate in the validation.
     Convert a number of drivers.

   - Add support to set/get size of buffers on the Rx rings and size of
     the tx copybreak buffer via ethtool.

   - Support offloading TC actions as first-class citizens rather than
     only as attributes of filters, improve sharing and device resource
     utilization.

   - WiFi (mac80211/cfg80211):
      - support forwarding offload (ndo_fill_forward_path)
      - support for background radar detection hardware
      - SA Query Procedures offload on the AP side

  New hardware / drivers
  ----------------------

   - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
     real-time requirements for isochronous communication with protocols
     like OPC UA Pub/Sub.

   - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
     integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
     (qcom_bam_dmux).

   - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
     with support for bridging, VLANs and multicast forwarding
     (lan966x).

   - iwlmei driver for co-operating between Intel's WiFi driver and
     Intel's Active Management Technology (AMT) devices.

   - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips

   - Bluetooth:
      - MediaTek MT7921 SDIO devices
      - Foxconn MT7922A
      - Realtek RTL8852AE

  Drivers
  -------

   - Significantly improve performance in the datapaths of: lan78xx,
     ax88179_178a, lantiq_xrx200, bnxt.

   - Intel Ethernet NICs:
      - igb: support PTP/time PEROUT and EXTTS SDP functions on
        82580/i354/i350 adapters
      - ixgbevf: new PF -> VF mailbox API which avoids the risk of
        mailbox corruption with ESXi
      - iavf: support configuration of VLAN features of finer
        granularity, stacked tags and filtering
      - ice: PTP support for new E822 devices with sub-ns precision
      - ice: support firmware activation without reboot

   - Mellanox Ethernet NICs (mlx5):
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      - support TC forwarding when tunnel encap and decap happen between
        two ports of the same NIC
      - dynamically size and allow disabling various features to save
        resources for running in embedded / SmartNIC scenarios

   - Broadcom Ethernet NICs (bnxt):
      - use page frag allocator to improve Rx performance
      - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool

   - Other Ethernet NICs:
      - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support

   - Microsoft cloud/virtual NIC (mana):
      - add XDP support (PASS, DROP, TX)

   - Mellanox Ethernet switches (mlxsw):
      - initial support for Spectrum-4 ASICs
      - VxLAN with IPv6 underlay

   - Marvell Ethernet switches (prestera):
      - support flower flow templates
      - add basic IP forwarding support

   - NXP embedded Ethernet switches (ocelot & felix):
      - support Per-Stream Filtering and Policing (PSFP)
      - enable cut-through forwarding between ports by default
      - support FDMA to improve packet Rx/Tx to CPU

   - Other embedded switches:
      - hellcreek: improve trapping management (STP and PTP) packets
      - qca8k: support link aggregation and port mirroring

   - Qualcomm 802.11ax WiFi (ath11k):
      - qca6390, wcn6855: enable 802.11 power save mode in station mode
      - BSS color change support
      - WCN6855 hw2.1 support
      - 11d scan offload support
      - scan MAC address randomization support
      - full monitor mode, only supported on QCN9074
      - qca6390/wcn6855: report signal and tx bitrate
      - qca6390: rfkill support
      - qca6390/wcn6855: regdb.bin support

   - Intel WiFi (iwlwifi):
      - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
        in cooperation with the BIOS
      - support for Optimized Connectivity Experience (OCE) scan
      - support firmware API version 68
      - lots of preparatory work for the upcoming Bz device family

   - MediaTek WiFi (mt76):
      - Specific Absorption Rate (SAR) support
      - mt7921: 160 MHz channel support

   - RealTek WiFi (rtw88):
      - Specific Absorption Rate (SAR) support
      - scan offload

   - Other WiFi NICs
      - ath10k: support fetching (pre-)calibration data from nvmem
      - brcmfmac: configure keep-alive packet on suspend
      - wcn36xx: beacon filter support"

* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
  tcp: tcp_send_challenge_ack delete useless param `skb`
  net/qla3xxx: Remove useless DMA-32 fallback configuration
  rocker: Remove useless DMA-32 fallback configuration
  hinic: Remove useless DMA-32 fallback configuration
  lan743x: Remove useless DMA-32 fallback configuration
  net: enetc: Remove useless DMA-32 fallback configuration
  cxgb4vf: Remove useless DMA-32 fallback configuration
  cxgb4: Remove useless DMA-32 fallback configuration
  cxgb3: Remove useless DMA-32 fallback configuration
  bnx2x: Remove useless DMA-32 fallback configuration
  et131x: Remove useless DMA-32 fallback configuration
  be2net: Remove useless DMA-32 fallback configuration
  vmxnet3: Remove useless DMA-32 fallback configuration
  bna: Simplify DMA setting
  net: alteon: Simplify DMA setting
  myri10ge: Simplify DMA setting
  qlcnic: Simplify DMA setting
  net: allwinner: Fix print format
  page_pool: remove spinlock in page_pool_refill_alloc_cache()
  amt: fix wrong return type of amt_send_membership_update()
  ...
2022-01-10 19:06:09 -08:00
Linus Torvalds
9bcbf894b6 Merge tag 'media/v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:

 - New sensor driver: ov5693

 - A new driver for STM32 Chrom-ART Accelerator

 - Added V4L2 core helper functions for VP9 codec

 - Hantro driver has gained support for VP9 codecs

 - Added support for Maxim MAX96712 Quad GMSL2 Deserializer

 - The staging atomisp driver has gained lots of improvements, fixes and
   cleanups. It now works with userptr

 - Lots of random driver improvements as usual

* tag 'media/v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (397 commits)
  media: ipu3-cio2: Add support for instantiating i2c-clients for VCMs
  media: ipu3-cio2: Call cio2_bridge_init() before anything else
  media: ipu3-cio2: Defer probing until the PMIC is fully setup
  media: hantro: Add support for Allwinner H6
  media: dt-bindings: allwinner: document H6 Hantro G2 binding
  media: hantro: Convert imx8m_vpu_g2_irq to helper
  media: hantro: move postproc enablement for old cores
  media: hantro: vp9: add support for legacy register set
  media: hantro: vp9: use double buffering if needed
  media: hantro: add support for reset lines
  media: hantro: Fix probe func error path
  media: i2c: hi846: use pm_runtime_force_suspend/resume for system suspend
  media: i2c: hi846: check return value of regulator_bulk_disable()
  media: hi556: Support device probe in non-zero ACPI D state
  media: ov5675: Support device probe in non-zero ACPI D state
  media: imx208: Support device probe in non-zero ACPI D state
  media: ov2740: support device probe in non-zero ACPI D state
  media: ov5670: Support device probe in non-zero ACPI D state
  media: ov8856: support device probe in non-zero ACPI D state
  media: ov8865: Disable only enabled regulators on error path
  ...
2022-01-10 18:55:43 -08:00
Linus Torvalds
75b950ef61 Revert "drm/amd/display: Fix for otg synchronization logic"
This reverts commit a896f870f8.

It causes odd flickering on my Radeon RX580 (PCI ID 1002:67df rev e7,
subsystem ID 1da2:e353).

Bisected right to this commit, and reverting it fixes things.

Link: https://lore.kernel.org/all/CAHk-=wg9hDde_L3bK9tAfdJ4N=TJJ+SjO3ZDONqH5=bVoy_Mzg@mail.gmail.com/
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Cc: Mustapha Ghaddar <mustapha.ghaddar@amd.com>
Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Cc: meenakshikumar somasundaram <meenakshikumar.somasundaram@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-10 18:50:10 -08:00
Linus Torvalds
8d0749b4f8 Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "Highlights are support for privacy screens found in new laptops, a
  bunch of nomodeset refactoring, and i915 enables ADL-P systems by
  default, while starting to add RPL-S support.

  vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.

  Lots of internal refactorings around dma reservations, and lots of
  driver refactoring as well.

  Summary:

  core:
   - add privacy screen support
   - move nomodeset option into drm subsystem
   - clean up nomodeset handling in drivers
   - make drm_irq.c legacy
   - fix stack_depot name conflicts
   - remove DMA_BUF_SET_NAME ioctl restrictions
   - sysfs: send hotplug event
   - replace several DRM_* logging macros with drm_*
   - move hashtable to legacy code
   - add error return from gem_create_object
   - cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
   - kernel.h related include cleanups
   - support XRGB2101010 source buffers

  ttm:
   - don't include drm hashtable
   - stop pruning fences after wait
   - documentation updates

  dma-buf:
   - add dma_resv selftest
   - add debugfs helpers
   - remove dma_resv_get_excl_unlocked
   - documentation
   - make fences mandatory in dma_resv_add_excl_fence

  dp:
   - add link training delay helpers

  gem:
   - link shmem/cma helpers into separate modules
   - use dma_resv iteratior
   - import dma-buf namespace into gem helper modules

  scheduler:
   - fence grab fix
   - lockdep fixes

  bridge:
   - switch to managed MIPI DSI helpers
   - register and attach during probe fixes
   - convert to YAML in several places.

  panel:
   - add bunch of new panesl

  simpledrm:
   - support FB_DAMAGE_CLIPS
   - support virtual screen sizes
   - add Apple M1 support

  amdgpu:
   - enable seamless boot for DCN 3.01
   - runtime PM fixes
   - use drm_kms_helper_connector_hotplug_event
   - get all fences at once
   - use generic drm fb helpers
   - PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
   - add smart trace buffer (STB) for supported GPUs
   - display debugfs entries
   - new SMU debug option
   - Documentation update

  amdkfd:
   - IP discovery enumeration refactor
   - interface between driver fixes
   - SVM fixes
   - kfd uapi header to define some sysfs bitfields.

  i915:
   - support VESA panel backlights
   - enable ADL-P by default
   - add eDP privacy screen support
   - add Raptor Lake S (RPL-S) support
   - DG2 page table support
   - lots of GuC/HuC fw refactoring
   - refactored i915->gt interfaces
   - CD clock squashing support
   - enable 10-bit gamma support
   - update ADL-P DMC fw to v2.14
   - enable runtime PM autosuspend by default
   - ADL-P DSI support
   - per-lane DP drive settings for ICL+
   - add support for pipe C/D DMC firmware
   - Atomic gamma LUT updates
   - remove CCS FB stride restrictions on ADL-P
   - VRR platform support for display 11
   - add support for display audio codec keepalive
   - lots of display refactoring
   - fix runtime PM handling during PXP suspend
   - improved eviction performance with async TTM moves
   - async VMA unbinding improvements
   - VMA locking refactoring
   - improved error capture robustness
   - use per device iommu checks
   - drop bits stealing from i915_sw_fence function ptr
   - remove dma_resv_prune
   - add IC cache invalidation on DG2

  nouveau:
   - crc fixes
   - validate LUTs in atomic check
   - set HDMI AVI RGB quant to full

  tegra:
   - buffer objects reworks for dma-buf compat
   - NVDEC driver uAPI support
   - power management improvements

  etnaviv:
   - IOMMU enabled system support
   - fix > 4GB command buffer mapping
   - close a DoS vector
   - fix spurious GPU resets

  ast:
   - fix i2c initialization

  rcar-du:
   - DSI output support

  exynos:
   - replace legacy gpio interface
   - implement generic GEM object mmap

  msm:
   - dpu plane state cleanup in prep for multirect
   - dpu debugfs cleanups
   - dp support for sc7280
   - a506 support
   - removal of struct_mutex
   - remove old eDP sub-driver

  anx7625:
   - support MIPI DSI input
   - support HDMI audio
   - fix reading EDID

  lvds:
   - fix bridge DT bindings

  megachips:
   - probe both bridges before registering

  dw-hdmi:
   - allow interlace on bridge

  ps8640:
   - enable runtime PM
   - support aux-bus

  tx358768:
   - enable reference clock
   - add pulse mode support

  ti-sn65dsi86:
   - use regmap bulk write
   - add PWM support

  etnaviv:
   - get all fences at once

  gma500:
   - gem object cleanups

  kmb:
   - enable fb console

  radeon:
   - use dma_resv_wait_timeout

  rockchip:
   - add DSP hold timeout
   - suspend/resume fixes
   - PLL clock fixes
   - implement mmap in GEM object functions
   - use generic fbdev emulation

  sun4i:
   - use CMA helpers without vmap support

  vc4:
   - fix HDMI-CEC hang with display is off
   - power on HDMI controller while disabling
   - support 4K@60Hz modes
   - support 10-bit YUV 4:2:0 output

  vmwgfx:
   - fix leak on probe errors
   - fail probing on broken hosts
   - new placement for MOB page tables
   - hide internal BOs from userspace
   - implement GEM support
   - implement GL 4.3 support

  virtio:
   - overflow fixes

  xen:
   - implement mmap as GEM object function

  omapdrm:
   - fix scatterlist export
   - support virtual planes

  mediatek:
   - MT8192 support
   - CMDQ refinement"

* tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
  drm/amdgpu: no DC support for headless chips
  drm/amd/display: fix dereference before NULL check
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
  drm/amd/display: Fix the uninitialized variable in enable_stream_features()
  drm/amdgpu: fix runpm documentation
  amdgpu/pm: Make sysfs pm attributes as read-only for VFs
  drm/amdgpu: save error count in RAS poison handler
  drm/amdgpu: drop redundant semicolon
  drm/amd/display: get and restore link res map
  drm/amd/display: support dynamic HPO DP link encoder allocation
  drm/amd/display: access hpo dp link encoder only through link resource
  drm/amd/display: populate link res in both detection and validation
  drm/amd/display: define link res and make it accessible to all link interfaces
  drm/amd/display: 3.2.167
  drm/amd/display: [FW Promotion] Release 0.0.98
  drm/amd/display: Undo ODM combine
  drm/amd/display: Add reg defs for DCN303
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
  drm/amd/display: Set optimize_pwr_state for DCN31
  ...
2022-01-10 12:58:46 -08:00
Linus Torvalds
bf4eebf8cf Merge tag 'linux-kselftest-kunit-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
 "This consists of several fixes and enhancements. A few highlights:

   - Option --kconfig_add option allows easily tweaking kunitconfigs

   - make build subcommand can reconfigure if needed

   - doesn't error on tests without test plans

   - doesn't crash if no parameters are generated

   - defaults --jobs to # of cups

   - reports test parameter results as (K)TAP subtests"

* tag 'linux-kselftest-kunit-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: tool: Default --jobs to number of CPUs
  kunit: tool: fix newly introduced typechecker errors
  kunit: tool: make `build` subcommand also reconfigure if needed
  kunit: tool: delete kunit_parser.TestResult type
  kunit: tool: use dataclass instead of collections.namedtuple
  kunit: tool: suggest using decode_stacktrace.sh on kernel crash
  kunit: tool: reconfigure when the used kunitconfig changes
  kunit: tool: revamp message for invalid kunitconfig
  kunit: tool: add --kconfig_add to allow easily tweaking kunitconfigs
  kunit: tool: move Kconfig read_from_file/parse_from_string to package-level
  kunit: tool: print parsed test results fully incrementally
  kunit: Report test parameter results as (K)TAP subtests
  kunit: Don't crash if no parameters are generated
  kunit: tool: Report an error if any test has no subtests
  kunit: tool: Do not error on tests without test plans
  kunit: add run_checks.py script to validate kunit changes
  Documentation: kunit: remove claims that kunit is a mocking framework
  kunit: tool: fix --json output for skipped tests
2022-01-10 12:16:48 -08:00
Linus Torvalds
4369b3cec2 Merge tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest update from Shuah Khan:
 "Fixes to build errors, false negatives, and several code cleanups,
  including the ARRAY_SIZE cleanup that removes 25+ duplicates
  ARRAY_SIZE defines from individual tests"

* tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/vm: remove ARRAY_SIZE define from individual tests
  selftests/timens: remove ARRAY_SIZE define from individual tests
  selftests/sparc64: remove ARRAY_SIZE define from adi-test
  selftests/seccomp: remove ARRAY_SIZE define from seccomp_benchmark
  selftests/rseq: remove ARRAY_SIZE define from individual tests
  selftests/net: remove ARRAY_SIZE define from individual tests
  selftests/landlock: remove ARRAY_SIZE define from common.h
  selftests/ir: remove ARRAY_SIZE define from ir_loopback.c
  selftests/core: remove ARRAY_SIZE define from close_range_test.c
  selftests/cgroup: remove ARRAY_SIZE define from cgroup_util.h
  selftests/arm64: remove ARRAY_SIZE define from vec-syscfg.c
  tools: fix ARRAY_SIZE defines in tools and selftests hdrs
  selftests: cgroup: build error multiple outpt files
  selftests/move_mount_set_group remove unneeded conversion to bool
  selftests/mount: remove unneeded conversion to bool
  selftests: harness: avoid false negatives if test has no ASSERTs
  selftests/ftrace: make kprobe profile testcase description unique
  selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
  selftests: timers: Remove unneeded semicolon
  kselftests: timers:Remove unneeded semicolon
2022-01-10 12:08:12 -08:00
Linus Torvalds
ca1a46d6f5 Merge tag 'slab-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - Separate struct slab from struct page - an offshot of the page folio
   work.

   Struct page fields used by slab allocators are moved from struct page
   to a new struct slab, that uses the same physical storage. Similar to
   struct folio, it always is a head page. This brings better type
   safety, separation of large kmalloc allocations from true slabs, and
   cleanup of related objcg code.

 - A SLAB_MERGE_DEFAULT config optimization.

* tag 'slab-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (33 commits)
  mm/slob: Remove unnecessary page_mapcount_reset() function call
  bootmem: Use page->index instead of page->freelist
  zsmalloc: Stop using slab fields in struct page
  mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabled
  mm/slub: Simplify struct slab slabs field definition
  mm/sl*b: Differentiate struct slab fields by sl*b implementations
  mm/kfence: Convert kfence_guarded_alloc() to struct slab
  mm/kasan: Convert to struct folio and struct slab
  mm/slob: Convert SLOB to use struct slab and struct folio
  mm/memcg: Convert slab objcgs from struct page to struct slab
  mm: Convert struct page to struct slab in functions used by other subsystems
  mm/slab: Finish struct page to struct slab conversion
  mm/slab: Convert most struct page to struct slab by spatch
  mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
  mm/slub: Finish struct page to struct slab conversion
  mm/slub: Convert most struct page to struct slab by spatch
  mm/slub: Convert pfmemalloc_match() to take a struct slab
  mm/slub: Convert __free_slab() to use struct slab
  mm/slub: Convert alloc_slab_page() to return a struct slab
  mm/slub: Convert print_page_info() to print_slab_info()
  ...
2022-01-10 11:58:12 -08:00
Linus Torvalds
d93aebbd76 Merge branch 'random-5.17-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
 "These a bit more numerous than usual for the RNG, due to folks
  resubmitting patches that had been pending prior and generally renewed
  interest.

  There are a few categories of patches in here:

   1) Dominik Brodowski and I traded a series back and forth for a some
      weeks that fixed numerous issues related to seeds being provided
      at extremely early boot by the firmware, before other parts of the
      kernel or of the RNG have been initialized, both fixing some
      crashes and addressing correctness around early boot randomness.
      One of these is marked for stable.

   2) I replaced the RNG's usage of SHA-1 with BLAKE2s in the entropy
      extractor, and made the construction a bit safer and more
      standard. This was sort of a long overdue low hanging fruit, as we
      were supposed to have phased out SHA-1 usage quite some time ago
      (even if all we needed here was non-invertibility). Along the way
      it also made extraction 131% faster. This required a bit of
      Kconfig and symbol plumbing to make things work well with the
      crypto libraries, which is one of the reasons why I'm sending you
      this pull early in the cycle.

   3) I got rid of a truly superfluous call to RDRAND in the hot path,
      which resulted in a whopping 370% increase in performance.

   4) Sebastian Andrzej Siewior sent some patches regarding PREEMPT_RT,
      the full series of which wasn't ready yet, but the first two
      preparatory cleanups were good on their own. One of them touches
      files in kernel/irq/, which is the other reason why I'm sending
      you this pull early in the cycle.

   5) Other assorted correctness fixes from Eric Biggers, Jann Horn,
      Mark Brown, Dominik Brodowski, and myself"

* 'random-5.17-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: don't reset crng_init_cnt on urandom_read()
  random: avoid superfluous call to RDRAND in CRNG extraction
  random: early initialization of ChaCha constants
  random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefs
  random: harmonize "crng init done" messages
  random: mix bootloader randomness into pool
  random: do not throw away excess input to crng_fast_load
  random: do not re-init if crng_reseed completes before primary init
  random: fix crash on multiple early calls to add_bootloader_randomness()
  random: do not sign extend bytes for rotation when mixing
  random: use BLAKE2s instead of SHA1 in extraction
  lib/crypto: blake2s: include as built-in
  random: fix data race on crng init time
  random: fix data race on crng_node_pool
  irq: remove unused flags argument from __handle_irq_event_percpu()
  random: remove unused irq_flags argument from add_interrupt_randomness()
  random: document add_hwgenerator_randomness() with other input functions
  MAINTAINERS: add git tree for random.c
2022-01-10 11:52:16 -08:00
Linus Torvalds
9d3a1e0a88 Merge tag 'seccomp-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
 "The core seccomp code hasn't changed for this cycle, but the selftests
  were improved while helping to debug the recent signal handling
  refactoring work Eric did.

  Summary:

   - Improve seccomp selftests in support of signal handler refactoring
     (Kees Cook)"

* tag 'seccomp-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: Report event mismatches more clearly
  selftests/seccomp: Stop USER_NOTIF test if kcmp() fails
2022-01-10 11:50:57 -08:00
Linus Torvalds
404dbad382 Merge tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:

 - Add boot param for early ftrace recording in pstore (Uwe
   Kleine-König)

* tag 'pstore-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ftrace: Allow immediate recording
2022-01-10 11:48:37 -08:00
Linus Torvalds
ff8be96420 Merge tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - Add support for version 3 of the Synopsys DDR controller to
   synopsys_edac

 - Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
   family 0x19 CPUs to amd64_edac

 - The usual set of fixes and cleanups

* tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Add support for family 19h, models 50h-5fh
  EDAC/sb_edac: Remove redundant initialization of variable rc
  RAS/CEC: Remove a repeated 'an' in a comment
  EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
  EDAC: Add RDDR5 and LRDDR5 memory types
  EDAC/sifive: Fix non-kernel-doc comment
  dt-bindings: memory: Add entry for version 3.80a
  EDAC/synopsys: Enable the driver on Intel's N5X platform
  EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR
  EDAC/synopsys: Use the quirk for version instead of ddr version
2022-01-10 11:45:23 -08:00
Linus Torvalds
7e740ae635 Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
 "A relatively big amount of movements in RAS-land this time around:

   - First part of a series to move the AMD address translation code
     from arch/x86/ to amd64_edac as that is its only user anyway

   - Some MCE error injection improvements to the AMD side

   - Reorganization of the #MC handler code and the facilities it calls
     to make it noinstr-safe

   - Add support for new AMD MCA bank types and non-uniform banks layout

   - The usual set of cleanups and fixes"

* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/mce: Reduce number of machine checks taken during recovery
  x86/mce/inject: Avoid out-of-bounds write when setting flags
  x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
  x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
  x86/mce: Check regs before accessing it
  x86/mce: Mark mce_start() noinstr
  x86/mce: Mark mce_timed_out() noinstr
  x86/mce: Move the tainting outside of the noinstr region
  x86/mce: Mark mce_read_aux() noinstr
  x86/mce: Mark mce_end() noinstr
  x86/mce: Mark mce_panic() noinstr
  x86/mce: Prevent severity computation from being instrumented
  x86/mce: Allow instrumentation during task work queueing
  x86/mce: Remove noinstr annotation from mce_setup()
  x86/mce: Use mce_rdmsrl() in severity checking code
  x86/mce: Remove function-local cpus variables
  x86/mce: Do not use memset to clear the banks bitmaps
  x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  x86/mce/inject: Check if a bank is populated before injecting
  x86/mce: Get rid of cpu_missing
  ...
2022-01-10 11:43:09 -08:00
Linus Torvalds
48a60bdb2b Merge tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull thread_info flag accessor helper updates from Borislav Petkov:
 "Add a set of thread_info.flags accessors which snapshot it before
  accesing it in order to prevent any potential data races, and convert
  all users to those new accessors"

* tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Snapshot thread flags
  powerpc: Avoid discarding flags in system_call_exception()
  openrisc: Snapshot thread flags
  microblaze: Snapshot thread flags
  arm64: Snapshot thread flags
  ARM: Snapshot thread flags
  alpha: Snapshot thread flags
  sched: Snapshot thread flags
  entry: Snapshot thread flags
  x86: Snapshot thread flags
  thread_info: Add helpers to snapshot thread flags
2022-01-10 11:34:10 -08:00
Linus Torvalds
5ba13c1c4d Merge tag 'core_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull notifier fix from Borislav Petkov:
 "Return an error when a notifier callback has been registered already"

* tag 'core_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  notifier: Return an error when a callback has already been registered
2022-01-10 11:32:57 -08:00
Theodore Ts'o
6eeaf88fd5 ext4: don't use the orphan list when migrating an inode
We probably want to remove the indirect block to extents migration
feature after a deprecation window, but until then, let's fix a
potential data loss problem caused by the fact that we put the
tmp_inode on the orphan list.  In the unlikely case where we crash and
do a journal recovery, the data blocks belonging to the inode being
migrated are also represented in the tmp_inode on the orphan list ---
and so its data blocks will get marked unallocated, and available for
reuse.

Instead, stop putting the tmp_inode on the oprhan list.  So in the
case where we crash while migrating the inode, we'll leak an inode,
which is not a disaster.  It will be easily fixed the next time we run
fsck, and it's better than potentially having blocks getting claimed
by two different files, and losing data as a result.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
2022-01-10 13:25:56 -05:00
xu xin
a2e3965df4 ext4: use BUG_ON instead of if condition followed by BUG
BUG_ON would be better.

This issue was detected with the help of Coccinelle.

Reported-by: Zeal robot <zealci@zte.com.cn>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Link: https://lore.kernel.org/r/20211228073252.580296-1-xu.xin16@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
Dan Carpenter
da9e480212 ext4: fix a copy and paste typo
This was obviously supposed to be an ext4 struct, not xfs.  GCC
doesn't care either way so it doesn't affect the build or runtime.

Fixes: cebe85d570 ("ext4: switch to the new mount api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20211215114309.GB14552@kili
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
Luís Henriques
e81c9302a6 ext4: set csum seed in tmp inode while migrating to extents
When migrating to extents, the temporary inode will have it's own checksum
seed.  This means that, when swapping the inodes data, the inode checksums
will be incorrect.

This can be fixed by recalculating the extents checksums again.  Or simply
by copying the seed into the temporary inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357
Reported-by: Jeroen van Wolffelaar <jeroen@wolffelaar.nl>
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20211214175058.19511-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:56 -05:00
luo penghao
ae6ec194b5 ext4: remove unnecessary 'offset' assignment
Although it is in the loop, offset is reassigned at the beginning of the
while loop.  And after the loop, the value will not be used

The clang_analyzer complains as follows:

fs/ext4/dir.c:306:3 warning:

Value stored to 'offset' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211208075307.404703-1-luo.penghao@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
luo penghao
a6dbc76c4d ext4: remove redundant o_start statement
The if will goto out of the loop, and until the end of the
function execution, o_start will not be used again.

The clang_analyzer complains as follows:

fs/ext4/move_extent.c:635:5 warning:

Value stored to 'o_start' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211208075157.404535-1-luo.penghao@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
Adam Borowski
037e7c525d ext4: drop an always true check
EXT_FIRST_INDEX(ptr) is ptr+12, which can't possibly be null; gcc-12
warns about this.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211115172020.57853-1-kilobyte@angband.pl
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
luo penghao
fac888b2be ext4: remove unused assignments
The eh assignment in these two places is meaningless, because the
function will goto to merge, which will not use eh.

The clang_analyzer complains as follows:

fs/ext4/extents.c:1988:4 warning:
fs/ext4/extents.c:2016:4 warning:

Value stored to 'eh' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211104064007.2919-1-luo.penghao@zte.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
luo penghao
a660be97eb ext4: remove redundant statement
The local variable assignment at the end of the function is meaningless.

The clang_analyzer complains as follows:

fs/ext4/fast_commit.c:779:2 warning:

Value stored to 'dst' is never read

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211104063406.2747-1-luo.penghao@zte.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:56 -05:00
Nghia Le
effc5b3b0d ext4: remove useless resetting io_end_size in mpage_process_page()
The command "make clang-analyzer" detects dead stores in
mpage_process_page() function.

Do not reset io_end_size to 0 in the current paths, as the function
exits on those paths without further using io_end_size.

Signed-off-by: Nghia Le <nghialm78@gmail.com>
Link: https://lore.kernel.org/r/20211025221803.3326-1-nghialm78@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Lukas Czerner
4a69aecbfb ext4: allow to change s_last_trim_minblks via sysfs
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that
should help speed up subsequent calls of FITRIM ioctl by skipping the
groups that were previously trimmed. However because the FITRIM allows
to set the minimum size of an extent to trim, ext4 stores the last
minimum extent size and only avoids trimming the group if it was
previously trimmed with minimum extent size equal to, or smaller than
the current call.

There is currently no way to bypass the optimization without
umount/mount cycle. This becomes a problem when the file system is
live migrated to a different storage, because the optimization will
prevent possibly useful discard calls to the storage.

Fix it by exporting the s_last_trim_minblks via sysfs interface which
will allow us to set the minimum size to the number of blocks larger
than subsequent FITRIM call, effectively bypassing the optimization.

By setting the s_last_trim_minblks to ULONG_MAX the optimization will be
effectively cleared regardless of the previous state, or file system
configuration.

For example:
getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Laurent GUERBY <laurent@guerby.net>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211103145122.17338-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Lukas Czerner
2327fb2e23 ext4: change s_last_trim_minblks type to unsigned long
There is no good reason for the s_last_trim_minblks to be atomic. There is
no data integrity needed and there is no real danger in setting and
reading it in a racy manner. Change it to be unsigned long, the same type
as s_clusters_per_group which is the maximum that's allowed.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Lukas Czerner
bbc605cdb1 ext4: implement support for get/set fs label
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for
online reading and setting of file system label.

ext4_ioctl_getlabel() is simple, just get the label from the primary
superblock. This might not be the first sb on the file system if
'sb=' mount option is used.

In ext4_ioctl_setlabel() we update what ext4 currently views as a
primary superblock and then proceed to update backup superblocks. There
are two caveats:
 - the primary superblock might not be the first superblock and so it
   might not be the one used by userspace tools if read directly
   off the disk.
 - because the primary superblock might not be the first superblock we
   potentialy have to update it as part of backup superblock update.
   However the first sb location is a bit more complicated than the rest
   so we have to account for that.

The superblock modification is created generic enough so the
infrastructure can be used for other potential superblock modification
operations, such as chaning UUID.

Tested with generic/492 with various configurations. I also checked the
behavior with 'sb=' mount options, including very large file systems
with and without sparse_super/sparse_super2.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Lukas Czerner
4c1bd5a90c ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified
Only set EXT4_MOUNT_QUOTA when journalled quota file is specified,
otherwise simply disabling specific quota type (usrjquota=) will also
set the EXT4_MOUNT_QUOTA super block option.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Fixes: e6e268cb68 ("ext4: move quota configuration out of handle_mount_opt()")
Link: https://lore.kernel.org/r/20220104143518.134465-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Lukas Czerner
13b215a9e6 ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names
During ext4 mount api rework the commit e6e268cb68 ("ext4: move quota
configuration out of handle_mount_opt()") introduced a bug where we
would kfree(sbi->s_qf_names[i]) before assigning the new quota name in
ext4_apply_quota_options().

This is wrong because we're using kfree() on rcu prointer that could be
simultaneously accessed from ext4_show_quota_options() during remount.
Fix it by using rcu_replace_pointer() to replace the old qname with the
new one and then kfree_rcu() the old quota name.

Also use get_qf_name() instead of sbi->s_qf_names in strcmp() to silence
the sparse warning.

Fixes: e6e268cb68 ("ext4: move quota configuration out of handle_mount_opt()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220104143518.134465-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Jan Kara
173b6e383d ext4: avoid trim error on fs with small groups
A user reported FITRIM ioctl failing for him on ext4 on some devices
without apparent reason.  After some debugging we've found out that
these devices (being LVM volumes) report rather large discard
granularity of 42MB and the filesystem had 1k blocksize and thus group
size of 8MB. Because ext4 FITRIM implementation puts discard
granularity into minlen, ext4_trim_fs() declared the trim request as
invalid. However just silently doing nothing seems to be a more
appropriate reaction to such combination of parameters since user did
not specify anything wrong.

CC: Lukas Czerner <lczerner@redhat.com>
Fixes: 5c2ed62fd4 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211112152202.26614-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Zhang Yi
5c48a7df91 ext4: fix an use-after-free issue about data=journal writeback mode
Our syzkaller report an use-after-free issue that accessing the freed
buffer_head on the writeback page in __ext4_journalled_writepage(). The
problem is that if there was a truncate racing with the data=journalled
writeback procedure, the writeback length could become zero and
bget_one() refuse to get buffer_head's refcount, then the truncate
procedure release buffer once we drop page lock, finally, the last
ext4_walk_page_buffers() trigger the use-after-free problem.

sync                               truncate
ext4_sync_file()
 file_write_and_wait_range()
                                   ext4_setattr(0)
                                    inode->i_size = 0
  ext4_writepage()
   len = 0
   __ext4_journalled_writepage()
    page_bufs = page_buffers(page)
    ext4_walk_page_buffers(bget_one) <- does not get refcount
                                    do_invalidatepage()
                                      free_buffer_head()
    ext4_walk_page_buffers(page_bufs) <- trigger use-after-free

After commit bdf96838ae ("ext4: fix race between truncate and
__ext4_journalled_writepage()"), we have already handled the racing
case, so the bget_one() and bput_one() are not needed. So this patch
simply remove these hunk, and recheck the i_size to make it safe.

Fixes: bdf96838ae ("ext4: fix race between truncate and __ext4_journalled_writepage()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211225090937.712867-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Ye Bin
298b5c5217 ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
We got issue as follows when run syzkaller test:
[ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user
[ 1901.130901] Aborting journal on device vda-8.
[ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal
[ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal
[ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal
[ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal
[ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted
[ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal
[ 1901.136915] ==================================================================
[ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal
[ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal
[ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968
[ 1901.138817]
[ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal
[ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1
[ 1901.146479] Hardware name: linux,dummy-virt (DT)
[ 1901.147317] Call trace:
[ 1901.147552]  dump_backtrace+0x0/0x2d8
[ 1901.147898]  show_stack+0x28/0x38
[ 1901.148215]  dump_stack+0xec/0x15c
[ 1901.148746]  kasan_report+0x108/0x338
[ 1901.149207]  __asan_load8+0x58/0xb0
[ 1901.149753]  __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.150579]  ext4_xattr_delete_inode+0xe4/0x700 [ext4]
[ 1901.151316]  ext4_evict_inode+0x524/0xba8 [ext4]
[ 1901.151985]  evict+0x1a4/0x378
[ 1901.152353]  iput+0x310/0x428
[ 1901.152733]  do_unlinkat+0x260/0x428
[ 1901.153056]  __arm64_sys_unlinkat+0x6c/0xc0
[ 1901.153455]  el0_svc_common+0xc8/0x320
[ 1901.153799]  el0_svc_handler+0xf8/0x160
[ 1901.154265]  el0_svc+0x10/0x218
[ 1901.154682] ==================================================================

This issue may happens like this:
	Process1                               Process2
ext4_evict_inode
  ext4_journal_start
   ext4_truncate
     ext4_ind_truncate
       ext4_free_branches
         ext4_ind_truncate_ensure_credits
	   ext4_journal_ensure_credits_fn
	     ext4_journal_restart
	       handle->h_transaction = NULL;
                                           mount -o remount,abort  /mnt
					   -> trigger JBD abort
               start_this_handle -> will return failed
  ext4_xattr_delete_inode
    ext4_journal_ensure_credits
      ext4_journal_ensure_credits_fn
        __ext4_journal_ensure_credits
	  jbd2_handle_buffer_credits
	    journal = handle->h_transaction->t_journal; ->null-ptr-deref

Now, indirect truncate process didn't handle error. To solve this issue
maybe simply add check handle is abort in '__ext4_journal_ensure_credits'
is enough, and i also think this is necessary.

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211224100341.3299128-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Harshad Shirwadkar
c27c29c6af ext4: initialize err_blk before calling __ext4_get_inode_loc
It is not guaranteed that __ext4_get_inode_loc will definitely set
err_blk pointer when it returns EIO. To avoid using uninitialized
variables, let's first set err_blk to 0.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:55 -05:00
Chunguang Xu
8c80fb312d ext4: fix a possible ABBA deadlock due to busy PA
We found on older kernel (3.10) that in the scenario of insufficient
disk space, system may trigger an ABBA deadlock problem, it seems that
this problem still exists in latest kernel, try to fix it here. The
main process triggered by this problem is that task A occupies the PA
and waits for the jbd2 transaction finish, the jbd2 transaction waits
for the completion of task B's IO (plug_list), but task B waits for
the release of PA by task A to finish discard, which indirectly forms
an ABBA deadlock. The related calltrace is as follows:

    Task A
    vfs_write
    ext4_mb_new_blocks()
    ext4_mb_mark_diskspace_used()       JBD2
    jbd2_journal_get_write_access()  -> jbd2_journal_commit_transaction()
  ->schedule()                          filemap_fdatawait()
 |                                              |
 | Task B                                       |
 | do_unlinkat()                                |
 | ext4_evict_inode()                           |
 | jbd2_journal_begin_ordered_truncate()        |
 | filemap_fdatawrite_range()                   |
 | ext4_mb_new_blocks()                         |
  -ext4_mb_discard_group_preallocations() <-----

Here, try to cancel ext4_mb_discard_group_preallocations() internal
retry due to PA busy, and do a limited number of retries inside
ext4_mb_discard_preallocations(), which can circumvent the above
problems, but also has some advantages:

1. Since the PA is in a busy state, if other groups have free PAs,
   keeping the current PA may help to reduce fragmentation.
2. Continue to traverse forward instead of waiting for the current
   group PA to be released. In most scenarios, the PA discard time
   can be reduced.

However, in the case of smaller free space, if only a few groups have
space, then due to multiple traversals of the group, it may increase
CPU overhead. But in contrast, I feel that the overall benefit is
better than the cost.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:55 -05:00
Qing Wang
dfac1a1670 ext4: replace snprintf in show functions with sysfs_emit
coccicheck complains about the use of snprintf() in sysfs show functions.

Fix the coccicheck warning:
WARNING: use scnprintf or sprintf.

Use sysfs_emit instead of scnprintf or sprintf makes more sense.

Signed-off-by: Qing Wang <wangqing@vivo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1634095731-4528-1-git-send-email-wangqing@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Jan Kara
4013d47a53 ext4: make sure to reset inode lockdep class when quota enabling fails
When we succeed in enabling some quota type but fail to enable another
one with quota feature, we correctly disable all enabled quota types.
However we forget to reset i_data_sem lockdep class. When the inode gets
freed and reused, it will inherit this lockdep class (i_data_sem is
initialized only when a slab is created) and thus eventually lockdep
barfs about possible deadlocks.

Reported-and-tested-by: syzbot+3b6f9218b1301ddda3e2@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20211007155336.12493-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:55 -05:00
Jan Kara
15fc69bbbb ext4: make sure quota gets properly shutdown on error
When we hit an error when enabling quotas and setting inode flags, we do
not properly shutdown quota subsystem despite returning error from
Q_QUOTAON quotactl. This can lead to some odd situations like kernel
using quota file while it is still writeable for userspace. Make sure we
properly cleanup the quota subsystem in case of error.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20211007155336.12493-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10 13:25:54 -05:00
Ye Bin
380a0091ca ext4: Fix BUG_ON in ext4_bread when write quota data
We got issue as follows when run syzkaller:
[  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[  167.983601] ------------[ cut here ]------------
[  167.984245] kernel BUG at fs/ext4/inode.c:847!
[  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
[  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.001899] Call Trace:
[  168.002235]  <TASK>
[  168.007167]  ext4_bread+0xd/0x53
[  168.007612]  ext4_quota_write+0x20c/0x5c0
[  168.010457]  write_blk+0x100/0x220
[  168.010944]  remove_free_dqentry+0x1c6/0x440
[  168.011525]  free_dqentry.isra.0+0x565/0x830
[  168.012133]  remove_tree+0x318/0x6d0
[  168.014744]  remove_tree+0x1eb/0x6d0
[  168.017346]  remove_tree+0x1eb/0x6d0
[  168.019969]  remove_tree+0x1eb/0x6d0
[  168.022128]  qtree_release_dquot+0x291/0x340
[  168.023297]  v2_release_dquot+0xce/0x120
[  168.023847]  dquot_release+0x197/0x3e0
[  168.024358]  ext4_release_dquot+0x22a/0x2d0
[  168.024932]  dqput.part.0+0x1c9/0x900
[  168.025430]  __dquot_drop+0x120/0x190
[  168.025942]  ext4_clear_inode+0x86/0x220
[  168.026472]  ext4_evict_inode+0x9e8/0xa22
[  168.028200]  evict+0x29e/0x4f0
[  168.028625]  dispose_list+0x102/0x1f0
[  168.029148]  evict_inodes+0x2c1/0x3e0
[  168.030188]  generic_shutdown_super+0xa4/0x3b0
[  168.030817]  kill_block_super+0x95/0xd0
[  168.031360]  deactivate_locked_super+0x85/0xd0
[  168.031977]  cleanup_mnt+0x2bc/0x480
[  168.033062]  task_work_run+0xd1/0x170
[  168.033565]  do_exit+0xa4f/0x2b50
[  168.037155]  do_group_exit+0xef/0x2d0
[  168.037666]  __x64_sys_exit_group+0x3a/0x50
[  168.038237]  do_syscall_64+0x3b/0x90
[  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae

In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;

As in ext4_quota_write:
...
         if (EXT4_SB(sb)->s_journal && !handle) {
                 ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                         " cancelled because transaction is not started",
                         (unsigned long long)off, (unsigned long long)len);
                 return -EIO;
         }
...
We only check handle if NULL when filesystem has journal. There is need
check handle if NULL even when filesystem has no journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:54 -05:00
Sebastian Andrzej Siewior
ab047d516d ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
The kmemcache for ext4_fc_dentry_cachep remains registered after module
removal.

Destroy ext4_fc_dentry_cachep kmemcache on module removal.

Fixes: aa75f4d3da ("ext4: main fast-commit commit path")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de
Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de
Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:54 -05:00
Xin Yin
9725958bb7 ext4: fast commit may miss tracking unwritten range during ftruncate
If use FALLOC_FL_KEEP_SIZE to alloc unwritten range at bottom, the
inode->i_size will not include the unwritten range. When call
ftruncate with fast commit enabled, it will miss to track the
unwritten range.

Change to trace the full range during ftruncate.

Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223032337.5198-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:54 -05:00
Xin Yin
0b5b5a62b9 ext4: use ext4_ext_remove_space() for fast commit replay delete range
For now ,we use ext4_punch_hole() during fast commit replay delete range
procedure. But it will be affected by inode->i_size, which may not
correct during fast commit replay procedure. The following test will
failed.

-create & write foo (len 1000K)
-falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K)
-create & fsync bar
-falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K)
-fsync foo
-crash before a full commit

After the fast_commit reply procedure, the range 400K-500K will not be
removed. Because in this case, when calling ext4_punch_hole() the
inode->i_size is 0, and it just retruns with doing nothing.

Change to use ext4_ext_remove_space() instead of ext4_punch_hole()
to remove blocks of inode directly.

Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223032337.5198-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-01-10 13:25:54 -05:00