This reverts commit ec8a8e23a19bbd6dbaa158c16a60560aec48b95f.
The merge conflicts are now resolved, so bring it back, as the ABI cannot be
broken.
Bug: 161946584
Change-Id: I16533658044de8e0d862cc282e528ad30fd540da
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit b086d1e82f which is
commit bb663f0f3c upstream.
It breaks the Android kernel ABI by turning del_timer() into an
inline function. Fix this by putting it back
as needed AND fix up the only use of the new function in
net/hsr/hsr_device.c, which is what caused this commit to be backported
to 6.1.91 in the first place.
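As a rough user-space illustration of the ABI point above (not the actual
patch): if del_timer() stays an out-of-line function, modules built against
the old ABI can still link to a symbol of that name, whereas a static inline
wrapper in a header would leave no such symbol. The struct below is a
hypothetical stand-in for the kernel's struct timer_list, and the forwarding
direction is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timer_list. */
struct timer_list {
	int pending;	/* non-zero while the timer is queued */
};

/* Newer name: deactivate the timer, return 1 if it was pending, else 0. */
int timer_delete(struct timer_list *timer)
{
	int was_pending;

	assert(timer != NULL);
	was_pending = timer->pending ? 1 : 0;
	timer->pending = 0;
	return was_pending;
}

/*
 * Older name kept as a real, out-of-line function (assumption for this
 * sketch), so that "del_timer" remains a linkable symbol instead of
 * disappearing into a header-only static inline.
 */
int del_timer(struct timer_list *timer)
{
	return timer_delete(timer);
}
```

Calling del_timer() on a pending timer returns 1 and clears it; a second call
returns 0.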
Bug: 161946584
Change-Id: I033a9b5d57acaffdfc1973f6f77bc4e92675b7e4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 6.1.91
dmaengine: pl330: issue_pending waits until WFP state
dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"
wifi: nl80211: don't free NULL coalescing rule
rust: kernel: require `Send` for `Module` implementations
eeprom: at24: Use dev_err_probe for nvmem register failure
eeprom: at24: Probe for DDR3 thermal sensor in the SPD case
eeprom: at24: fix memory corruption race condition
Bluetooth: qca: add support for QCA2066
mm/hugetlb: add folio support to hugetlb specific flag macros
mm: add private field of first tail to struct page and struct folio
mm/hugetlb: add hugetlb_folio_subpool() helpers
mm/hugetlb: add folio_hstate()
mm/hugetlb_cgroup: convert __set_hugetlb_cgroup() to folios
mm/hugetlb_cgroup: convert hugetlb_cgroup_from_page() to folios
mm/hugetlb: convert free_huge_page to folios
mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
kbuild: refactor host*_flags
kbuild: specify output names separately for each emission type from rustc
cifs: use the least loaded channel for sending requests
smb3: missing lock when picking channel
pinctrl: pinctrl-aspeed-g6: Fix register offset for pinconf of GPIOR-T
pinctrl/meson: fix typo in PDM's pin name
pinctrl: core: delete incorrect free in pinctrl_enable()
pinctrl: mediatek: paris: Fix PIN_CONFIG_INPUT_SCHMITT_ENABLE readback
pinctrl: mediatek: paris: Rework support for PIN_CONFIG_{INPUT,OUTPUT}_ENABLE
sunrpc: add a struct rpc_stats arg to rpc_create_args
nfs: expose /proc/net/sunrpc/nfs in net namespaces
nfs: make the rpc_stat per net namespace
nfs: Handle error of rpc_proc_register() in nfs_net_init().
pinctrl: Introduce struct pinfunction and PINCTRL_PINFUNCTION() macro
pinctrl: intel: Make use of struct pinfunction and PINCTRL_PINFUNCTION()
pinctrl: baytrail: Fix selecting gpio pinctrl state
power: rt9455: hide unused rt9455_boost_voltage_values
power: supply: mt6360_charger: Fix of_match for usb-otg-vbus regulator
pinctrl: devicetree: fix refcount leak in pinctrl_dt_to_map()
regulator: mt6360: De-capitalize devicetree regulator subnodes
regulator: change stubbed devm_regulator_get_enable to return Ok
regulator: change devm_regulator_get_enable_optional() stub to return Ok
bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
bpf: Fix a verifier verbose message
spi: introduce new helpers with using modern naming
spi: axi-spi-engine: Convert to platform remove callback returning void
spi: spi-axi-spi-engine: switch to use modern name
spi: spi-axi-spi-engine: Use helper function devm_clk_get_enabled()
spi: axi-spi-engine: simplify driver data allocation
spi: axi-spi-engine: use devm_spi_alloc_host()
spi: axi-spi-engine: move msg state to new struct
spi: axi-spi-engine: use common AXI macros
spi: axi-spi-engine: fix version format string
spi: hisi-kunpeng: Delete the dump interface of data registers in debugfs
bpf, arm64: Fix incorrect runtime stats
s390/mm: Fix storage key clearing for guest huge pages
s390/mm: Fix clearing storage keys for huge pages
xdp: use flags field to disambiguate broadcast redirect
bna: ensure the copied buf is NUL terminated
octeontx2-af: avoid off-by-one read from userspace
nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment().
net l2tp: drop flow hash on forward
s390/vdso: Add CFI for RA register to asm macro vdso_func
net: qede: sanitize 'rc' in qede_add_tc_flower_fltr()
net: qede: use return from qede_parse_flow_attr() for flower
net: qede: use return from qede_parse_flow_attr() for flow_spec
net: qede: use return from qede_parse_actions()
ASoC: meson: axg-fifo: use FIELD helpers
ASoC: meson: axg-fifo: use threaded irq to check periods
ASoC: meson: axg-card: make links nonatomic
ASoC: meson: axg-tdm-interface: manage formatters in trigger
ASoC: meson: cards: select SND_DYNAMIC_MINORS
ALSA: hda: intel-sdw-acpi: fix usage of device_get_named_child_node()
s390/cio: Ensure the copied buf is NUL terminated
cxgb4: Properly lock TX queue for the selftest.
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
spi: fix null pointer dereference within spi_sync
net: bridge: fix multicast-to-unicast with fraglist GSO
net: core: reject skb_copy(_expand) for fraglist GSO skbs
tipc: fix a possible memleak in tipc_buf_append
vxlan: Pull inner IP header in vxlan_rcv().
s390/qeth: Fix kernel panic after setting hsuid
drm/panel: ili9341: Respect deferred probe
drm/panel: ili9341: Use predefined error codes
net: gro: add flush check in udp_gro_receive_segment
clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change
powerpc/pseries: replace kmalloc with kzalloc in PLPKS driver
powerpc/pseries: Move PLPKS constants to header file
powerpc/pseries: make max polling consistent for longer H_CALLs
powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()
scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
gfs2: Fix invalid metadata access in punch_hole
wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc
wifi: cfg80211: fix rdev_dump_mpp() arguments order
net: mark racy access on sk->sk_rcvbuf
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
btrfs: return accurate error code on open failure in open_fs_devices()
bpf: Check bloom filter map value size
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
scsi: ufs: core: WLUN suspend dev/link state error recovery
ALSA: line6: Zero-initialize message buffers
block: fix overflow in blk_ioctl_discard()
net: bcmgenet: Reset RBUF on first open
ata: sata_gemini: Check clk_enable() result
firewire: ohci: mask bus reset interrupts between ISR and bottom half
tools/power turbostat: Fix added raw MSR output
tools/power turbostat: Increase the limit for fd opened
tools/power turbostat: Fix Bzy_MHz documentation typo
btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve
btrfs: always clear PERTRANS metadata during commit
memblock tests: fix undefined reference to `early_pfn_to_nid'
memblock tests: fix undefined reference to `panic'
memblock tests: fix undefined reference to `BIT'
scsi: target: Fix SELinux error when systemd-modules loads the target module
blk-iocost: avoid out of bounds shift
gpu: host1x: Do not setup DMA for virtual devices
MIPS: scall: Save thread_info.syscall unconditionally on entry
tools/power/turbostat: Fix uncore frequency file string
drm/amdgpu: Refine IB schedule error logging
selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior
Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
uio_hv_generic: Don't free decrypted memory
Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
iommu: mtk: fix module autoloading
fs/9p: only translate RWX permissions for plain 9P2000
fs/9p: translate O_TRUNC into OTRUNC
9p: explicitly deny setlease attempts
gpio: wcove: Use -ENOTSUPP consistently
gpio: crystalcove: Use -ENOTSUPP consistently
clk: Don't hold prepare_lock when calling kref_put()
fs/9p: drop inodes immediately on non-.L too
drm/nouveau/dp: Don't probe eDP ports twice harder
net:usb:qmi_wwan: support Rolling modules
kbuild: rust: avoid creating temporary files
spi: Merge spi_controller.{slave,target}_abort()
perf unwind-libunwind: Fix base address for .eh_frame
perf unwind-libdw: Handle JIT-generated DSOs properly
qibfs: fix dentry leak
xfrm: Preserve vlan tags for transport mode software GRO
ARM: 9381/1: kasan: clear stale stack poison
tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets
tcp: Use refcount_inc_not_zero() in tcp_twsk_unique().
Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout
Bluetooth: msft: fix slab-use-after-free in msft_do_close()
Bluetooth: l2cap: fix null-ptr-deref in l2cap_chan_timeout
net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs
rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation
hwmon: (corsair-cpro) Use a separate buffer for sending commands
hwmon: (corsair-cpro) Use complete_all() instead of complete() in ccp_raw_event()
hwmon: (corsair-cpro) Protect ccp->wait_input_report with a spinlock
phonet: fix rtm_phonet_notify() skb allocation
net: bridge: fix corrupted ethernet header on multicast-to-unicast
ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
timers: Get rid of del_singleshot_timer_sync()
timers: Rename del_timer() to timer_delete()
net-sysfs: convert dev->operstate reads to lockless ones
hsr: Simplify code for announcing HSR nodes timer setup
ipv6: annotate data-races around cnf.disable_ipv6
ipv6: prevent NULL dereference in ip6_output()
net/smc: fix neighbour and rtable leak in smc_ib_find_route()
net: hns3: using user configure after hardware reset
net: hns3: direct return when receive a unknown mailbox message
net: hns3: change type of numa_node_mask as nodemask_t
net: hns3: release PTP resources if pf initialization failed
net: hns3: use appropriate barrier function after setting a bit value
net: hns3: fix port vlan filter not disabled issue
net: hns3: fix kernel crash when devlink reload during initialization
drm/meson: dw-hdmi: power up phy on device init
drm/meson: dw-hdmi: add bandgap setting for g12
drm/connector: Add \n to message about demoting connector force-probes
dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
gpiolib: cdev: Add missing header(s)
gpiolib: cdev: relocate debounce_period_us from struct gpio_desc
gpiolib: cdev: fix uninitialised kfifo
drm/amd/display: Atom Integrated System Info v2_2 for DCN35
MAINTAINERS: add leah to 6.1 MAINTAINERS file
drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send()
firewire: nosy: ensure user_length is taken into account when fetching packet contents
Reapply "drm/qxl: simplify qxl_fence_wait"
rust: error: Rename to_kernel_errno() -> to_errno()
rust: fix regexp in scripts/is_rust_module.sh
btf, scripts: rust: drop is_rust_module.sh
rust: module: place generated init_module() function in .init.text
rust: macros: fix soundness issue in `module!` macro
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Fix connector check on init
usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device
usb: ohci: Prevent missed ohci interrupts
USB: core: Fix access violation during port device removal
usb: gadget: composite: fix OS descriptors w_value logic
usb: gadget: f_fs: Fix a race condition when processing setup packets.
usb: xhci-plat: Don't include xhci.h
usb: dwc3: core: Prevent phy suspend during init
usb: typec: tcpm: unregister existing source caps before re-registration
usb: typec: tcpm: Check for port partner validity before consuming it
ALSA: hda/realtek: Fix mute led of HP Laptop 15-da3001TU
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
mm/slab: make __free(kfree) accept error pointers
mptcp: ensure snd_nxt is properly initialized on connect
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Interrupt handling fixes
kmsan: compiler_types: declare __no_sanitize_or_inline
tipc: fix UAF in error path
ASoC: tegra: Fix DSPK 16-bit playback
ASoC: ti: davinci-mcasp: Fix race condition during probe
dyndbg: fix old BUG_ON in >control parser
slimbus: qcom-ngd-ctrl: Add timeout for wait operation
mei: me: add lunar lake point M DID
drm/amdkfd: don't allow mapping the MMIO HDP page with large pages
drm/vmwgfx: Fix invalid reads in fence signaled events
drm/i915/bios: Fix parsing backlight BDB data
drm/amd/display: Handle Y carry-over in VCP X.Y calculation
net: fix out-of-bounds access in ops_init
hwmon: (pmbus/ucd9000) Increase delay from 250 to 500us
mm: use memalloc_nofs_save() in page_cache_ra_order()
regulator: core: fix debugfs creation regression
spi: microchip-core-qspi: fix setting spi bus clock rate
ksmbd: off ipv6only for both ipv4/ipv6 binding
ksmbd: avoid to send duplicate lease break notifications
ksmbd: do not grant v2 lease if parent lease key and epoch are not set
Bluetooth: qca: add missing firmware sanity checks
Bluetooth: qca: fix NVM configuration parsing
Bluetooth: qca: fix info leak when fetching board id
Bluetooth: qca: fix info leak when fetching fw build id
Bluetooth: qca: fix firmware check error path
VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
dmaengine: idxd: add a new security check to deal with a hardware erratum
dmaengine: idxd: add a write() method for applications to submit work
keys: Fix overwrite of key expiration on instantiation
btrfs: do not wait for short bulk allocation
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
md: fix kmemleak of rdev->serial
net: bcmgenet: Clear RGMII_LINK upon link down
net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access
net: bcmgenet: synchronize use of bcmgenet_set_rx_mode()
net: bcmgenet: synchronize UMAC_CMD access
Linux 6.1.91
Change-Id: I71c08414d3580e6d9b869a8f0fc3e27f02752997
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 5f29666f69.
It causes merge conflicts in 6.1.91. Will be brought back after that
merge happens.
Change-Id: I2b6335bf5296d51bd08f92816aa1c98c92b822a8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 113d5341ee which is
commit 9b13df3fb6 upstream.
It breaks the Android kernel ABI by turning del_timer_sync() into an
inline function. Fix this by putting it back as
needed AND fix up the only use of the new function in
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c, which is
what caused this commit to be backported to 5.4.274 in the first place.
Bug: 161946584
Change-Id: Icd26c7c81e6172f36eeeb69827989bfab1d32afe
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 9a5a305686 ]
del_singleshot_timer_sync() used to be an optimization for deleting timers
which are not rearmed from the timer callback function.
This optimization turned out to be broken and got mapped to
del_timer_sync() about 17 years ago.
Get rid of the undocumented indirection and use del_timer_sync() directly.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/20221123201624.706987932@linutronix.de
Stable-dep-of: 4893b8b3ef8d ("hsr: Simplify code for announcing HSR nodes timer setup")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 6.1.84
x86/cpu: Support AMD Automatic IBRS
x86/bugs: Use sysfs_emit()
KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs
KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
KVM: x86: Use a switch statement and macros in __feature_translate()
timers: Update kernel-doc for various functions
timers: Use del_timer_sync() even on UP
timers: Rename del_timer_sync() to timer_delete_sync()
wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
media: staging: ipu3-imgu: Set fields before media_entity_pads_init()
arm64: dts: qcom: sc7280: Add additional MSI interrupts
remoteproc: virtio: Fix wdg cannot recovery remote processor
clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
drm/vmwgfx: Fix possible null pointer derefence with invalid contexts
serial: max310x: fix NULL pointer dereference in I2C instantiation
pci_iounmap(): Fix MMIO mapping leak
media: xc4000: Fix atomicity violation in xc4000_get_frequency
media: mc: Add local pad to pipeline regardless of the link state
media: mc: Fix flags handling when creating pad links
media: mc: Add num_links flag to media_pad
media: mc: Rename pad variable to clarify intent
media: mc: Expand MUST_CONNECT flag to always require an enabled link
KVM: Always flush async #PF workqueue when vCPU is being destroyed
cpufreq: amd-pstate: Fix min_perf assignment in amd_pstate_adjust_perf()
powerpc/smp: Adjust nr_cpu_ids to cover all threads of a core
powerpc/smp: Increase nr_cpu_ids to include the boot CPU
sparc64: NMI watchdog: fix return value of __setup handler
sparc: vDSO: fix return value of __setup handler
crypto: qat - fix double free during reset
crypto: qat - resolve race condition during AER recovery
selftests/mqueue: Set timeout to 180 seconds
ext4: correct best extent lstart adjustment logic
block: Clear zone limits for a non-zoned stacked queue
kasan/test: avoid gcc warning for intentional overflow
bounds: support non-power-of-two CONFIG_NR_CPUS
fat: fix uninitialized field in nostale filehandles
ubifs: Set page uptodate in the correct place
ubi: Check for too small LEB size in VTBL code
ubi: correct the calculation of fastmap size
mtd: rawnand: meson: fix scrambling mode value in command macro
parisc/unaligned: Rewrite 64-bit inline assembly of emulate_ldd()
parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros
parisc: Fix ip_fast_csum
parisc: Fix csum_ipv6_magic on 32-bit systems
parisc: Fix csum_ipv6_magic on 64-bit systems
parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
md/raid5: fix atomicity violation in raid5_cache_count
cpufreq: Limit resolving a frequency to policy min/max
PM: suspend: Set mem_sleep_current during kernel command line setup
clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
usb: xhci: Add error handling in xhci_map_urb_for_dma
powerpc/fsl: Fix mfpmr build errors with newer binutils
USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
USB: serial: add device ID for VeriFone adapter
USB: serial: cp210x: add ID for MGP Instruments PDS100
USB: serial: option: add MeiG Smart SLM320 product
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
PM: sleep: wakeirq: fix wake irq warning in system suspend
mmc: tmio: avoid concurrent runs of mmc_request_done()
fuse: fix root lookup with nonzero generation
fuse: don't unhash root
usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros
serial: Lock console when calling into driver before registration
btrfs: qgroup: always free reserved space for extent records
btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
PCI/PM: Drain runtime-idle callbacks before driver removal
PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
dm-raid: fix lockdep waring in "pers->hot_add_disk"
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
mac802154: fix llsec key resources release in mac802154_llsec_key_del
swap: comments get_swap_device() with usage rule
mm: swap: fix race between free_swap_and_cache() and swapoff()
mmc: core: Fix switch on gp3 partition
drm/etnaviv: Restore some id values
landlock: Warn once if a Landlock action is requested while disabled
hwmon: (amc6821) add of_match table
ext4: fix corruption during on-line resize
nvmem: meson-efuse: fix function pointer type mismatch
slimbus: core: Remove usage of the deprecated ida_simple_xx() API
phy: tegra: xusb: Add API to retrieve the port number of phy
usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
speakup: Fix 8bit characters from direct synth
PCI/AER: Block runtime suspend when handling errors
io_uring/net: correctly handle multishot recvmsg retry setup
sparc: Explicitly include correct DT includes
sparc32: Fix parport build with sparc32
nfs: fix UAF in direct writes
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
PCI: qcom: Rename qcom_pcie_config_sid_sm8250() to reflect IP version
PCI: qcom: Enable BDF to SID translation properly
PCI: dwc: endpoint: Fix advertised resizable BAR size
PCI: hv: Fix ring buffer size calculation
vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
vfio/pci: Remove negative check on unsigned vector
vfio/pci: Lock external INTx masking ops
vfio/platform: Disable virqfds on cleanup
ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
ring-buffer: Fix waking up ring buffer readers
ring-buffer: Do not set shortest_full when full target is hit
ring-buffer: Fix resetting of shortest_full
ring-buffer: Fix full_waiters_pending in poll
ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()
soc: fsl: qbman: Always disable interrupts when taking cgr_lock
soc: fsl: qbman: Use raw spinlock for cgr_lock
s390/zcrypt: fix reference counting on zcrypt card objects
drm/probe-helper: warn about negative .get_modes()
drm/panel: do not return negative error codes from drm_panel_get_modes()
drm/exynos: do not return negative values from .get_modes()
drm/imx/ipuv3: do not return negative values from .get_modes()
drm/vc4: hdmi: do not return negative values from .get_modes()
memtest: use {READ,WRITE}_ONCE in memory scanning
Revert "block/mq-deadline: use correct way to throttling write requests"
f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
f2fs: truncate page cache before clearing flags when aborting atomic write
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
nilfs2: prevent kernel bug at submit_bh_wbc()
cifs: open_cached_dir(): add FILE_READ_EA to desired access
cpufreq: dt: always allocate zeroed cpumask
x86/CPU/AMD: Update the Zenbleed microcode revisions
NFSD: Fix nfsd_clid_class use of __string_len() macro
net: hns3: tracing: fix hclgevf trace event strings
LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
LoongArch: Define the __io_aw() hook as mmiowb()
wireguard: netlink: check for dangling peer via is_dead instead of empty list
wireguard: netlink: access device through ctx instead of peer
ahci: asm1064: correct count of reported ports
ahci: asm1064: asm1166: don't limit reported ports
drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
drm/amd/display: Return the correct HDCP error code
drm/amd/display: Fix noise issue on HDMI AV mute
dm snapshot: fix lockup in dm_exception_table_exit
x86/pm: Work around false positive kmemleak report in msr_build_context()
cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value"
platform/x86: p2sb: On Goldmont only cache P2SB and SPI devfn BAR
tls: fix race between tx work scheduling and socket close
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
netfilter: nf_tables: disallow anonymous set with timeout flag
netfilter: nf_tables: reject constant set with timeout
Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
init/Kconfig: lower GCC version check for -Warray-bounds
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
tracing: Use .flush() call to wake up readers
drm/amdgpu/pm: Fix the error of pwm1_enable setting
drm/i915: Check before removing mm notifier
ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
usb: gadget: ncm: Fix handling of zero block length packets
usb: port: Don't try to peer unused USB ports based on location
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice on suspend/resume
mei: me: add arrow lake point S DID
mei: me: add arrow lake point H DID
vt: fix unicode buffer corruption when deleting characters
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
ALSA: hda/realtek - Add Headset Mic supported Acer NB platform
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook
tee: optee: Fix kernel panic caused by incorrect error handling
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
iio: accel: adxl367: fix DEVID read after reset
iio: accel: adxl367: fix I2C FIFO data register
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
drm/amd/display: handle range offsets in VRR ranges
x86/efistub: Call mixed mode boot services on the firmware's stack
net: tls: handle backlogging of crypto requests
ASoC: amd: yc: Revert "Fix non-functional mic on Lenovo 21J2"
iommu: Avoid races around default domain allocations
clocksource/drivers/arm_global_timer: Fix maximum prescaler value
entry: Respect changes to system call number by trace_sys_enter()
minmax: add umin(a, b) and umax(a, b)
swiotlb: Fix alignment checks when both allocation and DMA masks are present
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
printk: Update @console_may_schedule in console_trylock_spinning()
irqchip/renesas-rzg2l: Implement restriction when writing ISCR register
irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
irqchip/renesas-rzg2l: Add macro to retrieve TITSR register offset based on register's index
irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
efi: fix panic in kdump kernel
pwm: img: fix pwm clock lookup
tty: serial: imx: Fix broken RS485
block: Fix page refcounts for unaligned buffers in __bio_release_pages()
blk-mq: release scheduler resource when request completes
selftests: mptcp: diag: return KSFT_FAIL not test_cnt
vfio/pci: Disable auto-enable of exclusive INTx IRQ
vfio: Introduce interface to flush virqfd inject workqueue
vfio/pci: Create persistent INTx handler
vfio/platform: Create persistent IRQ handlers
vfio/fsl-mc: Block calling interrupt handler without trigger
x86/coco: Export cc_vendor
x86/coco: Get rid of accessor functions
x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
x86/sev: Fix position dependent variable references in startup code
mm/migrate: set swap entry values of THP tail pages properly.
init: open /initrd.image with O_LARGEFILE
x86/efistub: Add missing boot_params for mixed mode compat entry
efi/libstub: Cast away type warning in use of max()
btrfs: zoned: don't skip block groups with 100% zone unusable
btrfs: zoned: use zone aware sb location for scrub
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
wifi: iwlwifi: fw: don't always use FW dump trig
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
hexagon: vmlinux.lds.S: handle attributes section
mmc: sdhci-omap: re-tuning is needed after a pm transition to support emmc HS200 mode
mmc: core: Initialize mmc_blk_ioc_data
mmc: core: Avoid negative index with array access
block: Do not force full zone append completion in req_bio_endio()
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
nouveau/dmem: handle kcalloc() allocation failure
net: ll_temac: platform_get_resource replaced by wrong function
drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed
drm/amdkfd: fix TLB flush after unmap for GFX9.4.2
drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()
drm/i915/gt: Reset queue_priority_hint on parking
Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
Revert "usb: phy: generic: Get the vbus supply"
usb: cdc-wdm: close race between read and workqueue
USB: UAS: return ENODEV when submit urbs fail with device not attached
usb: dwc3-am62: Rename private data
usb: dwc3-am62: fix module unload/reload behavior
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
scsi: core: Fix unremoved procfs host directory regression
staging: vc04_services: changen strncpy() to strscpy_pad()
staging: vc04_services: fix information leak in create_component()
USB: core: Add hub_get() and hub_put() routines
USB: core: Fix deadlock in port "disable" sysfs attribute
scsi: sd: Fix TCG OPAL unlock on system resume
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: gadget: LPM flow fix
usb: udc: remove warning when queue disabled ep
usb: typec: Return size of buffer if pd_set operation succeeds
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
scsi: qla2xxx: Prevent command send on chip reset
scsi: qla2xxx: Fix N2N stuck connection
scsi: qla2xxx: Split FCE|EFT trace control
scsi: qla2xxx: Update manufacturer detail
scsi: qla2xxx: NVME|FCP prefer flag not being honored
scsi: qla2xxx: Fix command flush on cable pull
scsi: qla2xxx: Fix double free of fcport
scsi: qla2xxx: Change debug message during driver unload
scsi: qla2xxx: Delay I/O Abort on PCI error
x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
tls: fix use-after-free on failed backlog decryption
scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()
scsi: lpfc: Correct size for wqe for memset()
scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type()
scsi: libsas: Fix disk not being scanned in after being removed
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
USB: core: Fix deadlock in usb_deauthorize_interface()
tools/resolve_btfids: fix build with musl libc
Linux 6.1.84
Change-Id: I2aa458588d512ce908a9b087cdc66b345cef83a9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Export the hrtimer_expire_entry/exit tracepoints so that vendor modules
can register probes for them.
When a core stops, we need to know when the last hrtimer was called.
Debugging RCU stall issues also needs hrtimer information.
For this reason, we record hrtimer information.
Bug: 205928005
Change-Id: I739f369d3b56e09f8e9061fefdf25830e37e987e
Signed-off-by: Changki Kim <changki.kim@samsung.com>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
The timer wheel calculates the index for any timer based on the expiry
value and the level granularity of the timer. Due to the level granularity,
a timer does not fire at its exact expiry time but at
expires + granularity. This is done in the timer code when the index for
each timer is calculated based on the expiry and granularity at each
level:
expires = (expires >> LVL_SHIFT(lvl)) + 1;
For devfreq drivers the requirement is to fire the timer at the exact
time. If the timer does not expire at the exact time, the driver takes
much longer to react and increase the device frequency. The devfreq driver
registers a timer with a 10 ms expiry, and due to the slack in the timer
code the expiry happens at 20 ms. For example, the frame rendering time is
16 ms; if the devfreq driver reacts after 20 ms instead of 10 ms, that is
way past a frame rendering time.
Timers with 10 ms to 630 ms expiry fall under level 0. To overcome the
granularity issue for level 0 with low expiry values, do not add the
granularity, by introducing a new calc_index vendor hook.
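The effect described above can be sketched in userspace C. This is only an illustration, not the vendor hook itself: the toy_ names are hypothetical, HZ=100 is an assumption chosen to match the 10 ms figures, and the shift macro mirrors the 8x-per-level layout of the timer wheel.

```c
#include <assert.h>

/* Assumptions for illustration: HZ=100, so one jiffy is 10 ms. */
#define TOY_HZ            100
#define TOY_LVL_CLK_SHIFT 3
#define TOY_LVL_SHIFT(n)  ((n) * TOY_LVL_CLK_SHIFT)

/* Default behaviour: round up to the next bucket of the level. */
static unsigned long toy_bucket_expiry_roundup(unsigned long expires,
					       unsigned int lvl)
{
	assert(lvl < 9);
	expires = (expires >> TOY_LVL_SHIFT(lvl)) + 1;
	return expires << TOY_LVL_SHIFT(lvl);
}

/* Hypothetical hook behaviour: do not add the granularity. */
static unsigned long toy_bucket_expiry_exact(unsigned long expires,
					     unsigned int lvl)
{
	assert(lvl < 9);
	expires >>= TOY_LVL_SHIFT(lvl);
	return expires << TOY_LVL_SHIFT(lvl);
}

/* Convert jiffies to milliseconds under the assumed HZ. */
static unsigned long toy_jiffies_to_ms(unsigned long j)
{
	return j * (1000 / TOY_HZ);
}
```

Under these assumptions a timer armed for jiffy 1 (10 ms) lands in the bucket for jiffy 2 (20 ms) with the round-up, and in the bucket for jiffy 1 (10 ms) without it.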
Bug: 178758017
Change-Id: I13cdf541e4c1bd426ce28b7a8a17cb8381eb2a92
Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com>
(cherry picked from commit 1855071010)
[quic_satyap@quicinc.com: fix minor merge conflict]
Signed-off-by: Satya Durga Srinivasu Prabhala <quic_satyap@quicinc.com>
Pull random number generator updates from Jason Donenfeld:
"These updates continue to refine the work began in 5.17 and 5.18 of
modernizing the RNG's crypto and streamlining and documenting its
code.
New for 5.19, the updates aim to improve entropy collection methods
and make some initial decisions regarding the "premature next" problem
and our threat model. The cloc utility now reports that random.c is
931 lines of code and 466 lines of comments, not that basic metrics
like that mean all that much, but at the very least it tells you that
this is very much a manageable driver now.
Here's a summary of the various updates:
- The random_get_entropy() function now always returns something at
least minimally useful. This is the primary entropy source in most
collectors, which in the best case expands to something like RDTSC,
but prior to this change, in the worst case it would just return 0,
contributing nothing. For 5.19, additional architectures are wired
up, and architectures that are entirely missing a cycle counter now
have a generic fallback path, which uses the highest resolution
clock available from the timekeeping subsystem.
Some of those clocks can actually be quite good, despite the CPU
not having a cycle counter of its own, and going off-core for a
stamp is generally thought to increase jitter, something positive
from the perspective of entropy gathering. Done very early on in
the development cycle, this has been sitting in next getting some
testing for a while now and has relevant acks from the archs, so it
should be pretty well tested and fine, but is nonetheless the thing
I'll be keeping my eye on most closely.
- Of particular note with the random_get_entropy() improvements is
MIPS, which, on CPUs that lack the c0 count register, will now
combine the high-speed but short-cycle c0 random register with the
lower-speed but long-cycle generic fallback path.
- With random_get_entropy() now always returning something useful,
the interrupt handler now collects entropy in a consistent
construction.
- Rather than comparing two samples of random_get_entropy() for the
jitter dance, the algorithm now tests many samples, and uses the
amount of differing ones to determine whether or not jitter entropy
is usable and how laborious it must be. The problem with comparing
only two samples was that if the cycle counter was extremely slow,
but just so happened to be on the cusp of a change, the slowness
wouldn't be detected. Taking many samples fixes that to some
degree.
This, combined with the other improvements to random_get_entropy(),
should make future unification of /dev/random and /dev/urandom
maybe more possible. At the very least, were we to attempt it again
today (we're not), it wouldn't break any of Guenter's test rigs
that broke when we tried it with 5.18. So, not today, but perhaps
down the road, that's something we can revisit.
- We attempt to reseed the RNG immediately upon waking up from system
suspend or hibernation, making use of the various timestamps about
suspend time and such available, as well as the usual inputs such
as RDRAND when available.
- Batched randomness now falls back to ordinary randomness before the
RNG is initialized. This provides more consistent guarantees to the
types of random numbers being returned by the various accessors.
- The "pre-init injection" code is now gone for good. I suspect you
in particular will be happy to read that, as I recall you
expressing your distaste for it a few months ago. Instead, to avoid
a "premature first" issue, while still allowing for maximal amount
of entropy availability during system boot, the first 128 bits of
estimated entropy are used immediately as it arrives, with the next
128 bits being buffered. And, as before, after the RNG has been
fully initialized, it winds up reseeding anyway a few seconds later
in most cases. This resulted in a pretty big simplification of the
initialization code and let us remove various ad-hoc mechanisms
like the ugly crng_pre_init_inject().
- The RNG no longer pretends to handle the "premature next" security
model, something that various academics and other RNG designs have
tried to care about in the past. After an interesting mailing list
thread, these issues are thought to be a) mainly academic and not
practical at all, and b) actively harming the real security of the
RNG by delaying new entropy additions after a potential compromise,
making a potentially bad situation even worse. As well, in the
first place, our RNG never even properly handled the premature next
issue, so removing an incomplete solution to a fake problem was
particularly nice.
This allowed for numerous other simplifications in the code, which
is a lot cleaner as a consequence. If you didn't see it before,
https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
thread worth skimming through.
- While the interrupt handler received a separate code path years ago
that avoids locks by using per-cpu data structures and a faster
mixing algorithm, in order to reduce interrupt latency, input and
disk events that are triggered in hardirq handlers were still
hitting locks and more expensive algorithms. Those are now
redirected to use the faster per-cpu data structures.
- Rather than having the fake-crypto almost-siphash-based random32
implementation be used right and left, and in many places where
cryptographically secure randomness is desirable, the batched
entropy code is now fast enough to replace that.
- As usual, numerous code quality and documentation cleanups. For
example, the initialization state machine now uses enum symbolic
constants instead of just hard coding numbers everywhere.
- Since the RNG initializes once, and then is always initialized
thereafter, a pretty heavy amount of code used during that
initialization is never used again. It is now completely cordoned
off using static branches and it winds up in the .text.unlikely
section so that it doesn't reduce cache compactness after the RNG
is ready.
- A variety of functions meant for waiting on the RNG to be
initialized were only used by vsprintf, and in not a particularly
optimal way. Replacing that usage with a more ordinary setup made
it possible to remove those functions.
- A cleanup of how we warn userspace about the use of uninitialized
/dev/urandom and uninitialized get_random_bytes() usage.
Interestingly, with the change you merged for 5.18 that attempts to
use jitter (but does not block if it can't), the majority of users
should never see those warnings for /dev/urandom at all now, and
the one for in-kernel usage is mainly a debug thing.
- The file_operations struct for /dev/[u]random now implements
.read_iter and .write_iter instead of .read and .write, allowing it
to also implement .splice_read and .splice_write, which makes
splice(2) work again after it was broken here (and in many other
places in the tree) during the set_fs() removal. This was a bit of
a last minute arrival from Jens that hasn't had as much time to
bake, so I'll be keeping my eye on this as well, but it seems
fairly ordinary. Unfortunately, read_iter() is around 3% slower
than read() in my tests, which I'm not thrilled about. But Jens and
Al, spurred by this observation, seem to be making progress in
removing the bottlenecks on the iter paths in the VFS layer in
general, which should remove the performance gap for all drivers.
- Assorted other bug fixes, cleanups, and optimizations.
- A small SipHash cleanup"
* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
random: check for signals after page of pool writes
random: wire up fops->splice_{read,write}_iter()
random: convert to using fops->write_iter()
random: convert to using fops->read_iter()
random: unify batched entropy implementations
random: move randomize_page() into mm where it belongs
random: remove mostly unused async readiness notifier
random: remove get_random_bytes_arch() and add rng_has_arch_random()
random: move initialization functions out of hot pages
random: make consistent use of buf and len
random: use proper return types on get_random_{int,long}_wait()
random: remove extern from functions in header
random: use static branch for crng_ready()
random: credit architectural init the exact amount
random: handle latent entropy and command line from random_init()
random: use proper jiffies comparison macro
random: remove ratelimiting for in-kernel unseeded randomness
random: move initialization out of reseeding hot path
random: avoid initializing twice in credit race
random: use symbolic constants for crng_init states
...
random32.c has two random number generators in it: one that is meant to
be used deterministically, with some predefined seed, and one that does
the same exact thing as random.c, except does it poorly. The first one
has some use cases. The second one no longer does and can be replaced
with calls to random.c's proper random number generator.
The relatively recent siphash-based bad random32.c code was added in
response to concerns that the prior random32.c was too deterministic.
Out of fears that random.c was (at the time) too slow, this code was
anonymously contributed. Then out of that emerged a kind of shadow
entropy gathering system, with its own tentacles throughout various net
code, added willy nilly.
Stop👏making👏bespoke👏random👏number👏generators👏.
Fortunately, recent advances in random.c mean that we can stop playing
with this sketchiness, and just use get_random_u32(), which is now fast
enough. In micro benchmarks using RDPMC, I'm seeing the same median
cycle count between the two functions, with the mean being _slightly_
higher due to batches refilling (which we can optimize further if need be).
However, when doing *real* benchmarks of the net functions that actually
use these random numbers, the mean cycles actually *decreased* slightly
(with the median still staying the same), likely because the additional
prandom code means icache misses and complexity, whereas random.c is
generally already being used by something else nearby.
The biggest benefit of this is that there are many users of prandom who
probably should be using cryptographically secure random numbers. This
makes all of those accidental cases become secure by just flipping a
switch. Later on, we can do a tree-wide cleanup to remove the static
inline wrapper functions that this commit adds.
There are also some low-ish hanging fruits for making this even faster
in the future: a get_random_u16() function for use in the networking
stack will give a 2x performance boost there, using SIMD for ChaCha20
will let us compute 4 or 8 or 16 blocks of output in parallel, instead
of just one, giving us large buffers for cheap, and introducing a
get_random_*_bh() function that assumes irqs are already disabled will
shave off a few cycles for ordinary calls. These are things we can chip
away at down the road.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
With debugobjects enabled the timer hint for freeing of active timers
embedded inside delayed works is always the same, i.e. the hint is
delayed_work_timer_fn, even though the function the delayed work is going
to run can be wildly different depending on what work was queued. Enabling
workqueue debugobjects doesn't help either because the delayed work isn't
considered active until it is actually queued to run on a workqueue. If the
work is freed while the timer is pending the work isn't considered active
so there is no information from workqueue debugobjects.
Special case delayed works in the timer debugobjects hint logic so that the
delayed work function is returned instead of the delayed_work_timer_fn.
This will help to understand which delayed work was pending that got
freed.
Apply the same treatment for kthread_delayed_work because it follows the
same pattern.
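The special case can be pictured with a small userspace sketch. All toy_ names are hypothetical stand-ins and do not reproduce the kernel's structures; the only point is that when the timer function is the shared trampoline, the hint is taken from the enclosing work item instead.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_fn_t)(void);

/* Toy stand-ins for a timer and a delayed work that embeds one. */
struct toy_timer {
	toy_fn_t function;
};

struct toy_delayed_work {
	toy_fn_t work_func;
	struct toy_timer timer;
};

/* Shared trampoline every toy delayed work installs as timer function. */
static void toy_delayed_work_timer_fn(void) { }

/* An example work function a caller might have queued. */
static void toy_my_work(void) { }

/* Return the most informative function pointer for a timer. */
static toy_fn_t toy_timer_hint(struct toy_timer *timer)
{
	assert(timer != NULL);
	if (timer->function == toy_delayed_work_timer_fn) {
		/* The timer is embedded: step back to the enclosing work. */
		struct toy_delayed_work *dw = (struct toy_delayed_work *)
			((char *)timer - offsetof(struct toy_delayed_work, timer));
		return dw->work_func;
	}
	return timer->function;
}
```

The sketch relies on the same premise as the commit: a timer whose function is the trampoline is always embedded in a delayed work.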
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org
The level granularity round up of calc_index() does:
(x + (1 << n)) >> n
which is obviously equivalent to
(x >> n) + 1
but compilers can't figure that out despite the fact that the input range
is known to not cause an overflow. It's not intuitive to read either.
Just write out the obvious.
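The equivalence holds because adding 1 << n leaves the low n bits of x untouched, so it only bumps the shifted result by one. A brute-force check, as a minimal userspace sketch with hypothetical helper names and assuming no overflow in the tested range:

```c
#include <assert.h>
#include <stdbool.h>

/* Original form: add the granularity, then shift. */
static unsigned long add_then_shift(unsigned long x, unsigned int n)
{
	return (x + (1UL << n)) >> n;
}

/* Rewritten form: shift, then add one. */
static unsigned long shift_then_increment(unsigned long x, unsigned int n)
{
	return (x >> n) + 1;
}

/* Compare both forms for every x below limit. */
static bool forms_agree_below(unsigned long limit, unsigned int n)
{
	assert(n < 16);
	for (unsigned long x = 0; x < limit; x++) {
		if (add_then_shift(x, n) != shift_then_increment(x, n))
			return false;
	}
	return true;
}
```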
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87h778j46c.ffs@tglx
When base::next_expiry_recalc is not initialized to false during CPU
bringup in HOTPLUG_CPU and is accidentally true while no timer is queued in
the meantime, the loop through the wheel in __next_timer_interrupt() might
be done for nothing.
Therefore initialize base::next_expiry_recalc to false in
timers_prepare_cpu().
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220405191732.7438-2-anna-maria@linutronix.de
When the timer base is empty, base::next_expiry is set to base::clk +
NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer
is queued until jiffies reaches base::next_expiry value, the warning for
not finding any expired timer and base::next_expiry_recalc is false in
__run_timers() triggers.
To prevent triggering the warning in this valid scenario
base::timers_pending needs to be added to the warning condition.
Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220405191732.7438-3-anna-maria@linutronix.de
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.
This patchset fixes DAMON's fake load report issue. The first patch
makes yet another variant of usleep_range() for this fix, and the second
patch fixes the issue of DAMON by making it use the newly introduced
function.
This patch (of 2):
Some kernel threads such as DAMON could need to repeatedly sleep at the
microsecond level. Because usleep_range() sleeps in uninterruptible
state, however, such threads would make /proc/loadavg report fake load.
To help such cases, this commit implements a variant of usleep_range()
called usleep_idle_range(). It is the same as usleep_range() but sets the
state of the current task to TASK_IDLE while sleeping.
Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot reported KCSAN data races vs. timer_base::running_timer being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:
  int data = 0;
  void timer_func(struct timer_list *t)
  {
          data = 1;
  }
  CPU 0                                            CPU 1
  ------------------------------                   --------------------------
  base = lock_timer_base(timer, &flags);           raw_spin_unlock(&base->lock);
  if (base->running_timer != timer)                call_timer_fn(timer, fn, baseclk);
    ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
  raw_spin_unlock_irqrestore(&base->lock, flags);  raw_spin_lock(&base->lock);
  x = data;
If the timer has previously executed on CPU 1, CPU 0 can observe
base->running_timer == NULL and return, assuming the timer has completed.
However, it is not guaranteed on all architectures that the effects of the
callback (here the store to data) are visible to CPU 0 at that point. The
comment for del_timer_sync() makes that guarantee. Moving the assignment
under base->lock prevents this.
For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes: 030dcdd197 ("timers: Prepare support for PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Cc: stable@vger.kernel.org
31cd0e119d ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.
In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. This ultimately breaks the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
jiffies = 1
base->clk = 0;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), taking into account
no timer operations happened:
base->clk = 1;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
'struct timer_base' any bigger.
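The simplified example above can be replayed in a userspace sketch. The toy_ names are hypothetical and the struct only carries the three fields the example needs; the delta value is an illustrative stand-in for NEXT_TIMER_MAX_DELTA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for NEXT_TIMER_MAX_DELTA. */
#define TOY_NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

struct toy_timer_base {
	unsigned long clk;
	unsigned long next_expiry;
	bool timers_pending;
};

/* Old check: infer "no timers" from next_expiry relative to clk. */
static bool toy_idle_by_comparison(const struct toy_timer_base *base)
{
	assert(base != NULL);
	return base->next_expiry == base->clk + TOY_NEXT_TIMER_MAX_DELTA;
}

/* New check: rely on the explicitly tracked flag. */
static bool toy_idle_by_flag(const struct toy_timer_base *base)
{
	assert(base != NULL);
	return !base->timers_pending;
}

/* The base clock follows jiffies; next_expiry is left untouched. */
static void toy_forward_clk(struct toy_timer_base *base, unsigned long jiffies)
{
	assert(base != NULL);
	base->clk = jiffies;
}
```

Starting from clk = 0 and next_expiry = TOY_NEXT_TIMER_MAX_DELTA, forwarding clk to 1 makes the comparison-based check report a pending timer, while the flag-based check still reports none.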
Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Pull RCU updates from Paul McKenney:
- Bitmap parsing support for "all" as an alias for all bits
- Documentation updates
- Miscellaneous fixes, including some that overlap into mm and lockdep
- kvfree_rcu() updates
- mem_dump_obj() updates, with acks from one of the slab-allocator
maintainers
- RCU NOCB CPU updates, including limited deoffloading
- SRCU updates
- Tasks-RCU updates
- Torture-test updates
* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
rcu: Add missing __releases() annotation
rcu: Remove obsolete rcu_read_unlock() deadlock commentary
rcu: Improve comments describing RCU read-side critical sections
rcu: Create an unrcu_pointer() to remove __rcu from a pointer
srcu: Early test SRCU polling start
rcu: Fix various typos in comments
rcu/nocb: Unify timers
rcu/nocb: Prepare for fine-grained deferred wakeup
rcu/nocb: Only cancel nocb timer if not polling
rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
rcu/nocb: Allow de-offloading rdp leader
rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
rcu: Don't penalize priority boosting when there is nothing to boost
rcu: Point to documentation of ordering guarantees
rcu: Make rcu_gp_cleanup() be noinline for tracing
rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
...
This commit addresses a few code-style nits in callback-offloading
toggling, including one that predates this toggling.
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Pull timers and timekeeping updates from Thomas Gleixner:
"Core:
- Robustness improvements for the NOHZ tick management
- Fixes and consolidation of the NTP/RTC synchronization code
- Small fixes and improvements in various places
- A set of function documentation updates and fixes
Drivers:
- Cleanups and improvements in various clocksource/event drivers
- Removal of the EZChip NPS clocksource driver as the platform
support was removed from ARC
- The usual set of new device tree binding and json conversions
- The RTC driver changes which have been acked by the RTC maintainer:
* fix a long standing bug in the MC146818 library code which can
cause reading garbage during the RTC internal update.
* changes related to the NTP/RTC consolidation work"
* tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
ntp: Fix prototype in the !CONFIG_GENERIC_CMOS_UPDATE case
tick/sched: Make jiffies update quick check more robust
ntp: Consolidate the RTC update implementation
ntp: Make the RTC sync offset less obscure
ntp, rtc: Move rtc_set_ntp_time() to ntp code
ntp: Make the RTC synchronization more reliable
rtc: core: Make the sync offset default more realistic
rtc: cmos: Make rtc_cmos sync offset correct
rtc: mc146818: Reduce spinlock section in mc146818_set_time()
rtc: mc146818: Prevent reading garbage
clocksource/drivers/sh_cmt: Fix potential deadlock when calling runtime PM
clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
clocksource/drivers/dw_apb_timer_of: Add error handling if no clock available
clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
clocksource/drivers/ingenic: Fix section mismatch
clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
dt-bindings: timer: renesas: tmu: Convert to json-schema
dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings
clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
...
No users outside of the timer code. Move the caller below this function to
avoid a pointless forward declaration.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PREEMPT_RT does not spin and wait until a running timer completes its
callback but instead it blocks on a sleeping lock to prevent a livelock in
the case that the task waiting for the callback completion preempted the
callback.
This cannot be done for timers flagged with TIMER_IRQSAFE. These timers can
be canceled from an interrupt disabled context even on RT kernels.
The expiry callback of such timers is invoked with interrupts disabled so
there is no need to use the expiry lock mechanism because obviously the
callback cannot be preempted even on RT kernels.
Do not use the timer_base::expiry_lock mechanism when waiting for a running
callback to complete if the timer is flagged with TIMER_IRQSAFE.
Also add a lockdep assertion for RT kernels to validate that the expiry
lock mechanism is always invoked in preemptible context.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201103190937.hga67rqhvknki3tp@linutronix.de
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.
This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's state using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.
The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same as (even very slightly better than) before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Pull timekeeping updates from Thomas Gleixner:
"Updates for timekeeping, timers and related drivers:
Core:
- Early boot support for the NMI safe timekeeper by utilizing
local_clock() up to the point where timekeeping is initialized.
This allows printk() to store multiple timestamps in the ringbuffer
which is useful for coordinating dmesg information across a fleet
of machines.
- Provide a multi-timestamp accessor for printk()
- Make timer init more robust by checking for invalid timer flags.
- Comma vs semicolon fixes
Drivers:
- Support for new platforms in existing drivers (SP804 and Renesas
CMT)
- Comma vs semicolon fixes
* tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/armada-370-xp: Use semicolons rather than commas to separate statements
clocksource/drivers/mps2-timer: Use semicolons rather than commas to separate statements
timers: Mask invalid flags in do_init_timer()
clocksource/drivers/sp804: Enable Hisilicon sp804 timer 64bit mode
clocksource/drivers/sp804: Add support for Hisilicon sp804 timer
clocksource/drivers/sp804: Support non-standard register offset
clocksource/drivers/sp804: Prepare for support non-standard register offset
clocksource/drivers/sp804: Remove a mismatched comment
clocksource/drivers/sp804: Delete the leading "__" of some functions
clocksource/drivers/sp804: Remove unused sp804_timer_disable() and timer-sp804.h
clocksource/drivers/sp804: Cleanup clk_get_sys()
dt-bindings: timer: renesas,cmt: Document r8a774e1 CMT support
dt-bindings: timer: renesas,cmt: Document r8a7742 CMT support
alarmtimer: Convert comma to semicolon
timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
do_init_timer() accepts any combination of timer flags handed in by the
caller without a sanity check, but only TIMER_DEFERRABLE, TIMER_PINNED and
TIMER_IRQSAFE are valid.
If the supplied flags have other bits set, this could result in
malfunction. If bits are set in TIMER_CPUMASK, the first timer usage could
dereference a CPU base which is outside the range of possible CPUs. If
TIMER_MIGRATING is set, then switch_timer_base() will livelock.
Prevent that with a sanity check which warns when invalid flags are
supplied and masks them out.
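The check amounts to a mask against the set of valid init flags. A minimal userspace sketch, assuming an illustrative bit layout (the TOY_ values and the helper name are hypothetical; the real definitions live in the kernel's timer header):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit layout, assumed for this sketch only. */
#define TOY_TIMER_CPUMASK    0x0003FFFFu
#define TOY_TIMER_MIGRATING  0x00040000u
#define TOY_TIMER_DEFERRABLE 0x00080000u
#define TOY_TIMER_PINNED     0x00100000u
#define TOY_TIMER_IRQSAFE    0x00200000u
#define TOY_TIMER_INIT_FLAGS \
	(TOY_TIMER_DEFERRABLE | TOY_TIMER_PINNED | TOY_TIMER_IRQSAFE)

/*
 * Report whether any invalid bit was supplied (where the kernel would
 * warn) and return the flags with the invalid bits masked out.
 */
static unsigned int toy_sanitize_timer_flags(unsigned int flags, bool *warned)
{
	assert(warned != NULL);
	*warned = (flags & ~TOY_TIMER_INIT_FLAGS) != 0;
	return flags & TOY_TIMER_INIT_FLAGS;
}
```

With this layout, stray CPU bits or a set migrating bit are dropped and flagged, while a valid flag such as the pinned bit passes through unchanged.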
[ tglx: Made it WARN_ON_ONCE() and added context to the changelog ]
Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/9d79a8aa4eb56713af7379f99f062dedabcde140.1597326756.git.zhaoqianli@xiaomi.com
Running posix CPU timers in hard interrupt context has a few downsides:
- For PREEMPT_RT it cannot work as the expiry code needs to take
sighand lock, which is a 'sleeping spinlock' in RT. The original RT
approach of offloading the posix CPU timer handling into a high
priority thread was clumsy and provided no real benefit in general.
- For fine grained accounting it's just wrong to run this in context of
the timer interrupt because that way a process specific CPU time is
accounted to the timer interrupt.
- Long running timer interrupts caused by a large number of expiring
timers which can be created and armed by unprivileged user space.
There is no hard requirement to expire them in interrupt context.
If the signal is targeted at the task itself then it won't be delivered
before the task returns to user space anyway. If the signal is targeted at
a supervisor process then it might be slightly delayed, but posix CPU
timers are inaccurate anyway due to the fact that they are tied to the
tick.
Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before exiting to guest mode.
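The split between the quick interrupt-context check and the deferred expiry
can be pictured with a small user-space model. This is only a sketch under
stated assumptions: struct task, tick_irq_check() and run_task_work() are
hypothetical names, and the real kernel code deals with locking, multiple
clocks and signal queues that are left out here.

```c
#include <stdbool.h>

/* Toy model of a task with one armed CPU timer. */
struct task {
	unsigned long cputime;           /* CPU time consumed so far */
	unsigned long expiry;            /* 0 means no timer armed */
	bool work_pending;               /* deferred expiry work queued? */
	unsigned int signals_delivered;  /* instrumentation for the sketch */
};

/* Interrupt context: only a cheap comparison, no locks, no signals.
 * If the timer has expired, queue the heavy work for later. */
static void tick_irq_check(struct task *t)
{
	if (t->expiry && t->cputime >= t->expiry && !t->work_pending)
		t->work_pending = true;
}

/* Task context (for example on return to user space): perform the actual
 * expiry and signal delivery that previously ran in the interrupt. */
static void run_task_work(struct task *t)
{
	if (!t->work_pending)
		return;
	t->work_pending = false;
	t->expiry = 0;             /* one-shot timer in this sketch */
	t->signals_delivered++;
}
```

The interrupt path stays short and bounded regardless of how many timers
expire, because all per-timer work happens in run_task_work().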
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de
Pull timer updates from Thomas Gleixner:
"Time, timers and related driver updates:
- Prevent unnecessary timer softirq invocations by extending the
tracking of the next expiring timer in the timer wheel beyond the
existing NOHZ functionality.
The tracking overhead at enqueue time is within the noise, but on
sensitive workloads the avoidance of the soft interrupt invocation
is a measurable improvement.
- The obligatory new clocksource driver for Ingenic X1000 OST
- The usual fixes, improvements, cleanups and extensions for newer
chip variants all over the driver space"
* tag 'timers-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
timers: Recalculate next timer interrupt only when necessary
clocksource/drivers/ingenic: Add support for the Ingenic X1000 OST.
dt-bindings: timer: Add Ingenic X1000 OST bindings.
clocksource/drivers: Replace HTTP links with HTTPS ones
clocksource/drivers/nomadik-mtu: Handle 32kHz clock
clocksource/drivers/sh_cmt: Use "kHz" for kilohertz
clocksource/drivers/imx: Add support for i.MX TPM driver with ARM64
clocksource/drivers/ingenic: Add high resolution timer support for SMP/SMT.
timers: Lower base clock forwarding threshold
timers: Remove must_forward_clk
timers: Spare timer softirq until next expiry
timers: Expand clk forward logic beyond nohz
timers: Reuse next expiry cache after nohz exit
timers: Always keep track of next expiry
timers: Optimize _next_timer_interrupt() level iteration
timers: Add comments about calc_index() ceiling work
timers: Move trigger_dyntick_cpu() to enqueue_timer()
timers: Use only bucket expiry for base->next_expiry value
timers: Preserve higher bits of expiration on index calculation
clocksource/drivers/timer-atmel-tcb: Add sama5d2 support
...
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.
Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.
In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state. For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.
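As a rough illustration of the idea, the sketch below perturbs only the
first 32-bit word of a 128-bit state from two paths. The struct layout
follows the four-word state described above; the function names and the
exact mixing expressions are assumptions made for this example and are not
claimed to match the patch.

```c
#include <stdint.h>

/* 128-bit generator state held as four 32-bit words. */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

/* Rotate left; shift must be in the range 1..31. */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

/* Interrupt path (hypothetical): fold timing-derived noise into s1 only. */
static void perturb_on_irq(struct rnd_state *st, uint32_t noise)
{
	st->s1 += noise;
}

/* Tick path (hypothetical): covers CPUs that see few interrupts but still
 * run user or system work, using the tick counter and a user_tick flag. */
static void perturb_on_tick(struct rnd_state *st, uint32_t jiffies,
			    int user_tick)
{
	st->s1 += rol32(jiffies, 24) + (uint32_t)user_tick;
}
```

Only s1 changes in either path, so the remaining 96 bits keep evolving
solely through the generator's own update function.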
Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nohz tick code recalculates the timer wheel's next expiry on each idle
loop iteration.
On the other hand, the base next expiry is now always cached and updated
upon timer enqueue and execution. Only timer dequeue may leave
base->next_expiry out of date (but then its stale value won't ever go past
the actual next expiry to be recalculated).
Since recalculating the next_expiry isn't a free operation, especially when
the last wheel level is reached to find out that no timer has been enqueued
at all, reuse the next expiry cache when it is known to be reliable, which
it is most of the time.
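The caching scheme can be modeled in a few lines. This is a simplified
sketch, not the kernel implementation: the wheel is a flat array, the
"recalc" flag and all function names are hypothetical, any dequeue
conservatively marks the cache as unreliable, and jiffies wraparound is
ignored for brevity.

```c
#include <stdbool.h>

#define NSLOTS   8
#define NO_TIMER (~0UL)

struct timer_base {
	unsigned long slots[NSLOTS];  /* expiry time per slot */
	bool used[NSLOTS];            /* slot holds a pending timer */
	unsigned long next_expiry;    /* cached earliest expiry */
	bool recalc;                  /* cache may be stale after a dequeue */
	unsigned int scans;           /* counts full scans, sketch only */
};

/* The expensive path: walk every slot to find the earliest expiry. */
static unsigned long scan_wheel(struct timer_base *b)
{
	unsigned long next = NO_TIMER;

	b->scans++;
	for (int i = 0; i < NSLOTS; i++)
		if (b->used[i] && b->slots[i] < next)
			next = b->slots[i];
	return next;
}

/* Enqueue keeps the cache exact without scanning. */
static void enqueue(struct timer_base *b, int idx, unsigned long expires)
{
	b->slots[idx] = expires;
	b->used[idx] = true;
	if (expires < b->next_expiry)
		b->next_expiry = expires;
}

/* Dequeue can only make the cached value too early, never too late. */
static void dequeue(struct timer_base *b, int idx)
{
	b->used[idx] = false;
	b->recalc = true;
}

/* Reuse the cache when it is known to be reliable, rescan otherwise. */
static unsigned long next_timer_interrupt(struct timer_base *b)
{
	if (b->recalc) {
		b->next_expiry = scan_wheel(b);
		b->recalc = false;
	}
	return b->next_expiry;
}
```

A base must start with next_expiry set to NO_TIMER. Repeated calls to
next_timer_interrupt() then cost a single comparison unless a timer was
removed in between.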
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200723151641.12236-1-frederic@kernel.org
Now that the core timer infrastructure no longer depends on periodic
base->clk increments, even when the CPU is not in NO_HZ mode, timer
softirqs can be skipped until there are timers to expire.
Some spurious softirqs can still remain since base->next_expiry doesn't
keep track of canceled timers but this still reduces the number of softirqs
significantly: ~15 times fewer for HZ=1000 and ~5 times fewer for HZ=100.
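A minimal sketch of the per-tick decision, assuming a cached next_expiry
per base. The struct and function names are hypothetical; the comparison
uses the usual signed-difference idiom so that it stays correct across
jiffies wraparound.

```c
/* Only the field needed for this sketch. */
struct timer_base {
	unsigned long next_expiry;  /* earliest known expiry, in jiffies */
};

/* Wraparound-safe "a is at or after b", in the style of the kernel's
 * time_after_eq() helper. */
static int jiffies_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Called on every tick: raise the timer softirq only once the earliest
 * known expiry has been reached, otherwise skip it entirely. */
static int should_raise_softirq(const struct timer_base *base,
				unsigned long jiffies)
{
	return jiffies_after_eq(jiffies, base->next_expiry);
}
```

Because canceled timers do not move next_expiry, the function can still
return true for a tick on which nothing ends up expiring, which is the
source of the spurious softirqs mentioned above.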
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-11-frederic@kernel.org
As for next_expiry, the base->clk catch up logic will be expanded beyond
NOHZ in order to avoid triggering useless softirqs.
If softirqs should only fire to expire pending timers, periodic base->clk
increments must be skippable for random amounts of time. Therefore prepare
to catch-up with missing updates whenever an up-to-date base clock is
needed.
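The catch-up step might look like the following sketch. It is an
illustration under assumptions rather than the patch itself: the struct
holds only the two fields involved, and forward_base_clk() is a
hypothetical name for the logic that brings base->clk up to date.

```c
/* Only the fields needed for this sketch. */
struct timer_base {
	unsigned long clk;          /* last time the base was processed */
	unsigned long next_expiry;  /* earliest pending expiry */
};

/* Wraparound-safe "a is strictly after b". */
static int jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Bring base->clk forward after an arbitrary number of skipped ticks.
 * The clock never moves past next_expiry, so a pending timer is not
 * jumped over before it has been expired. */
static void forward_base_clk(struct timer_base *base, unsigned long jnow)
{
	/* Nothing to do if the base clock is already current. */
	if ((long)(jnow - base->clk) < 1)
		return;

	if (jiffies_after(base->next_expiry, jnow))
		base->clk = jnow;               /* nothing expires before now */
	else
		base->clk = base->next_expiry;  /* stop at the pending expiry */
}
```

Calling this whenever an up-to-date base clock is needed removes the
dependency on a softirq running once per tick.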
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-10-frederic@kernel.org
If a level has a timer that expires before reaching the next level, there
is no need to iterate further.
The next level is reached when the 3 lower bits of the current level are
cleared. If the next event happens before/during that, the next levels
won't provide an earlier expiration.
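The early-exit condition can be written as a small predicate. This sketch
assumes that each level's clock is the previous level's clock shifted right
by 3 bits, as the text implies; the macro names are modeled on the timer
wheel code and the function name is hypothetical.

```c
#define LVL_CLK_SHIFT 3
#define LVL_CLK_DIV   (1UL << LVL_CLK_SHIFT)   /* 8 buckets per step */
#define LVL_CLK_MASK  (LVL_CLK_DIV - 1)

/*
 * clk: current clock value at this level's granularity.
 * pos: distance, in buckets of this level, from clk to the first pending
 *      bucket.
 *
 * Returns nonzero when the pending bucket is reached no later than the
 * point where the 3 lower bits of clk wrap to zero, that is before or at
 * the moment the next level advances. Higher levels then cannot hold an
 * earlier expiry and the iteration may stop.
 */
static int expires_before_next_level(unsigned long clk, unsigned long pos)
{
	unsigned long lvl_clk = clk & LVL_CLK_MASK;

	return pos <= ((LVL_CLK_DIV - lvl_clk) & LVL_CLK_MASK);
}
```

For example, with clk = 5 the lower bits clear after 3 more buckets, so a
timer 3 buckets away ends the search while one 4 buckets away does not.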
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-7-frederic@kernel.org