Taking an accurate timestamp in a close proximity of the interrupt is
required for user side statistics management.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The driver allows only a single process to open a device's FD at any
single time. This is done by checking "hdev->compute_ctx" under mutex.
Therefore, to prevent a race between the moment a user closes it's FD
and when another user tries to open the device, we need to make sure
that clearing this variable is the very last thing that is done in the
code of the FD's release.
I'm moving the idle check before clearing this variable and the
"reset on device release". btw, if the reset happens it will prevent
any other user from opening the device until the reset is finished.
An important thing to note is that we need to remove the user process
that is closing the device from the process list BEFORE calling the
reset function. That is to prevent a case where the reset code will
try to kill that user process and it is unnecessary as the process
doesn't hold any device/driver resources anymore.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reset to the device is not necessarily due to an error, so print it
as info instead of error.
In addition, print the type of reset we are doing:
- reset of the entire device (aka hard reset)
- reset of the device after user have released it (less than hard reset)
- lighter reset of an inference device (aka soft reset)
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Soft-reset is the procedure where we reset only the compute/DMA engines
of the device, without requiring the current user-space process to
release the device.
This type of reset can happen if TDR event occurred (a workload got
stuck) or by a root request through sysfs.
This is only relevant for inference ASICs, as there is no real-world
use-case to do that in training, because training runs on multiple
devices.
In addition, we also do (in certain ASICs) a reset upon device release.
That reset uses the same code as the soft-reset.
Therefore, to better differentiate between the two resets, it is better
to rename the soft-reset support as "inference soft-reset", to make
the code more self-explanatory.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The translation in debugfs of device memory MMU virtual addresses was
wrong as it did not take into consideration the fact that the page
sizes there can be not power of 2.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to avoid user target value wraparound, we modify the
current interface so user will be able to wait for an 8-byte
target value rather than a 4-byte value.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
During TDR handling, we check multiple times if CS is valid.
No need to perform this check as CS must be valid at all time
during the TDR handling.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add support to retrieve following power info via HWMON:
- instantaneous power value
- highest value since last reset
- reset the highest place holder
Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Instead of having dedicated function per message that we want to send
to the firmware in COMMS protocol, have a generic function that we can
call to from other parts of the driver
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Instead of using the Linux kernel HWMON enums definition when
communicating with the firmware, use proprietary HWMON based enums
i.e. map hwmon.h header enum to cpucp_if.h based enum while.
This is needed because the HWMON enums are not forcing backward
compatibility and therefore changes can break compatibility between
newer driver and older firmware.
The driver will check for CPU_BOOT_DEV_STS0_MAP_HWMON_EN bit to
validate if f/w supports cpucp->hwmon enum mapping to support older
firmware where this mapping won't be available.
Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Command submission timeout is currently determined during driver
loading time. As some environments requires this timeout to be
modified in runtime, we introduce a new debugfs node that controls
the timeout value without the need to reload the driver.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Pull libata fixes from Damien Le Moal:
"Two fixes for this cycle:
- Fix a null pointer dereference in ahci-platform driver (from Hai)
- Fix uninitialized variables in pata_legacy driver (from Dan)"
* tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci_platform: fix null-ptr-deref in ahci_platform_enable_regulators()
pata_legacy: fix a couple uninitialized variable bugs
Pull block fixes from Jens Axboe:
"Bigger than usual for this point in time, the majority is fixing some
issues around BDI lifetimes with the move from the request_queue to
the disk in this release. In detail:
- Series on draining fs IO for del_gendisk() (Christoph)
- NVMe pull request via Christoph:
- fix the abort command id (Keith Busch)
- nvme: fix per-namespace chardev deletion (Adam Manzanares)
- brd locking scope fix (Tetsuo)
- BFQ fix (Paolo)"
* tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
block, bfq: reset last_bfqq_created on group change
block: warn when putting the final reference on a registered disk
brd: reduce the brd_devices_mutex scope
kyber: avoid q->disk dereferences in trace points
block: keep q_usage_counter in atomic mode after del_gendisk
block: drain file system I/O on del_gendisk
block: split bio_queue_enter from blk_queue_enter
block: factor out a blk_try_enter_queue helper
block: call submit_bio_checks under q_usage_counter
nvme: fix per-namespace chardev deletion
block/rnbd-clt-sysfs: fix a couple uninitialized variable bugs
nvme-pci: Fix abort command id
Pull io_uring fix from Jens Axboe:
"Just a single fix for a wrong condition for grabbing a lock, a
regression in this merge window"
* tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
io_uring: fix wrong condition to grab uring lock
Pull virtio fixes from Michael Tsirkin:
"Fixes up some issues in rc5"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-vdpa: Fix the wrong input in config_cb
VDUSE: fix documentation underline warning
Revert "virtio-blk: Add validation for block size in config space"
vhost_vdpa: unset vq irq before freeing irq
virtio: write back F_VERSION_1 before validate
Pull powerpc fixes from Michael Ellerman:
- Fix a bug where guests on P9 with interrupts passed through could get
stuck in synchronize_irq().
- Fix a bug in KVM on P8 where secondary threads entering a guest would
write outside their allocated stack.
- Fix a bug in KVM on P8 where secondary threads could confuse the host
offline code and cause the guest or host to crash.
Thanks to Cédric Le Goater.
* tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
powerpc/xive: Discard disabled interrupts in get_irqchip_state()
Pull objtool fixes from Borislav Petkov:
- Update section headers before the respective relocations to not
trigger a safety check in elftoolchain's implementation of libelf
- Do not add garbage data to the .rela.orc_unwind_ip section
* tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Update section header before relocations
objtool: Check for gelf_update_rel[a] failures
Pull EDAC fix from Borislav Petkov:
- Log the "correct" uncorrectable error count in the armada_xp driver
* tag 'edac_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/armada-xp: Fix output of uncorrectable error counter
Pull perf fix from Borislav Petkov:
- Add Sapphire Rapids to the list of CPUs supporting the SMI count MSR
* tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/msr: Add Sapphire Rapids CPU support
Pull EFI fixes from Borislav Petkov:
"Forwarded from Ard Biesheuvel through the tip tree. Ard will send
stuff directly in the near future.
Low priority fixes but fixes nonetheless:
- update stub diagnostic print that is no longer accurate
- avoid statically allocated buffer for CPER error record decoding
- avoid sleeping on the efi_runtime semaphore when calling the
ResetSystem EFI runtime service"
* tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
efi/cper: use stack buffer for error record decoding
efi/libstub: Simplify "Exiting bootservices" message
Pull x86 fixes from Borislav Petkov:
- Do not enable AMD memory encryption in Kconfig by default due to
shortcomings of some platforms, leading to boot failures.
- Mask out invalid bits in the MXCSR for 32-bit kernels again because
Thomas and I don't know how to mask out bits properly. Third time's
the charm.
* tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Mask out the invalid MXCSR bits properly
x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically
Pull driver core fixes from Greg KH:
"Here are some small driver core fixes for 5.15-rc6, all of which have
been in linux-next for a while with no reported issues.
They include:
- kernfs negative dentry bugfix
- simple pm bus fixes to resolve reported issues"
* tag 'driver-core-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: bus: Delete CONFIG_SIMPLE_PM_BUS
drivers: bus: simple-pm-bus: Add support for probing simple bus only devices
driver core: Reject pointless SYNC_STATE_ONLY device links
kernfs: don't create a negative dentry if inactive node exists
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.15-rc6 for reported
issues that include:
- habanalabs driver fixes
- mei driver fixes and new ids
- fpga new device ids
- MAINTAINER file updates for fpga subsystem
- spi module id table additions and fixes
- fastrpc locking fixes
- nvmem driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
eeprom: 93xx46: fix MODULE_DEVICE_TABLE
nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells
mei: hbm: drop hbm responses on early shutdown
mei: me: add Ice Lake-N device id.
eeprom: 93xx46: Add SPI device ID table
eeprom: at25: Add SPI ID table
misc: HI6421V600_IRQ should depend on HAS_IOMEM
misc: fastrpc: Add missing lock before accessing find_vma()
cb710: avoid NULL pointer subtraction
misc: gehc: Add SPI ID table
MAINTAINERS: Drop outdated FPGA Manager website
MAINTAINERS: Add Hao and Yilun as maintainers
habanalabs: fix resetting args in wait for CS IOCTL
fpga: ice40-spi: Add SPI device ID table
Pull staging and IIO driver fixes from Greg KH:
"Here are a number of small IIO and staging driver fixes for 5.15-rc6.
They include:
- vc04_services bugfix for reported problem
- r8188eu array underflow fix
- iio driver fixes for a lot of tiny reported issues.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: prevent array underflow in rtw_hal_update_ra_mask()
staging: vc04_services: shut up out-of-range warning
iio: light: opt3001: Fixed timeout error when 0 lux
iio: adis16480: fix devices that do not support sleep mode
iio: mtk-auxadc: fix case IIO_CHAN_INFO_PROCESSED
iio: adis16475: fix deadlock on frequency set
iio: ssp_sensors: add more range checking in ssp_parse_dataframe()
iio: ssp_sensors: fix error code in ssp_print_mcu_debug()
iio: adc: ad7793: Fix IRQ flag
iio: adc: ad7780: Fix IRQ flag
iio: adc: ad7192: Add IRQ flag
iio: adc: aspeed: set driver data when adc probe.
iio: adc: rzg2l_adc: add missing clk_disable_unprepare() in rzg2l_adc_pm_runtime_resume()
iio: adc: max1027: Fix the number of max1X31 channels
iio: adc: max1027: Fix wrong shift with 12-bit devices
iio: adc128s052: Fix the error handling path of 'adc128_probe()'
iio: adc: rzg2l_adc: Fix -EBUSY timeout error return
iio: accel: fxls8962af: return IRQ_HANDLED when fifo is flushed
iio: dac: ti-dac5571: fix an error code in probe()
Pull serial driver fix from Greg KH:
"Here is a single 8250 Kconfig fix for 5.15-rc6 that resolves a
regression that showed up in 5.15-rc1. It has been in linux-next for a
while with no reported issues"
* tag 'tty-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: allow disabling of Freescale 16550 compile test
Pull USB fixes from Greg KH:
"Here are some small USB fixes that resolve a number of tiny issues.
They include:
- new USB serial driver ids
- xhci driver fixes for a bunch of issues
- musb error path fixes.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: musb: dsps: Fix the probe error path
xhci: Enable trust tx length quirk for Fresco FL11 USB controller
xhci: Fix command ring pointer corruption while aborting a command
USB: xhci: dbc: fix tty registration race
xhci: add quirk for host controllers that don't update endpoint DCS
xhci: guard accesses to ep_state in xhci_endpoint_reset()
USB: serial: qcserial: add EM9191 QDL support
USB: serial: option: add Quectel EC200S-CN module support
USB: serial: option: add prod. id for Quectel EG91
USB: serial: option: add Telit LE910Cx composition 0x1204
Pull input fixes from Dmitry Torokhov:
- a new product ID for the xpad joystick driver
- fixes to resistive-adc-touch and snvs_pwrkey drivers
- a change to touchscreen helpers to make clang happier
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: touchscreen - avoid bitwise vs logical OR warning
Input: xpad - add support for another USB ID of Nacon GC-100
Input: resistive-adc-touch - fix division by zero error on z1 == 0
Input: snvs_pwrkey - add clk handling
I got the double free report:
BUG: KASAN: double-free or invalid-free in kfree+0xce/0x390
iio_device_unregister_sysfs+0x108/0x13b [industrialio]
iio_dev_release+0x9e/0x10e [industrialio]
device_release+0xa5/0x240
If __iio_device_register() fails, iio_dev_opaque->groups will be freed
in error path in iio_device_unregister_sysfs(), then iio_dev_release()
will call iio_device_unregister_sysfs() again, it causes double free.
Set iio_dev_opaque->groups to NULL when it's freed to fix this double free.
Not this is a local work around for a more general mess around life time
management that will get cleaned up and should make this handling
unnecesarry.
Fixes: 32f171724e ("iio: core: rework iio device group creation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211013030532.956133-1-yangyingliang@huawei.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When __iio_buffer_alloc_sysfs_and_mask() failed, 'unwind_idx' should be
set to 'i - 1' to prevent double-free when cleanup resources.
BUG: KASAN: double-free or invalid-free in __iio_buffer_free_sysfs_and_mask+0x32/0xb0 [industrialio]
Call Trace:
kfree+0x117/0x4c0
__iio_buffer_free_sysfs_and_mask+0x32/0xb0 [industrialio]
iio_buffers_alloc_sysfs_and_mask+0x60d/0x1570 [industrialio]
__iio_device_register+0x483/0x1a30 [industrialio]
ina2xx_probe+0x625/0x980 [ina2xx_adc]
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: ee708e6baa ("iio: buffer: introduce support for attaching more IIO buffers")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20211013094923.2473-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Check return value of kstrdup_const() in iio_buffer_wrap_attr(),
or it will cause null-ptr-deref in kernfs_name_hash() when calling
device_add() as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
kernfs_name_hash+0x22/0x110
kernfs_find_ns+0x11d/0x390
kernfs_remove_by_name_ns+0x3b/0xb0
remove_files.isra.1+0x7b/0x190
internal_create_group+0x7f1/0xbb0
internal_create_groups+0xa3/0x150
device_add+0x8f0/0x2020
cdev_device_add+0xc3/0x160
__iio_device_register+0x1427/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
adjd_s311_probe+0x195/0x200 [adjd_s311]
i2c_device_probe+0xa07/0xbb0
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 15097c7a1a ("iio: buffer: wrap all buffer attributes into iio_dev_attr")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211013040438.1689277-1-yangyingliang@huawei.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Since commit 430a67f9d6 ("block, bfq: merge bursts of newly-created
queues"), BFQ maintains a per-group pointer to the last bfq_queue
created. If such a queue, say bfqq, happens to move to a different
group, then bfqq is no more a valid last bfq_queue created for its
previous group. That pointer must then be cleared. Not resetting such
a pointer may also cause UAF, if bfqq happens to also be freed after
being moved to a different group. This commit performs this missing
reset. As such it fixes commit 430a67f9d6 ("block, bfq: merge bursts
of newly-created queues").
Such a missing reset is most likely the cause of the crash reported in [1].
With some analysis, we found that this crash was due to the
above UAF. And such UAF did go away with this commit applied [1].
Anyway, before this commit, that crash happened to be triggered in
conjunction with commit 2d52c58b9c ("block, bfq: honor already-setup
queue merges"). The latter was then reverted by commit ebc69e897e
("Revert "block, bfq: honor already-setup queue merges""). Yet commit
2d52c58b9c ("block, bfq: honor already-setup queue merges") contains
no error related with the above UAF, and can then be restored.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503
Fixes: 430a67f9d6 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Grzegorz Kowal <custos.mentis@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Warn when the last reference on a live disk is put without calling
del_gendisk first. There are some BDI related bug reports that look
like a case of this, so make sure we have the proper instrumentation
to catch it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On success i2c_master_send() returns the number of bytes written. The
call from iio_write_channel_info(), however, expects the return value to
be zero on success.
This bug causes incorrect consumption of the sysfs buffer in
iio_write_channel_info(). When writing more than two characters to
out_voltage0_raw, the ad5446 write handler is called multiple times
causing unexpected behavior.
Fixes: 3ec36a2cf0 ("iio:ad5446: Add support for I2C based DACs")
Signed-off-by: Pekka Korpinen <pekka.korpinen@iki.fi>
Link: https://lore.kernel.org/r/20210929185755.2384-1-pekka.korpinen@iki.fi
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This change converts the probe of the AD5064 driver to use only
device-managed functions.
The regulator_bulk_disable() is passed on a devm_add_action_or_reset() hook
and the devm_iio_device_register() can be used to register the IIO device.
The driver has both I2C and SPI hooks inside, so all these can be removed.
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
Link: https://lore.kernel.org/r/20210913115237.301310-1-aardelean@deviqon.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This change does a conversion of the driver to use device-managed init
functions. The 2 regulators and the clock inits are converted to use
devm_add_action_or_reset() callbacks for de-initializing them when the
driver unloads.
And finally the devm_iio_device_register() function can be use to register
the device.
The remove hook can finally be removed and the spi_set_drvdata() call can
also be removed as the private data is no longer used.
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
Link: https://lore.kernel.org/r/20210913115209.300665-1-aardelean@deviqon.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>