Commit Graph

1185157 Commits

Author SHA1 Message Date
Dave Jiang
244009b07e dmaengine: idxd: expose fault counters to sysfs
Expose cr_faults and cr_fault_failures counters to the user space. This
allows a user app to keep track of how many fault the application is
causing with the completion record (CR) and also the number of failures
of the CR writeback. Having a high number of cr_fault_failures is bad as
the app is submitting descriptors with the CR addresses that are bad. User
monitoring daemon may want to consider killing the application as it may be
malicious and attempting to flood the device event log.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-15-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:46 +05:30
Dave Jiang
e6fd6d7e5f dmaengine: idxd: add a device to represent the file opened
Embed a struct device for the user file context in order to export sysfs
attributes related with the opened file. Tie the lifetime of the file
context to the device. The sysfs entry will be added under the char device.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-14-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
fecae134ee dmaengine: idxd: add per file user counters for completion record faults
Add counters per opened file for the char device in order to keep track how
many completion record faults occurred and how many of those faults failed
the writeback by the driver after attempt to fault in the page. The
counters are managed by xarray that associates the PASID with
struct idxd_user_context.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-13-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
2442b7473a dmaengine: idxd: process batch descriptor completion record faults
Add event log processing for faulting of user batch descriptor completion
record.

When encountering an event log entry for a page fault on a completion
record, the driver is expected to do the following:
1. If the "first error in batch" bit in event log entry error info is
set, discard any previously recorded errors associated with the
"batch identifier".
2. Fix the page fault according to the fault address in the event log. If
successful, write the completion record to the fault address in user space.
3. If an error is encountered while writing the completion record and it is
associated to a descriptor in the batch, the driver associates the error
with the batch identifier of the event log entry and tracks it until the
event log entry for the corresponding batch desc is encountered.

While processing an event log entry for a batch descriptor with error
indicating that one or more descs in the batch had event log entries,
the driver will do the following before writing the batch completion
record:
1. If the status field of the completion record is 0x1, the driver will
change it to error code 0x5 (one or more operations in batch completed
with status not successful) and changes the result field to 1.
2. If the status is error code 0x6 (page fault on batch descriptor list
address), change the result field to 1.
3. If status is any other value, the completion record is not changed.
4. Clear the recorded error in preparation for next batch with same batch
identifier.

The result field is for user software to determine whether to set the
"Batch Error" flag bit in the descriptor for continuation of partial
batch descriptor completion. See DSA spec 2.0 for additional information.

If no error has been recorded for the batch, the batch completion record is
written to user space as is.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-12-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
6926987185 dmaengine: idxd: add descs_completed field for completion record
The descs_completed field for a completion record is part of a batch
descriptor completion record. It takes the same location as bytes_completed
in a normal descriptor field. Add to expose to user.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-11-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
c40bd7d973 dmaengine: idxd: process user page faults for completion record
DSA supports page fault handling through PRS. However, the DMA engine
that's processing the descriptor is blocked until the PRS response is
received. Other workqueues sharing the engine are also blocked.
Page fault handing by the driver with PRS disabled can be used to
mitigate the stalling.

With PRS disabled while ATS remain enabled, DSA handles page faults on
a completion record by reporting an event in the event log. In this
instance, the descriptor is completed and the event log contains the
completion record address and the contents of the completion record. Add
support to the event log handling code to fault in the completion record
and copy the content of the completion record to user memory.

A bitmap is introduced to keep track of discarded event log entries. When
the user process initiates ->release() of the char device, it no longer is
interested in any remaining event log entries tied to the relevant wq and
PASID. The driver will mark the event log entry index in the bitmap. Upon
encountering the entries during processing, the event log handler will just
clear the bitmap bit and skip the entry rather than attempt to process the
event log entry.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-10-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Fenghua Yu
b022f59725 dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling
Define idxd_copy_cr() to copy completion record to fault address in
user address that is found by work queue (wq) and PASID.

It will be used to write the user's completion record that the hardware
device is not able to write due to user completion record page fault.

An xarray is added to associate the PASID and mm with the
struct idxd_user_context so mm can be found by PASID and wq.

It is called when handling the completion record fault in a kernel thread
context. Switch to the mm using kthread_use_vm() and copy the
completion record to the mm via copy_to_user(). Once the copy is
completed, switch back to the current mm using kthread_unuse_mm().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-9-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
c2f156bf16 dmaengine: idxd: create kmem cache for event log fault items
Add a kmem cache per device for allocating event log fault context. The
context allows an event log entry to be copied and passed to a software
workqueue to be processed. Due to each device can have different sized
event log entry depending on device type, it's not possible to have a
global kmem cache.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-8-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
2f30decd2f dmaengine: idxd: add per DSA wq workqueue for processing cr faults
Add a workqueue for user submitted completion record fault processing.
The workqueue creation and destruction lifetime will be tied to the user
sub-driver since it will only be used when the wq is a user type.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-7-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
5fbe6503b5 dmanegine: idxd: add debugfs for event log dump
Add debugfs entry to dump the content of the event log for debugging. The
function will dump all non-zero entries in the event log. It will note
which entries are processed and which entries are still pending processing
at the time of the dump. The entries may not always be in chronological
order due to the log is a circular buffer.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-6-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
2f431ba908 dmaengine: idxd: add interrupt handling for event log
An event log interrupt is raised in the misc interrupt INTCAUSE register
when an event is written by the hardware. Add basic event log processing
support to the interrupt handler. The event log is a ring where the
hardware owns the tail and the software owns the head. The hardware will
advance the tail index when an additional event has been pushed to memory.
The software will process the log entry and then advances the head. The
log is full when (tail + 1) % log_size = head. The hardware will stop
writing when the log is full. The user is expected to create a log size
large enough to handle all the expected events.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-5-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
244da66cda dmaengine: idxd: setup event log configuration
Add setup of event log feature for supported device. Event log addresses
error reporting that was lacking in gen 1 DSA devices where a second error
event does not get reported when a first event is pending software
handling. The event log allows a circular buffer that the device can push
error events to. It is up to the user to create a large enough event log
ring in order to capture the expected events. The evl size can be set in
the device sysfs attribute. By default 64 entries are supported as minimal
when event log is enabled.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-4-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:45 +05:30
Dave Jiang
1649091f91 dmaengine: idxd: add event log size sysfs attribute
Add support for changing of the event log size. Event log is a
feature added to DSA 2.0 hardware to improve error reporting.
It supersedes the SWERROR register on DSA 1.0 hardware and hope
to prevent loss of reported errors.

The error log size determines how many error entries supported for
the device. It can be configured by the user via sysfs attribute.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-3-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Dave Jiang
0c40bfb4c2 dmaengine: idxd: make misc interrupt one shot
Current code continuously processes the interrupt as long as the hardware
is setting the status bit. There's no reason to do that since the threaded
handler will get called again if another interrupt is asserted.

Also through testing, it has shown that if a misprogrammed (or malicious)
agent can continuously submit descriptors with bad completion record and
causes errors to be reported via the misc interrupt. Continuous processing
by the thread can cause software hang watchdog to kick off since the thread
isn't giving up the CPU.

Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20230407203143.2189681-2-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Walker Chen
c9566127f0 dt-bindings: dma: snps,dw-axi-dmac: constrain the items of resets for JH7110 dma
The DMA controller needs two reset items to work properly on JH7110 SoC,
so there is need to constrain the items' value to 2, other platforms
have 1 reset item at most.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Walker Chen <walker.chen@starfivetech.com>
Link: https://lore.kernel.org/r/20230322094820.24738-2-walker.chen@starfivetech.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Rob Herring
894abe0dcf dt-bindings: dma: Drop unneeded quotes
Cleanup bindings dropping unneeded quotes. Once all these are fixed,
checking for this can be enabled in yamllint.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230331182141.1900348-1-robh@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
09ebe227c2 dmaengine: at_xdmac: align declaration of ret with the rest of variables
Align the declaration of ret in atmel_xdmac_resume() with the rest of
variables. Do this by adding ret to the line with declaration for i
variable.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-8-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
5056eae6c3 dmaengine: at_xdmac: add a warning message regarding for unpaused channels
Add a warning message on suspend to let the user that there are channels
not paused by their consumers.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-7-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
f8435befd8 dmaengine: at_xdmac: do not enable all cyclic channels
Do not global enable all the cyclic channels in at_xdmac_resume(). Instead
save the global status in at_xdmac_suspend() and re-enable the cyclic
channel only if it was active before suspend.

Fixes: e1f7c9eee7 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-6-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
7c5eb63d16 dmaengine: at_xdmac: restore the content of grws register
In case the system suspends to a deep sleep state where power to DMA
controller is cut-off we need to restore the content of GRWS register.
This is a write only register and writing bit X tells the controller
to suspend read and write requests for channel X. Thus set GRWS before
restoring the content of GE (Global Enable) regiter.

Fixes: e1f7c9eee7 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-5-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
44fe8440bd dmaengine: at_xdmac: do not resume channels paused by consumers
In case there are DMA channels not paused by consumers in suspend
process (valid on AT91 SoCs for serial driver when no_console_suspend) the
driver pauses them (using at_xdmac_device_pause() which is also the same
function called by dmaengine_pause()) and then in the resume process the
driver resumes them calling at_xdmac_device_resume() which is the same
function called by dmaengine_resume()). This is good for DMA channels
not paused by consumers but for drivers that calls
dmaengine_pause()/dmaegine_resume() on suspend/resume path this may lead to
DMA channel being enabled before the IP is enabled. For IPs that needs
strict ordering with regards to DMA channel enablement this will lead to
wrong behavior. To fix this add a new set of functions
at_xdmac_device_pause_internal()/at_xdmac_device_resume_internal() to be
called only on suspend/resume.

Fixes: e1f7c9eee7 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-4-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
e53957e1ec dmaengine: at_xdmac: fix imbalanced runtime PM reference counter
In case there are channels not paused during suspend (which on AT91 case
is valid for serial driver when no_console_suspend boot argument is used)
the at_xdmac_runtime_suspend_descriptors() was called more than
one time due to at_xdmac_off(). To fix this add a new argument to
at_xdmac_off() to specify if runtime PM reference counter needs to be
decremented for queued active descriptors. Along with it moved the
at_xdmac_runtime_suspend_descriptors() call under at_xdmac_chan_is_paused()
check on suspend path as for the rest of channels the suspend is delayed
by atmel_xdmac_prepare() in case channel is enabled. Same approach has
been applied on resume path.

Fixes: 650b0e990c ("dmaengine: at_xdmac: add runtime pm support")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-3-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:44 +05:30
Claudiu Beznea
2de5ddb5e6 dmaengine: at_xdmac: disable/enable clock directly on suspend/resume
Runtime PM APIs for at_xdmac just plays with clk_enable()/clk_disable()
letting aside the clk_prepare()/clk_unprepare() that needs to be
executed as the clock is also prepared on probe. Thus instead of using
runtime PM force suspend/resume APIs use
clk_disable_unprepare() + pm_runtime_put_noidle() on suspend and
clk_prepare_enable() + pm_runtime_get_noresume() on resume. This
approach as been chosen instead of using runtime PM force suspend/resume
with clk_unprepare()/clk_prepare() as it looks simpler and the final
code is better.

While at it added the missing pm_runtime_mark_last_busy() on suspend before
decrementing the reference counter.

Fixes: 650b0e990c ("dmaengine: at_xdmac: add runtime pm support")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230214151827.1050280-2-claudiu.beznea@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:43 +05:30
Walker Chen
ce62432cb8 dmaengine: dw-axi-dmac: Increase polling time to DMA transmission completion status
The bit DMAC_CHEN[0] is automatically cleared by hardware to disable the
channel after the last AMBA transfer of the DMA transfer to the
destination has completed. Software can therefore poll this bit to
determine when this channel is free for a new DMA transfer.
This time requires at least 40 milliseconds on JH7110 SoC, otherwise an
error message 'failed to stop' will be reported.

Signed-off-by: Walker Chen <walker.chen@starfivetech.com>
Link: https://lore.kernel.org/r/20230322094820.24738-4-walker.chen@starfivetech.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:43 +05:30
Walker Chen
790f3c8b8f dmaengine: dw-axi-dmac: Add support for StarFive JH7110 DMA
Add DMA reset operation in device probe and use different configuration
on CH_CFG registers according to match data. Update all uses of
of_device_is_compatible with of_device_get_match_data.

Signed-off-by: Walker Chen <walker.chen@starfivetech.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20230322094820.24738-3-walker.chen@starfivetech.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 23:18:43 +05:30
Ian Rogers
1f94479edb libperf: Make perf_cpu_map__alloc() available as an internal function for tools/perf to use
We had the open coded equivalent in perf_cpu_map__empty_new(), so reuse
what is in libperf.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-12 14:44:24 -03:00
Ian Rogers
7bb1d048bd perf cpumap: Use perf_cpu_map__nr(cpus) to access cpus->nr
So that we can have a single point where to refcount check 'struct perf_cpu_map'
instances for use after free, etc.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-12 14:44:11 -03:00
Ondrej Mosnacek
bcab1adeaa selinux: fix Makefile dependencies of flask.h
Make the flask.h target depend on the genheaders binary instead of
classmap.h to ensure that it is rebuilt if any of the dependencies of
genheaders are changed.

Notably this fixes flask.h not being rebuilt when
initial_sid_to_string.h is modified.

Fixes: 8753f6bec3 ("selinux: generate flask headers during kernel build")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-04-12 13:34:20 -04:00
Tetsuo Handa
57dcd64c7e cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
syzbot is reporting circular locking dependency between cpu_hotplug_lock
and freezer_mutex, for commit f5d39b0208 ("freezer,sched: Rewrite core
freezer logic") replaced atomic_inc() in freezer_apply_state() with
static_branch_inc() which holds cpu_hotplug_lock.

cpu_hotplug_lock => cgroup_threadgroup_rwsem => freezer_mutex

  cgroup_file_write() {
    cgroup_procs_write() {
      __cgroup_procs_write() {
        cgroup_procs_write_start() {
          cgroup_attach_lock() {
            cpus_read_lock() {
              percpu_down_read(&cpu_hotplug_lock);
            }
            percpu_down_write(&cgroup_threadgroup_rwsem);
          }
        }
        cgroup_attach_task() {
          cgroup_migrate() {
            cgroup_migrate_execute() {
              freezer_attach() {
                mutex_lock(&freezer_mutex);
                (...snipped...)
              }
            }
          }
        }
        (...snipped...)
      }
    }
  }

freezer_mutex => cpu_hotplug_lock

  cgroup_file_write() {
    freezer_write() {
      freezer_change_state() {
        mutex_lock(&freezer_mutex);
        freezer_apply_state() {
          static_branch_inc(&freezer_active) {
            static_key_slow_inc() {
              cpus_read_lock();
              static_key_slow_inc_cpuslocked();
              cpus_read_unlock();
            }
          }
        }
        mutex_unlock(&freezer_mutex);
      }
    }
  }

Swap locking order by moving cpus_read_lock() in freezer_apply_state()
to before mutex_lock(&freezer_mutex) in freezer_change_state().

Reported-by: syzbot <syzbot+c39682e86c9d84152f93@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=c39682e86c9d84152f93
Suggested-by: Hillf Danton <hdanton@sina.com>
Fixes: f5d39b0208 ("freezer,sched: Rewrite core freezer logic")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-04-12 07:30:28 -10:00
Alexei Starovoitov
10fd5f70c3 bpf: Handle NULL in bpf_local_storage_free.
During OOM bpf_local_storage_alloc() may fail to allocate 'storage' and
call to bpf_local_storage_free() with NULL pointer will cause a crash like:
[ 271718.917646] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 271719.019620] RIP: 0010:call_rcu+0x2d/0x240
[ 271719.216274]  bpf_local_storage_alloc+0x19e/0x1e0
[ 271719.250121]  bpf_local_storage_update+0x33b/0x740

Fixes: 7e30a8477b ("bpf: Add bpf_local_storage_free()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230412171252.15635-1-alexei.starovoitov@gmail.com
2023-04-12 10:27:50 -07:00
Shunsuke Mie
970b17dfe2 dmaengine: dw-edma: Fix to enable to issue dma request on DMA processing
The issue_pending request is ignored while driver is processing a DMA
request. Fix to issue the pending requests on any dma channel status.

Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
Link: https://lore.kernel.org/r/20230411101758.438472-2-mie@igel.co.jp
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:44:49 +05:30
Shunsuke Mie
a251994a44 dmaengine: dw-edma: Fix to change for continuous transfer
The dw-edma driver stops after processing a DMA request even if a request
remains in the issued queue, which is not the expected behavior. The DMA
engine API requires continuous processing.

Add a trigger to start after one processing finished if there are requests
remain.

Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
Link: https://lore.kernel.org/r/20230411101758.438472-1-mie@igel.co.jp
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:44:49 +05:30
Rob Herring
619d8ea96d dmaengine: qcom_hidma: Add explicit platform_device.h and of_device.h includes
qcom_hidma uses of_dma_configure() which is declared in of_device.h.
platform_device.h and of_device.h get implicitly included by of_platform.h,
but that is going to be removed soon.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Sinan Kaya <okaya@kernel.org>
Link: https://lore.kernel.org/r/20230410232654.1561462-1-robh@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:43:08 +05:30
Dmitry Baryshkov
91d6a468e3 dma: gpi: remove spurious unlock in gpi_ch_init
gpi_ch_init() doesn't lock the ctrl_lock mutex, so there is no need to
unlock it too. Instead the mutex is handled by the function
gpi_alloc_chan_resources(), which properly locks and unlocks the mutex.

=====================================
WARNING: bad unlock balance detected!
6.3.0-rc5-00253-g99792582ded1-dirty #15 Not tainted
-------------------------------------
kworker/u16:0/9 is trying to release lock (&gpii->ctrl_lock) at:
[<ffffb99d04e1284c>] gpi_alloc_chan_resources+0x108/0x5bc
but there are no more locks to release!

other info that might help us debug this:
6 locks held by kworker/u16:0/9:
 #0: ffff575740010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x220/0x594
 #1: ffff80000809bdd0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x220/0x594
 #2: ffff575740f2a0f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
 #3: ffff57574b5570f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
 #4: ffffb99d06a2f180 (of_dma_lock){+.+.}-{3:3}, at: of_dma_request_slave_channel+0x138/0x280
 #5: ffffb99d06a2ee20 (dma_list_mutex){+.+.}-{3:3}, at: dma_get_slave_channel+0x28/0x10c

stack backtrace:
CPU: 7 PID: 9 Comm: kworker/u16:0 Not tainted 6.3.0-rc5-00253-g99792582ded1-dirty #15
Hardware name: Google Pixel 3 (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 dump_backtrace+0xa0/0xfc
 show_stack+0x18/0x24
 dump_stack_lvl+0x60/0xac
 dump_stack+0x18/0x24
 print_unlock_imbalance_bug+0x130/0x148
 lock_release+0x270/0x300
 __mutex_unlock_slowpath+0x48/0x2cc
 mutex_unlock+0x20/0x2c
 gpi_alloc_chan_resources+0x108/0x5bc
 dma_chan_get+0x84/0x188
 dma_get_slave_channel+0x5c/0x10c
 gpi_of_dma_xlate+0x110/0x1a0
 of_dma_request_slave_channel+0x174/0x280
 dma_request_chan+0x3c/0x2d4
 geni_i2c_probe+0x544/0x63c
 platform_probe+0x68/0xc4
 really_probe+0x148/0x2ac
 __driver_probe_device+0x78/0xe0
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xb8/0x138
 bus_for_each_drv+0x84/0xe0
 __device_attach+0x9c/0x188
 device_initial_probe+0x14/0x20
 bus_probe_device+0xac/0xb0
 device_add+0x60c/0x7d8
 of_device_add+0x44/0x60
 of_platform_device_create_pdata+0x90/0x124
 of_platform_bus_create+0x15c/0x3c8
 of_platform_populate+0x58/0xf8
 devm_of_platform_populate+0x58/0xbc
 geni_se_probe+0xf0/0x164
 platform_probe+0x68/0xc4
 really_probe+0x148/0x2ac
 __driver_probe_device+0x78/0xe0
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xb8/0x138
 bus_for_each_drv+0x84/0xe0
 __device_attach+0x9c/0x188
 device_initial_probe+0x14/0x20
 bus_probe_device+0xac/0xb0
 deferred_probe_work_func+0x8c/0xc8
 process_one_work+0x2bc/0x594
 worker_thread+0x228/0x438
 kthread+0x108/0x10c
 ret_from_fork+0x10/0x20

Fixes: 5d0c3533a1 ("dmaengine: qcom: Add GPI dma driver")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20230409233355.453741-1-dmitry.baryshkov@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:42:27 +05:30
Josh Poimboeuf
50f9a76ef1 iov_iter: Mark copy_compat_iovec_from_user() noinline
After commit 6376ce56feb6 ("iov_iter: import single vector iovecs as
ITER_UBUF"), GCC does an inter-procedural compiler optimization which
moves the user_access_begin() out of copy_compat_iovec_from_user() and
into its callers:

  lib/iov_iter.o: warning: objtool: .altinstr_replacement+0x0: redundant UACCESS disable
  lib/iov_iter.o: warning: objtool: iovec_from_user.part.0+0xc7: call to copy_compat_iovec_from_user.part.0() with UACCESS enabled
  lib/iov_iter.o: warning: objtool: __import_iovec+0x21d: call to copy_compat_iovec_from_user.part.0() with UACCESS enabled

Enforce the "no UACCESS enable across function boundaries" rule by
disabling cloning for copy_compat_iovec_from_user().

Fixes: 6376ce56feb6 ("iov_iter: import single vector iovecs as ITER_UBUF")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
https://lkml.kernel.org/lkml/20230327120017.6bb826d7@canb.auug.org.au
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-12 10:46:48 -06:00
Sinthu Raja
a010613237 phy: cadence: cdns-dphy-rx: Add common module reset support
DPHY RX module has a common module reset (RSTB_CMN) which is expected
to be released during configuration. In J721E SR1.0 the RSTB_CMN is
internally tied to CSI_RX_RST and is hardware controlled, for all
other newer platforms the common module reset is software controlled.
Add support to control common module reset during configuration and
also skip common module reset based on soc_device_match() for J721E SR1.0.

Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
Co-developed-by: Vaishnav Achath <vaishnav.a@ti.com>
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Reviewed-by: Jai Luthra <j-luthra@ti.com>
Link: https://lore.kernel.org/r/20230314073137.2153-1-vaishnav.a@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:16:16 +05:30
Siddharth Vadapalli
ec318c51b6 phy: ti: j721e-wiz: Add SGMII support in WIZ driver for J721E
Enable full rate divider configuration support for J721E_WIZ_16G for SGMII.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230309092434.443550-1-s-vadapalli@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-04-12 22:15:27 +05:30
Benno Lossin
1944caa8e8 rust: sync: add functions for initializing UniqueArc<MaybeUninit<T>>
Add two functions `init_with` and `pin_init_with` to
`UniqueArc<MaybeUninit<T>>` to initialize the memory of already allocated
`UniqueArc`s. This is useful when you want to allocate memory check some
condition inside of a context where allocation is forbidden and then
conditionally initialize an object.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-16-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
701608bd03 rust: sync: reduce stack usage of UniqueArc::try_new_uninit
`UniqueArc::try_new_uninit` calls `Arc::try_new(MaybeUninit::uninit())`.
This results in the uninitialized memory being placed on the stack,
which may be arbitrarily large due to the generic `T` and thus could
cause a stack overflow for large types.

Change the implementation to use the pin-init API which enables in-place
initialization. In particular it avoids having to first construct and
then move the uninitialized memory from the stack into the final location.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-15-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
692e8935e2 rust: types: add Opaque::ffi_init
This function allows to easily initialize `Opaque` with the pin-init
API. `Opaque::ffi_init` takes a closure and returns a pin-initializer.
This pin-initiailizer calls the given closure with a pointer to the
inner `T`.

Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-14-y86-dev@protonmail.com
[ Fixed typo. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
8586f1acd3 rust: prelude: add pin-init API items to prelude
Add `pin-init` API macros and traits to the prelude.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-13-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
38cde0bd7b rust: init: add Zeroable trait and init::zeroed function
Add the `Zeroable` trait which marks types that can be initialized by
writing `0x00` to every byte of the type. Also add the `init::zeroed`
function that creates an initializer for a `Zeroable` type that writes
`0x00` to every byte.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-12-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
6841d45a30 rust: init: add stack_pin_init! macro
The `stack_pin_init!` macro allows pin-initializing a value on the
stack. It accepts a `impl PinInit<T, E>` to initialize a `T`. It allows
propagating any errors via `?` or handling it normally via `match`.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20230408122429.1103522-11-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
d0fdc39612 rust: init: add PinnedDrop trait and macros
The `PinnedDrop` trait that facilitates destruction of pinned types.
It has to be implemented via the `#[pinned_drop]` macro, since the
`drop` function should not be called by normal code, only by other
destructors. It also only works on structs that are annotated with
`#[pin_data(PinnedDrop)]`.

Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-10-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
92c4a1e7e8 rust: init/sync: add InPlaceInit trait to pin-initialize smart pointers
The `InPlaceInit` trait that provides two functions, for initializing
using `PinInit<T, E>` and `Init<T>`. It is implemented by `Arc<T>`,
`UniqueArc<T>` and `Box<T>`.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-9-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
fc6c6baa1f rust: init: add initialization macros
Add the following initializer macros:
- `#[pin_data]` to annotate structurally pinned fields of structs,
  needed for `pin_init!` and `try_pin_init!` to select the correct
  initializer of fields.
- `pin_init!` create a pin-initializer for a struct with the
  `Infallible` error type.
- `try_pin_init!` create a pin-initializer for a struct with a custom
  error type (`kernel::error::Error` is the default).
- `init!` create an in-place-initializer for a struct with the
  `Infallible` error type.
- `try_init!` create an in-place-initializer for a struct with a custom
  error type (`kernel::error::Error` is the default).

Also add their needed internal helper traits and structs.

Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-8-y86-dev@protonmail.com
[ Fixed three typos. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
90e53c5e70 rust: add pin-init API core
This API is used to facilitate safe pinned initialization of structs. It
replaces cumbersome `unsafe` manual initialization with elegant safe macro
invocations.

Due to the size of this change it has been split into six commits:
1. This commit introducing the basic public interface: traits and
   functions to represent and create initializers.
2. Adds the `#[pin_data]`, `pin_init!`, `try_pin_init!`, `init!` and
   `try_init!` macros along with their internal types.
3. Adds the `InPlaceInit` trait that allows using an initializer to create
   an object inside of a `Box<T>` and other smart pointers.
4. Adds the `PinnedDrop` trait and adds macro support for it in
   the `#[pin_data]` macro.
5. Adds the `stack_pin_init!` macro allowing to pin-initialize a struct on
   the stack.
6. Adds the `Zeroable` trait and `init::zeroed` function to initialize
   types that have `0x00` in all bytes as a valid bit pattern.

--

In this section the problem that the new pin-init API solves is outlined.
This message describes the entirety of the API, not just the parts
introduced in this commit. For a more granular explanation and additional
information on pinning and this issue, view [1].

Pinning is Rust's way of enforcing the address stability of a value. When a
value gets pinned it will be impossible for safe code to move it to another
location. This is done by wrapping pointers to said object with `Pin<P>`.
This wrapper prevents safe code from creating mutable references to the
object, preventing mutable access, which is needed to move the value.
`Pin<P>` provides `unsafe` functions to circumvent this and allow
modifications regardless. It is then the programmer's responsibility to
uphold the pinning guarantee.

Many kernel data structures require a stable address, because there are
foreign pointers to them which would get invalidated by moving the
structure. Since these data structures are usually embedded in structs to
use them, this pinning property propagates to the container struct.
Resulting in most structs in both Rust and C code needing to be pinned.

So if we want to have a `mutex` field in a Rust struct, this struct also
needs to be pinned, because a `mutex` contains a `list_head`. Additionally
initializing a `list_head` requires already having the final memory
location available, because it is initialized by pointing it to itself. But
this presents another challenge in Rust: values have to be initialized at
all times. There is the `MaybeUninit<T>` wrapper type, which allows
handling uninitialized memory, but this requires using the `unsafe` raw
pointers and a casting the type to the initialized variant.

This problem gets exacerbated when considering encapsulation and the normal
safety requirements of Rust code. The fields of the Rust `Mutex<T>` should
not be accessible to normal driver code. After all if anyone can modify
the fields, there is no way to ensure the invariants of the `Mutex<T>` are
upheld. But if the fields are inaccessible, then initialization of a
`Mutex<T>` needs to be somehow achieved via a function or a macro. Because
the `Mutex<T>` must be pinned in memory, the function cannot return it by
value. It also cannot allocate a `Box` to put the `Mutex<T>` into, because
that is an unnecessary allocation and indirection which would hurt
performance.

The solution in the rust tree (e.g. this commit: [2]) that is replaced by
this API is to split this function into two parts:

1. A `new` function that returns a partially initialized `Mutex<T>`,
2. An `init` function that requires the `Mutex<T>` to be pinned and that
   fully initializes the `Mutex<T>`.

Both of these functions have to be marked `unsafe`, since a call to `new`
needs to be accompanied with a call to `init`, otherwise using the
`Mutex<T>` could result in UB. And because calling `init` twice also is not
safe. While `Mutex<T>` initialization cannot fail, other structs might
also have to allocate memory, which would result in conditional successful
initialization requiring even more manual accommodation work.

Combine this with the problem of pin-projections -- the way of accessing
fields of a pinned struct -- which also have an `unsafe` API, pinned
initialization is riddled with `unsafe` resulting in very poor ergonomics.
Not only that, but also having to call two functions possibly multiple
lines apart makes it very easy to forget it outright or during refactoring.

Here is an example of the current way of initializing a struct with two
synchronization primitives (see [3] for the full example):

    struct SharedState {
        state_changed: CondVar,
        inner: Mutex<SharedStateInner>,
    }

    impl SharedState {
        fn try_new() -> Result<Arc<Self>> {
            let mut state = Pin::from(UniqueArc::try_new(Self {
                // SAFETY: `condvar_init!` is called below.
                state_changed: unsafe { CondVar::new() },
                // SAFETY: `mutex_init!` is called below.
                inner: unsafe {
                    Mutex::new(SharedStateInner { token_count: 0 })
                },
            })?);

            // SAFETY: `state_changed` is pinned when `state` is.
            let pinned = unsafe {
                state.as_mut().map_unchecked_mut(|s| &mut s.state_changed)
            };
            kernel::condvar_init!(pinned, "SharedState::state_changed");

            // SAFETY: `inner` is pinned when `state` is.
            let pinned = unsafe {
                state.as_mut().map_unchecked_mut(|s| &mut s.inner)
            };
            kernel::mutex_init!(pinned, "SharedState::inner");

            Ok(state.into())
        }
    }

The pin-init API of this patch solves this issue by providing a
comprehensive solution comprised of macros and traits. Here is the example
from above using the pin-init API:

    #[pin_data]
    struct SharedState {
        #[pin]
        state_changed: CondVar,
        #[pin]
        inner: Mutex<SharedStateInner>,
    }

    impl SharedState {
        fn new() -> impl PinInit<Self> {
            pin_init!(Self {
                state_changed <- new_condvar!("SharedState::state_changed"),
                inner <- new_mutex!(
                    SharedStateInner { token_count: 0 },
                    "SharedState::inner",
                ),
            })
        }
    }

Notably the way the macro is used here requires no `unsafe` and thus comes
with the usual Rust promise of safe code not introducing any memory
violations. Additionally it is now up to the caller of `new()` to decide
the memory location of the `SharedState`. They can choose at the moment
`Arc<T>`, `Box<T>` or the stack.

--

The API has the following architecture:
1. Initializer traits `PinInit<T, E>` and `Init<T, E>` that act like
   closures.
2. Macros to create these initializer traits safely.
3. Functions to allow manually writing initializers.

The initializers (an `impl PinInit<T, E>`) receive a raw pointer pointing
to uninitialized memory and their job is to fully initialize a `T` at that
location. If initialization fails, they return an error (`E`) by value.

This way of initializing cannot be safely exposed to the user, since it
relies upon these properties outside of the control of the trait:
- the memory location (slot) needs to be valid memory,
- if initialization fails, the slot should not be read from,
- the value in the slot should be pinned, so it cannot move and the memory
  cannot be deallocated until the value is dropped.

This is why using an initializer is facilitated by another trait that
ensures these requirements.

These initializers can be created manually by just supplying a closure that
fulfills the same safety requirements as `PinInit<T, E>`. But this is an
`unsafe` operation. To allow safe initializer creation, the `pin_init!` is
provided along with three other variants: `try_pin_init!`, `try_init!` and
`init!`. These take a modified struct initializer as a parameter and
generate a closure that initializes the fields in sequence.
The macros take great care in upholding the safety requirements:
- A shadowed struct type is used as the return type of the closure instead
  of `()`. This is to prevent early returns, as these would prevent full
  initialization.
- To ensure every field is only initialized once, a normal struct
  initializer is placed in unreachable code. The type checker will emit
  errors if a field is missing or specified multiple times.
- When initializing a field fails, the whole initializer will fail and
  automatically drop fields that have been initialized earlier.
- Only the correct initializer type is allowed for unpinned fields. You
  cannot use a `impl PinInit<T, E>` to initialize a structurally not pinned
  field.

To ensure the last point, an additional macro `#[pin_data]` is needed. This
macro annotates the struct itself and the user specifies structurally
pinned and not pinned fields.

Because dropping a pinned struct is also not allowed to break the pinning
invariants, another macro attribute `#[pinned_drop]` is needed. This
macro is introduced in a following commit.

These two macros also have mechanisms to ensure the overall safety of the
API. Additionally, they utilize a combined proc-macro, declarative macro
design: first a proc-macro enables the outer attribute syntax `#[...]` and
does some important pre-parsing. Notably this prepares the generics such
that the declarative macro can handle them using token trees. Then the
actual parsing of the structure and the emission of code is handled by a
declarative macro.

For pin-projections the crates `pin-project` [4] and `pin-project-lite` [5]
had been considered, but were ultimately rejected:
- `pin-project` depends on `syn` [6] which is a very big dependency, around
  50k lines of code.
- `pin-project-lite` is a more reasonable 5k lines of code, but contains a
  very complex declarative macro to parse generics. On top of that it
  would require modification that would need to be maintained
  independently.

Link: https://rust-for-linux.com/the-safe-pinned-initialization-problem [1]
Link: 0a04dc4ddd [2]
Link: f509ede33f/samples/rust/rust_miscdev.rs [3]
Link: https://crates.io/crates/pin-project [4]
Link: https://crates.io/crates/pin-project-lite [5]
Link: https://crates.io/crates/syn [6]
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-7-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
3ff6e785ad rust: types: add Opaque::raw_get
This function mirrors `UnsafeCell::raw_get`. It avoids creating a
reference and allows solely using raw pointers.
The `pin-init` API will be using this, since uninitialized memory
requires raw pointers.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-6-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Benno Lossin
d6dbca3592 rust: sync: change error type of constructor functions
Change the error type of the constructors of `Arc` and `UniqueArc` to be
`AllocError` instead of `Error`. This makes the API more clear as to
what can go wrong when calling `try_new` or its variants.

Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20230408122429.1103522-4-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00
Gary Guo
70a21e54a4 rust: macros: add quote! macro
Add the `quote!` macro for creating `TokenStream`s directly via the
given Rust tokens. It also supports repetitions using iterators.

It will be used by the pin-init API proc-macros to generate code.

Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20230408122429.1103522-3-y86-dev@protonmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-04-12 18:41:05 +02:00