Currently we pin the GuC or HuC firmware image just before uploading.
Perma-pin during uC initialization instead and use the range reserved at
the top of the address space.
Moving the firmware resulted in needing to:
- use an additional pinning for the rsa signature which will be used
during HuC auth as addresses above GUC_GGTT_TOP do not map through GTT.
v2: Remove call to set to gtt domain
Do not restore fw gtt mapping unconditionally
Separate out pin/unpin functions and drop usage of pin/unpin
Use uc_fw init/fini functions to bind/unbind fw object
v3: Bind is only needed during xfer (Chris)
Remove attempts to bind outside of xfer (Chris)
Mark fw bind/unbind static
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419230015.18121-4-fernando.pacheco@intel.com
GuC and HuC depend on struct_mutex for device reinitialization. Moving
away from this dependency requires perma-pinning the firmware images in
GGTT. The upper portion of the GuC address space has a sizeable hole
(several MB) that is inaccessible by GuC. Reserve this range within GGTT
as it can comfortably hold GuC/HuC firmware images.
v2: Reserve node rather than insert (Chris)
Simpler determination of node start/size (Daniele)
Move reserve/release out to intel_guc.* files
v3: Reserve starting at GUC_GGTT_TOP only and bail if this
fails (Chris)
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419230015.18121-3-fernando.pacheco@intel.com
An interesting discussion regarding "hybrid interrupt polling" for NVMe
came to the conclusion that the ideal busyspin before sleeping was half
of the expected request latency (and better if it was already halfway
through that request). This suggested that we too should look again at
our tradeoff between spinning and waiting. Currently, our spin simply
tries to hide the cost of enabling the interrupt, which is good to avoid
penalising nop requests (i.e. test throughput) and not much else.
Studying real world workloads suggests that a spin of upto 500us can
dramatically boost performance, but the suggestion is that this is not
from avoiding interrupt latency per-se, but from secondary effects of
sleeping such as allowing the CPU reduce cstate and context switch away.
In a truly hybrid interrupt polling scheme, we would aim to sleep until
just before the request completed and then wake up in advance of the
interrupt and do a quick poll to handle completion. This is tricky for
ourselves at the moment as we are not recording request times, and since
we allow preemption, our requests are not on as a nicely ordered
timeline as IO. However, the idea is interesting, for it will certainly
help us decide when busyspinning is worthwhile.
v2: Expose the spin setting via Kconfig options for easier adjustment
and testing.
v3: Don't get caught sneaking in a change to the busyspin parameters.
v4: Explain more about the "hybrid interrupt polling" scheme that we
want to migrate towards.
Suggested-by: Sagar Kamble <sagar.a.kamble@intel.com>
References: http://events.linuxfoundation.org/sites/events/files/slides/lemoal-nvme-polling-vault-2017-final_0.pdf
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Kamble <sagar.a.kamble@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Sagar Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419182625.11186-1-chris@chris-wilson.co.uk
First loop does copy_from_user() without the table lock held and
just stores the handle. Second loop looks up buffer objects with the
table_lock held without potentially blocking or faulting. This lets us
clean up a bunch of custom, non-faulting copy_from_user() code.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Now that we don't have the mmap_sem lock inversion, we don't need to
jump through this particular hoop anymore.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
We use a llist and a worker to delay the object cleanup. This avoids
taking mmap_sem and struct_mutex in the wrong order when calling
drm_gem_object_put_unlocked() from drm_gem_mmap().
Fixes lockdep problem with copy_from_user() in msm_ioctl_gem_submit().
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The HFI tasklet was removed in df0dff1 ("drm/msm/a6xx: Poll for HFI
responses") but the tasklet_struct was accidentally left behind.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Currently if the GMU resume function fails all we try to do is clear the
BOOT_SLUMBER oob which usually times out and ends up in a cycle of death.
If the resume function fails at any point remove any RPMh votes that might
have been added and try to shut down the GMU hardware cleanly.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Now that the GX domain is sorted we can wire up a working GMU reset.
IF a GMU hang was detected then try to forcefully shut down the GMU
in the power down sequence which should ensure that it can recover
normally on the next power up.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
99.999% of the time during normal operation the GMU is responsible
for power and clock control on the GX domain and the CPU remains
blissfully unaware. However, there is one situation where the CPU
needs to get involved:
The power sequencing rules dictate that the GX needs to be turned
off before the CX so that the CX can be turned on before the GX
during power up. During normal operation when the CPU is taking
down the CX domain a stop command is sent to the GMU which turns
off the GX domain and then the CPU handles the CX domain.
But if the GMU happened to be unresponsive while the GX domain was
left then the CPU will need to step in and turn off the GX domain
before resetting the CX and rebooting the GMU. This unfortunately
means that the CPU needs to be marginally aware of the GX domain
even though it is expected to usually keep its hands off.
To support this we create a semi-disabled GX power domain that
does nothing to the hardware on power up but tries to shut it
down normally on power down. In this method the reference counting
is correct and we can step in with the pm_runtime_put() at the right
time during the failure path.
This patch sets up the connection to the GX power domain and does
the magic to "enable" and disable it at the right points.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The GMU code currently has some misguided code to try to work around
a hardware quirk that requires the power domains on the GPU be
collapsed in a certain order. Upcoming patches will do this the
right way so get rid of the unused and unwanted regulator
code.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Add the capability to query information from a submit queue.
The first available parameter is for querying the number of GPU faults
(hangs) that can be attributed to the queue.
This is useful for implementing context robustness. A user context can
regularly query the number of faults to see if it is responsible for any
and if so it can invalidate itself.
This is also helpful for testing by confirming to the user driver if a
particular command stream caused a fault (or not as the case may be).
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
For KHR_robustness, userspace wants to know two things, the count of GPU
faults globally, and the count of faults attributed to a given context.
This patch providees the former, and the next patch provides the latter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
For now it always returns '0' (false), but once the iommu work is in
place to enable per-process pagetables we can update the value returned.
Userspace needs to know this to make an informed decision about exposing
KHR_robustness.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Under SRIOV, we need disable DRIVER_ATOMIC.
Otherwise, it will trigger WARN_ON at drm_universal_plane_init.
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_bo_destroy had a bug by calling amdgpu_bo_unref outside mutex_lock.
If amdgpu_device_recover_vram executed between amdgpu_bo_unref and list_del_init,
it would get NULL of shadow->parent, then caused Call Trace and GPU reset failed.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Method of getting firmware version is the same across ASICs, so remove
them from ASIC-specific files and create one in amdgpu_amdkfd.c. This new
created get_fw_version simply reads fw_version from adev->gfx than parsing
the ucode header.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When a driver unloads without unloading TTM we don't correctly
clear the global structures leading to errors on re-init.
Next step should probably be to remove the global structures and
kobjs all together, but this is tricky since we need to maintain
backward compatibility.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
CC: stable@vger.kernel.org # 5.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The driver does not currently support unbinding from a device which is
in use. Since open file descriptors may still be pointing into kernel
memory where the device structures used to be, entirely correct kernel
panics protect the driver from being unbound as we should not be
unbinding it before those dangling pointers have been made safe.
According to the documentation found inside drivers/gpu/drm/drm_drv.c,
drm_dev_unplug() should be used instead of drm_dev_unregister() in
order to make a device inaccessible to users as soon as it is unpluged.
Follow that advice to make those possibly dangling pointers safe,
protected by DRM layer from a user who is otherwise left pointing into
possibly reused kernel memory after the driver has been unbound from
the device. Once done, also cancel inflight operations immediately by
calling i915_gem_set_wedged().
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405130235.7707-2-janusz.krzysztofik@linux.intel.com
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jordan Crouse <jcrouse@codeaurora.org>
Cc: Mamta Shukla <mamtashukla555@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sharat Masetty <smasetty@codeaurora.org>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org (open list)
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The patch ("OPP: Add support for parsing the 'opp-level' property")
adds an API enabling a cleaner way to read the opp-level. Let's use
the new API.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The frame_busy mask is used in frame_done event handling, which is not
invoked for async commits. So an async commit will leave the
frame_busy mask populated after it completes and future commits will start
with the busy mask incorrect.
This showed up on disable after cursor move. I was hitting the "this should
not happen" comment in the frame event worker since frame_busy was set,
we queued the event, but there were no frames pending (since async
also doesn't set that).
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190130163220.138637-1-sean@poorly.run
Signed-off-by: Rob Clark <robdclark@chromium.org>
In the case of an async/cursor update, we don't wait for the frame_done
event, which means handle_frame_done is never called, and the frame_done
watchdog isn't canceled. Currently, this results in a frame_done timeout
every time the cursor moves without a synchronous frame following it up
before the timeout expires. Since we don't wait for frame_done, and
don't handle it, we shouldn't modify the watchdog.
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128204306.95076-4-sean@poorly.run
Signed-off-by: Rob Clark <robdclark@chromium.org>
There exists a bunch of confusion as to what the actual units of
frame_done is:
- The definition states it's in # of frames
- CRTC treats it like it's ms
- frame_done_timeout comment thinks it's Hz, but it stores ms
- frame_done timer is setup such that it _should_ be in frames, but the
timeout is super long
So this patch tries to interpret what the driver really wants. I've
de-centralized the #define since the consumers are expecting different
units.
For crtc, we just use 60ms since that's what it was doing before.
Perhaps we could get fancy and scale with vrefresh, but that's for
another time.
For encoder, fix the comments and rename frame_done_timeout so it's
obvious what the units are. In practice, frame_done_timeout is really
just checked against 0 || !0, which I guess is why the units being wrong
didn't matter. I've also dropped the timeout from the previous 60 frames
to 5. That seems like more than enough time to give up on a frame, and
my guess is that no one intended for the timeout to _actually_ be 60
frames.
Reviewed-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128204306.95076-3-sean@poorly.run
Signed-off-by: Rob Clark <robdclark@chromium.org>
Currently the IOMMU code calls pm_runtime_get/put on the GPU or display
device before doing a IOMMU operation. This was because usually the
IOMMU driver didn't do power control of its own and since the hardware
used the same clocks and power as the respective multimedia device it
was a easy way to make sure that the power was available.
Now two things have changed. First, the SMMU devices can do their own power
control and more important bringing up the a6xx GPU isn't as easy as
turning on some clocks. To bring the GPU up we need the GMU which itself
needs the IOMMU so we have a chicken and egg problem.
Luckily this is easily fixed by removing the pm_runtime calls from the
functions and letting the device link to the IOMMU device handle the magic.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The pages backing the GEM objects are kept pinned in place as
long as they are alive, so they must not be allocated from the
MOVABLE zone. Blocking page migration for too long will cause
the VM subsystem headaches and will outright break CMA, as a
few pinned pages in CMA will lead to failure to find the
required large contiguous regions.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
It is the expectation of existing userspace (X11 + Mesa, in
particular) that jobs submitted to the kernel against a shared BO will
get implicitly synchronized by their submission order. If we want to
allow clever userspace to disable implicit synchronization, we should
do that under its own submit flag (as amdgpu and lima do).
Note that we currently only implicitly sync for the rendering pass,
not binning -- if you texture-from-pixmap in the binning vertex shader
(vertex coordinate generation), you'll miss out on synchronization.
Fixes flickering when multiple clients are running in parallel,
particularly GL apps and compositors.
v2: Fix a missing refcount on the CSD done fence for L2 cleaning.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20190416225856.20264-6-eric@anholt.net
Acked-by: Rob Clark <robdclark@gmail.com>