The engine provides a mirror of the CSB and CSB write pointer in the HWSP.
Read these status from virtual HWSP in VM can reduce CPU utilization while
applications have much more short GPU workloads. Here we update the
corresponding data in virtual HWSP as it in virtual MMIO.
Before read these status from HWSP in GVT-g VM, please ensure the host
support it by checking the BIT(3) of caps in PVINFO.
Virtual HWSP only support GEN8+ platform, since the HWSP MMIO may change
follow the platform update, please add the corresponding MMIO emulation
when enable new platforms in GVT-g.
v3 : Add address audit in HWSP address update.
v4 :
Separate this patch with enalbe virtual HWSP in VM.
Use intel_gvt_render_mmio_to_ring_id() to determine ring_id by offset.
v5 : Remove unnessary check about Gen8, GVT-g only support Gen8+.
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Refine previously broken PPGTT scratch. Scratch PTE was no correctly
handled and also the handling of scratch entries in page table walk was
not well organized, which brings gaps of introducing lazy shadow.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Need to figure out page table type of current level by GTT entry type
during getting a scratch page table entry.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
During a vGPU reset, the scratch page table shouldn't be cleared, what
needs to be cleared should be the scratch page.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
As we want to re-use intel_vgpu_shadow_page in buidling scrach page table
and we don't want to put scrach page table page into hash table, a new
param is introduced to give the caller a choice to decide if a shadow page
should be put into hash table.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
As there is already an I915_GTT_PAGE_SIZE marco in i915, let GVT-g use it
as well. Also this patch re-names some GTT marcos with additional prefix.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Since many emulation logic needs to convert the offset of ring registers
into ring id, we export it for other caller which might need it.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
As the data structure of "intel_vgpu_guest_page" will become much heavier
in future, it's better to factor out the guest memory page track mechnisim
as early as possible.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
1) Use standard i915 GEM object sequence to access the shadow batch buffer.
2) Manage i915 vma life cycle to solve one FIXME.
v2:
- Refine code structure.
- Refine the usage of GEM APIs.
- Add the missing lock/unlock in release_shadow_batch_buffer.
Test on my SKL NuC.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Returns the error code if something is wrong and the size of batch buffer
is passed through the pointer.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
We need debugfs entry to expose some debug information of gvt and vGPUs.
The first tool will be added is mmio-diff, which help to find the
difference values of host and vGPU mmio. It's useful for platform
enabling.
This patch just add a basic debugfs infrastructure, each vGPU has its own
sub-folder. Two simple attributes are created as a template.
.
├── num_tracked_mmio
├── vgpu1
| └── active
└── vgpu2
└── active
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
all the vGPU type related code in kvmgt will be moved into
gvt.c/gvt.h files while the common vGPU type related interfaces
will be called.
v2:
- intel_gvt_{init,cleanup}_vgpu_type_groups are initialized in
gvt part. (Wang, Zhi)
Signed-off-by: fred gao <fred.gao@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
In this patch, all the vGPU type related code will be merged into
same gvt file and the common interface will be exposed to both
XenGT and KvmGT.
v2:
- remove the useless mdev_* gvt_ops.
add get_gvt_attr ops for MPT module.
intel_gvt_{init,cleanup}_vgpu_type_groups are initialized in
gvt part. (Wang, Zhi)
- set gvt_vgpu_type_groups[i] to NULL. (Zhang,Xiong)
Signed-off-by: fred gao <fred.gao@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move clean_workloads() into scheduler.c since it's not specific to
execlist.
v2:
- Remove clean_workloads in intel_vgpu_select_submission_ops. (Zhenyu)
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Introduce vGPU submission ops to support easy switching submission mode
of one vGPU between different OSes.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move common vGPU workload creation functions into scheduler.c since
they are not specific to execlist emulation.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move common workload preparation into prepare_workload() in scheduler.c,
as they are not specific to execlist emulation.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
It's better enable/disable and classify gvt debug info dynamically.
This patch change it to dyndbg so can be dynamically enable/disable
each item. All gvt log can be enabled by,
$ echo 'file *gvt* +p' > /sys/kernel/debug/dynamic_debug/control
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
An earlier fix changed the return type from find_bb_size however the
integer return is being assigned to a unsigned int so the -ve error
check will never be detected. Make bb_size an int to fix this.
Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
Fixes: 1e3197d6ad ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When a scan error occurs in submit_context, this patch is to
decrease the mm ref count and free the workload struct before
the workload is abandoned.
v2:
- submit_context related code should be combined together. (Zhenyu)
v3:
- free all the unsubmitted workloads. (Zhenyu)
v4:
- refine the clean path. (Zhenyu)
v5:
- polish the title. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When a scan error occurs in dispatch_workload, this patch is to
check the healthy state and free all the queued workloads before
the failsafe mode is entered.
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Generally, there are 3 types of errors during command scan: a) some
commands might be unknown with EBADRQC; b) some cmd access invalid
address with EFAULT; c) some unexpected force nonpriv cmd with EPERM.
later the healthy state can be judged through the return error.
v2:
- remove some internal i915 errors rating. (Zhenyu)
v3:
- the healthy state is judged through the internal defined return
error. (Zhenyu)
- force non priv cmd error can be ignored. (Kevin)
v4:
- reuse standard defined errno instead of recreate, e.g EBADRQC for
unknown cmd, EFAULT for invalid address, EPERM for nonpriv. (Zhenyu)
v5:
- remove some irrelevant code for the patch.
- fix typo of vgpu_is_vm_unhealthy. (Zhenyu)
v6:
- move the healthy check and failsafe code into another patch. (Zhenyu)
v7:
- polish title and commit message. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Theoretically, the largest bulk of commands in the ring buffer of an
engine might be the first submission, which usually contains a lot
of commands to initialize the HW. After removing the initial allocation
of the ring scan buffer and let krealloc() do everything we need, we
still have a big chance to get the buffer of suitable size in the first
submission.
Tested on my SKL NUC.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move ring scan buffers into intel_vgpu_submission since they belongs to
a part of vGPU submission stuffs.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
"reserved" means reserve something from somewhere. Actually they are
buffers used by command scanner. Rename it to ring_scan_buffer.
v2:
- Remove the usage of an extra variable. (Zhenyu)
Fixes: 0a53bc07f0 ("drm/i915/gvt: Separate cmd scan from request allocation")
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move vGPU workload cache initialization/de-initialization into
intel_vgpu_{setup, clean}_submission() since they are not specific to
execlist stuffs.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
To move workload related functions into scheduler.c, an expected way is
to collect all the init/clean functions related to vGPU workload
submission into fewer functions.
Rename intel_vgpu_{init, clean}_gvt_context() for above usage in future.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
The context descriptors in elsp_dwords are stored in a reversed order and
the definition of context descriptor is also reversed. The revesred stuff
is hard to be used and might cause misunderstanding. Make them in the right
oder for following code re-factoring.
Tested on my SKL NUC.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
opregion emulated with a copy from host which leads to some display
bugs such as guest resolution adjustment failure due to host opregion
fail to claim port D support. with a fake opregion table provided
to fully emulate opregion to meet guest port requirement.
v1 - initial patch
v2 - reforamt opregion arrary with 0x02x output
v3 - opregion array removed with opregion generation on host initizaiton
v4 - rebased v3 patch from stable branch to staging branch which also has
different struct child_device_config and addressed v3 review comments.
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of. Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact. Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.
This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache. Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop. It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.
The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path. A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.
Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With fast swap storage, the platform wants to use swap more aggressively
and swap-in is crucial to application latency.
The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%. Maybe, it would be
bigger in nvdimm.
This patchset reduces swap-in latency by skipping swapcache if the swap
device is a synchronous device like a rw_page() based device.
It enhances by 45% my swapin test (5G sequential swapin, no readahead)
from 2.41sec to 1.64sec.
This patch (of 4):
Commit 19b7ccf865 ("block: get rid of blk_integrity_revalidate()")
fixed a weird thing (i.e., reset BDI_CAP_STABLE_WRITES flag
unconditionally whenever revalidat_disk is called) so zram doesn't need
to reset the flag any more when revalidating the bdev. Instead, set the
flag just once when the zram device is created.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>