While the entry get in svm_range_unmap_from_cpu is the last entry, and
the entry is page fault, it also need to be dropped. So for equal case,
it also need to be dropped.
v2:
Only modify the svm_range_restore_pages.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove unused declaration of gws_debug_workaround.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Similar to compute queue reset, flag SDMA queue reset capabilities to
user space for safe testing.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset
must be issued through SMU. Since soft resets will reset an entire SDMA
engine, use a common KGD call to do the reset as the KGD will handle
avoiding a reset of in flight GFX and paging queues on that engine.
In addition, create a common call for all reset types to simplify
the handling of module parameter settings that block gpu resets.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence
when calling kfd_queue_acquire_buffers.
Fixes: 629568d25f ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
pqm_get_kernel_queue() has been unused since 2022's
commit 5bdd3eb253 ("drm/amdkfd: Remove unused old debugger
implementation")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
debugfs hang_hws is used by GPU reset test with HWS, for MES this crash
the kernel with NULL pointer access because dqm->packet_mgr is not setup
for MES path.
Skip GPU with MES for now, MES hang_hws debugfs interface will be
supported later.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should
delete the queue from process_queue_list and free the resource.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal
user space to abort the processes. After process abort exit, user queues
still use the GPU to access system memory before h/w is reset while KFD
cleanup worker free system memory and free VRAM.
There is use-after-free race bug that KFD allocate and reuse the freed
system memory, and user queue write to the same system memory to corrupt
the data structure and cause driver crash.
To fix this race, KFD cleanup worker terminate user queues, then flush
reset_domain wq to wait for any GPU ongoing reset complete, and then
free outstanding BOs.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If waiting for gpu reset done in KFD release_work, thers is WARNING:
possible circular locking dependency detected
#2 kfd_create_process
kfd_process_mutex
flush kfd release work
#1 kfd release work
wait for amdgpu reset work
#0 amdgpu_device_gpu_reset
kgd2kfd_pre_reset
kfd_process_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&p->release_work));
lock((wq_completion)kfd_process_wq);
lock((work_completion)(&p->release_work));
lock((wq_completion)amdgpu-reset-dev);
To fix this, KFD create process move flush release work outside
kfd_process_mutex.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With GPU reset-domain worker implemented, KFD hw_exception worker is not
needed any more, just call amdgpu_amdkfd_gpu_reset directly from
kfd_hws_hang.
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since kfd uses pasid values from graphic driver now do not need use kfd pasid
fucntions.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Even though GWS no longer exists, to maintain runtime usage for
cooperative launch, SW set legacy GWS size.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve
bitfields that do not need to be modified as they contain flags to
track queue states that are used by CP FW.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the kfd_process_device_init_vm function, a valid error code is now
returned when the associated Process Address Space ID (PASID) is not
present.
If the address space virtual memory (avm) does not have an associated
PASID, the function sets the ret variable to -EINVAL before proceeding
to the error handling section. This ensures that the calling function,
such as kfd_ioctl_acquire_vm, can appropriately handle the error,
thereby preventing any issues during virtual memory initialization.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1694 kfd_process_device_init_vm()
warn: missing error code 'ret'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c
1647 int kfd_process_device_init_vm(struct kfd_process_device *pdd,
1648 struct file *drm_file)
1649 {
...
1690
1691 if (unlikely(!avm->pasid)) {
1692 dev_warn(pdd->dev->adev->dev, "WARN: vm %p has no pasid associated",
1693 avm);
--> 1694 goto err_get_pasid;
ret = -EINVAL?
1695 }
Fixes: 8544374c0f ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Xiaogang Chen <xiaogang.chen@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.
For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.
Fixes: 68e599db7a ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values since the values are allocated by graphic driver, not kfd driver
anymore. This patch does not stop graphic driver release pasid values.
Fixes: 8544374c0f ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In some ASICs L2 cache info may miss in kfd topology,
because the first bitmap may be empty, that means
the first cu may be inactive, so to find the first
active cu will solve the issue.
v2: Only find the first active cu in the first xcc
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
find_system_memory() pulls out two fields from an SMBIOS type 17
device and sets them on KFD devices. The data offsets are counted
to find interesting data.
Instead use a struct representation to access the members and pull
out the two specific fields.
No intended functional changes.
Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.8.0.pdf p99
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Link: https://lore.kernel.org/r/20250206214847.3334595-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On big and small APUs we send KFD VRAM allocations to GTT
since the carve out is either non-existent or relatively
small. However, if someone sets the carve out size to be
relatively large, we may end up using GTT rather than VRAM.
No change of logic with this patch, but it allows the
driver to determine which logic to use based on the
carve out size in the future.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_device_by_pci_dev(), kfd_get_pasid_limit() and kfd_set_pasid_limit()
have been unused since 2023's
commit c99a2e7ae2 ("drm/amdkfd: drop IOMMUv2 support")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Current kfd driver has its own PASID value for a kfd process and uses it to
locate vm at interrupt handler or mapping between kfd process and vm. That
design is not working when a physical gpu device has multiple spatial
partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid
values that graphic driver generated which is per vm per pasid.
These pasid values are passed to fw/hardware. We do not need change interrupt
handler though more pasid values are used. Also, pasid values at log are
replaced by user process pid; pasid values are not exposed to user. Users see
their process pids that have meaning in user space.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This initializes SDMA IP version 6.1.3.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This initializes GC IP version 11.5.3.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is possible for some waves in a workgroup to finish their save
sequence before the group leader has had time to capture the workgroup
barrier state. When this happens, having those waves exit do impact the
barrier state. As a consequence, the state captured by the group leader
is invalid, and is eventually incorrectly restored.
This patch proposes to have all waves in a workgroup wait for each other
at the end of their save sequence (just before calling s_endpgm_saved).
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
The destructor of a gtt bo is declared as
void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void **mem_obj);
Which takes void** as the second parameter.
GCC allows passing void* to the function because void* can be implicitly
casted to any other types, so it can pass compiling.
However, passing this void* parameter into the function's
execution process(which expects void** and dereferencing void**)
will result in errors.
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Fixes: fb91065851 ("drm/amdkfd: Refactor queue wptr_bo GART mapping")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following page fault was observed duringthe KFD process release.
In this particular error case, the HIP test (./MemcpyPerformance -h)
does not require the queue. As a result, the process_context_addr was
not assigned when the KFD process was released, ultimately leading to
this page fault during the execution of the function
kfd_process_dequeue_from_all_devices().
[345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
[345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1
[345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
[345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The purpose of halt_if_hws_hang is to preserve GPU state for driver
debugging when queue preemption fails. Issuing per-queue reset may
kill wavefronts which caused the preemption failure.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
If user shader issues S_SETVSKIP then this state will persist when
executing the trap handler, causing vector instructions to be
skipped.
VSKIP state is already saved/restored through the MODE register.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Source and binary have become mismatched during branch activity.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For partial migrate from ram to vram, the migrate->cpages is not
equal to migrate->npages, should use migrate->npages to check all needed
migrate pages which could be copied or not.
And only need to set those pages could be migrated to migrate->dst[i], or
the migrate_vma_pages will migrate the wrong pages based on the migrate->dst[i].
v2:
Add mpages to break the loop earlier.
v3:
Uses MIGRATE_PFN_MIGRATE to identify whether page could be migrated.
v4:
Correct the error part.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gfx12 derivatives will have substantially different trap handler
implementations from gfx10/gfx11. Add a separate source file for
gfx12+ and remove unneeded conditional code.
No functional change.
v2: Revert copyright date to 2018, minor comment fixes
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Cc: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The header usr/linux/kfd_ioctl.h is a duplicate of uapi/linux/kfd_ioctl.h.
And it is actually not a file in the source code tree.
Ideally, the usr version should be updated whenever the source code is recompiled.
However, I have noticed a discrepancy between the two headers
even after rebuilding the kernel.
This commit modifies kfd_priv.h to always include the header from uapi to ensure
the latest changes are reflected. We should always include the source
code header other than the duplication.
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA
interrupts and cause application soft hang. Show warning message with
ring name if overflow happens.
Add function to get ih ring name to avoid duplicating it. To keep
warning message consistent between GPU generations, change all
*_ih.c except ASICs older than Vega which has only one ih ring.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If event slot is not signaled, kfd_signal_event_interrupt goes to slow
path to scan all event slots to find the signaled event, this is needed
for old ASICs that don't have the event ID or the event IDs are
incorrect in the IH payload.
There is case that GPU signal the same event twice, then driver process
the first event interrupt, set_event and event slot is auto-reset, then
for the second event interrupt, KFD goes to slow path as event is not
signaled, just drop the second event interrupt because the application
only need wakeup once.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For CPX mode, each KFD node has interrupt worker to process ih_fifo to
send events to user space. Currently all interrupt workers of same adev
queue to same CPU, all workers execution are actually serialized and
this cause KFD ih_fifo overflow when CPU usage is high.
Use per-GPU unbounded highpri queue with number of workers equals to
number of partitions, let queue_work select the next CPU round robin
among the local CPUs of same NUMA.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After GPU page fault, there are lots of page fault interrupts generated
at short period even with CAM filter enabled because the fault address
is different. Each page fault copy to KFD ih fifo to send event to user
space by KFD interrupt worker, this could cause KFD ih fifo overflow
while other processes generate events at same time.
KFD process is aborted after GPU page fault, we only need one GPU page
fault interrupt sent to KFD ih fifo to send memory exception event to
user space.
Incease KFD ih fifo size to 2 times of IH primary ring size, to handle
the burst events case.
This patch handle the gfx v9 path, cover retry on/off and CAM filter
on/off cases.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To handle 40000 to 80000 interrupts per second running CPX mode with 4
streams/queues per KFD node, KFD interrupt handler becomes the
performance bottleneck.
Remove the kfifo_out memcpy overhead by accessing ih_fifo data in-place
and updating rptr with kfifo_skip_count.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch checks and warns if pdd is NULL.
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This a pointer that is being passed into other functions, so it is best to
initialize it to NULL prior.
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update cwsr area size for gfx950 to fit the new user queue buffer validation.
The size of LDS calculation is referred from gfx950 thunk implementation.
Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The gfx-9 trap handler is reading LDS allocation size in 256 bytes
granularity (from SQ_WAVE_LDS_ALLOC), but it using the assumption that
this value is always even (i.e. the LDS allocation is really done in
multiple of 512 bytes). This was true so far, but gfx-950 allocates LDS
in chunks of 1280 bytes, making this assumption invalid. This can cause
the trap handler to try to save / restore past the end of LDS, and past
the LDS allocated slot in the save are, overriding data from the
following wave.
This patch updates the trap handler to support LDS allocated in 1280
bytes blocks:
- During restore, copy from main memory directly to LDS in batch of 1280
bytes.
- During save, continue to use 512 bytes blocks (we only have 2 VGPRs we
can use to hold data), making sure to mask the upper half of the wave
when handling when the LDS size is not a multiple of 512 bytes.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Co-authored-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In gfx950, the SQ_WAVE_LDS_ALLOC.LDS_SIZE field is extended to bits 12
to 22. The LDS_SIZE granularity remains unchanged (units of 64 dwords,
or 256 bytes). This patch adjusts the CWSR trap handler to read the
full extent of LDS_SIZE.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instruction modifiers of the untyped vector memory buffer instructions
(MUBUF encoded) changed in gfx940. The slc, scc and glc modifiers have
been replaced with sc0, sc1 and nt.
The current CWSR trap handler is written using pre-gfx940 modifier
names, making the source incompatible with a strict gfx940 assembler.
This patch updates the cwsr_trap_handler_gfx9.s source file to be
compatible with all gfx9 variants of the ISA. The binary assembled code
is unchanged (so the behaviour is unchanged as well), only the source
representation is updated.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initial support for GC 9.5.0.
v2: squash in pqm_clean_queue_resource() fix from Lijo
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update mtype flags to meet gfx 9.5.0 requirements for remote GPU
memory and system memory.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GC 9.5.0 local memory MTYPE default should be set as RW.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For better readability. Also leftover orphaned code.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To have user better understand the causes triggering runlist oversubscription.
No function change.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Before scheduling a recovery due to scheduler/job hang, check if a RAS
error is detected. If so, choose RAS recovery to handle the situation. A
scheduler/job hang could be the side effect of a RAS error. In such
cases, it is required to go through the RAS error recovery process. A
RAS error recovery process in certains cases also could avoid a full
device device reset.
An error state is maintained in RAS context to detect the block
affected. Fatal Error state uses unused block id. Set the block id when
error is detected. If the interrupt handler detected a poison error,
it's not required to look for a fatal error. Skip fatal error checking
in such cases.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When using MES creating a pdd will require talking to the GPU to
setup the relevant context. The code here forgot to wake up the GPU
in case it was in suspend, this causes KVM to EFAULT for passthrough
GPU for example. This issue can be masked if the GPU was woken up by
other things (e.g. opening the KMS node) first and have not yet gone to sleep.
v4: do the allocation of proc_ctx_bo in a lazy fashion
when the first queue is created in a process (Felix)
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This information is not available in ip discovery table.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This information is not available in ip discovery table.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In the function pqm_uninit there is a call-assignment of "pdd =
kfd_get_process_device_data" which could be null, and this value was
later dereferenced without checking.
Fixes: fb91065851 ("drm/amdkfd: Refactor queue wptr_bo GART mapping")
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Cacheline size is not available in IP discovery for gc943,gc944.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Add MEC version from which alternate support for no PCIe atomics
is provided so that device is not skipped during KFD device init in
GFX1200/GFX1201.
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
Write pointer could be 32-bit or 64-bit. Use the correct size during
initialization.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In a consecutive packet submission, for example unmap and query status,
when CP is reading wptr caused by unmap packet doorbell ring, if in some
case CP operates slower (e.g. doorbell_mode=1) and wptr has been updated
to next packet (query status), but the query status packet content has
not been flushed to memory yet, it will cause CP fetched stalled data.
Adding mb to ensure ring buffer has been updated before updating wptr.
Also adding a mb to ensure wptr updated before doorbell ring.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add back kfd queues in start scheduling that originally been
removed on stop scheduling.
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd process kref count(process->ref) is initialized to 1 by kref_init. After
it is created not need to increase its kref. Instad add kfd process kref at kfd
process mmu notifier allocation since we already decrease the kref at
free_notifier of mmu_notifier_ops, so pair them.
When user process opens kfd node multiple times the kfd process kref is
increased each time to balance with kfd node close operation.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In kfd_procfs_show(), the sdma_activity_work_handler is a local variable
and the sdma_activity_work_handler.sdma_activity_work should initialize
with INIT_WORK_ONSTACK() instead of INIT_WORK().
Fixes: 32cb59f313 ("drm/amdkfd: Track SDMA utilization per process")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX 12 does not require a page size cap for the trap handler because
it does not require a CWSR work around like GFX 11 did.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The `kfd_get_cu_occupancy` function previously declared a large
`cu_occupancy` array as a local variable, which could lead to stack
overflows due to excessive stack usage. This commit replaces the static
array allocation with dynamic memory allocation using `kcalloc`,
thereby reducing the stack size.
This change avoids the risk of stack overflows in kernel space, in
scenarios where `AMDGPU_MAX_QUEUES` is large. The allocated memory is
freed using `kfree` before the function returns to prevent memory
leaks.
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c: In function ‘kfd_get_cu_occupancy’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:322:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
322 | }
| ^
Fixes: 6ae9e1aba9 ("drm/amdkfd: Update logic for CU occupancy calculations")
Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add an interface to query whether KFD has any active queues.
v2: fix build issues
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flag KFD support for per-queue reset on GFX9 devices.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: James Yao <yiqing.yao@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
as the adding of mb() should be sufficient in function unmap_queues_cpsch,
remove the add of volatile type as recommended
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit a3ab2d45b9.
The userspace side for this code is not ready yet so revert
for now.
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Process device data pdd->vram_usage is read by rocm-smi via sysfs, this
is currently missing the svm_bo usage accounting, so "rocm-smi
--showpids" per process VRAM usage report is incorrect.
Add pdd->vram_usage accounting when svm_bo allocation and release,
change to atomic64_t type because it is updated outside process mutex
now.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new SMI event to report the dropped event count.
When the event kfifo is full, drop count is not zero, or no enough space
left to store the event message, increase drop count.
After reading event out from kfifo, if event was dropped, drop_count is
not zero, generate a dropped event record and reset drop count to zero.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
get_wave_state is not defined for sdma queue, copy_context_work_handler
calls it for sdma queue will crash.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMI event fifo size 1KB was enough to report GPU vm fault or reset
event, but could drop the more frequent SVM migration events. Increase
kfifo size to 8KB to store about 100 migrate events, less chance to drop
the migrate events if lots of migration happened in the short period of
time. Add KFD prefix to the macro name.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If page migration failed, also output migrate end event to match with
migrate start event, with failure error_code added to the end of the
migrate message macro. This will not break uAPI because application uses
old message macro sscanf drop and ignore the error_code.
Output GPU page fault restore end event if migration failed.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only creating a new reference for each process instead of each VM.
Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
struct file *f is unused in queue creation, remove it.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
program SDMAx_QUEUEx_SCHEDULE_CNTL for context switch due to
quantum in KFD for GFX12.
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, the code uses the IH_VMID_X_LUT register to map
a queue's vmid to the corresponding PASID. This logic is racy
since CP can update the VMID-PASID mapping anytime especially
when there are more processes than number of vmids. Update the
logic to calculate CU occupancy by matching doorbell offset of
the queue with valid wave counts against the process's queues.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Variable hub_inst is unused.
Fixes: e28604d833 ("drm/amdkfd: Drop poison hanlding from gfx v10")
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We were removing the kernfs entry for queue info before checking if the
queue could be destroyed. If it failed to get destroyed (e.g. during
some GPU resets), then we would try to delete it later during pqm
teardown, but the file was already removed. This led to a kernel WARN
trying to remove size, gpuid and type. Move the remove to after the
destroy check.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of trying to use close_fd() on failure exits, just have
criu_get_prime_handle() store the file reference without inserting
it into descriptor table.
Then, once the callers are past the last failure exit, they can go
and either insert all those file references into the corresponding
slots of descriptor table, or drop all those file references and
free the unused descriptors.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To avoid memory leaks, release q_extra_data when exiting the restore queue.
v2: Correct the proto (Alex)
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Document how to use SMI system management interface to enable and
receive SVM events. Document SVM event triggers.
Define SVM events message string format macro that could be used by user
mode for sscanf to parse the event. Add it to uAPI header file to make
it obvious that is changing uAPI in future.
No functional changes.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver mode-2 is only supported by relative new
smc firmware.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If a queue is being destroyed but causes a HWS hang on removal, the KFD
may issue an unnecessary gpu reset if the destroyed queue can be fixed
by a queue reset.
This is because the queue has been removed from the KFD's queue list
prior to the preemption action on destroy so the reset call will fail to
match the HQD PQ reset information against the KFD's queue record to do
the actual reset.
To fix this, deactivate the queue prior to preemption since it's being
destroyed anyways and remove the queue from the KFD's queue list after
preemption.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enables users to update SVM's default granularity, used in
buffer migration and handling of recoverable page faults.
Param value is set in terms of log(numPages(buffer)),
e.g. 9 for a 2 MIB buffer
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Populate cache line size info in topology based on information from IP
discovery table.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Sreekant Somasekharan <Sreekant.Somasekharan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ih1 is not initialized for APUs. Don't drain it or NULL pointer
error will be triggered.
Fixes: 6ef29715ac ("drm/amdkfd: Change kfd/svm page fault drain handling")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and
not handle any incoming pages fault of this process until a deferred work item
got executed by default system wq. The time period of "not handle page fault"
can be long and is unpredicable. That is advese to kfd performance on page
faults recovery.
This patch uses time stamp of incoming page fault to decide to drop or recover
page fault. When app unmap vm ranges kfd records each gpu device's ih ring
current time stamp. These time stamps are used at kfd page fault recovery
routine.
Any page fault happened on unmapped ranges after unmap events is application
bug that accesses vm range after unmap. It is not driver work to cover that.
By using time stamp of page fault do not need drain page faults at deferred
work. So, the time period that kfd does not handle page faults is reduced
and can be controlled.
Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer. On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not supported.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Traditional utcl2 fault_status polling does not
work in SRIOV environment. The polling of fault
status register from guest side will be dropped
by hardware.
Driver should switch to check utcl2 interrupt
source id to identify utcl2 poison event. It is
set to 1 when poisoned data interrupts are
signaled.
v2: drop the unused local variable (Tao)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Based on the recommendation of MEC FW, update BadOpcode interrupt
handling by unmapping all queues, removing the queue that got the
interrupt from scheduling and remapping rest of the queues back when
using MES scheduler. This is done to prevent the case where unmapping
of the bad queue can fail thereby causing a GPU reset.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MEC FW expects MES to unmap all queues when a VM fault is observed
on a queue and then resumed once the affected process is terminated.
Use the MES Suspend and Resume APIs to achieve this.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When amdgpu enable enforce_isolation, KFD enables single-process mode in
HWS and sets exec_cleaner_shader bit in MAP_PROCESS.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling
compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling.
When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from
runlist. If users send ioctls to KFD to create queues, they'll be added
but those queues won't be mapped to runlist (so not scheduled) until
amdgpu_amdkfd_start_sched is called.
v2: fix build (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If there are multiple nodes per kfd device, add nodeid to location_id to
differentiate.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add helper function kfd_queue_unreference_buffers to reduce queue buffer
refcount, separate it from release queue buffers.
Because it is circular locking to hold dqm_lock to take vm lock,
kfd_ioctl_destroy_queue should take vm lock, unreference queue buffers
first, but not release queue buffers, to handle error in case failed to
hold vm lock. Then hold dqm_lock to remove queue from queue list and
then release queue buffers.
Restore process worker restore queue hold dqm_lock, will always find
the queue with valid queue buffers.
v2 (Felix):
- renamed kfd_queue_unreference_buffer(s) to kfd_queue_unref_bo_va(s)
- added two FIXME comments for follow up
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When users dynamically set the partition mode through sysfs writes,
this can lead to a double lock situation where the KFD is trying to take
the partition lock when updating the recommended SDMA engines.
Have the KFD reference its saved socket device number count instead.
Also ensure we have enough SDMA xGMI engines to report the recommended
engines in the first place.
Fixes: e06b71b231 ("drm/amdkfd: allow users to target recommended SDMA engines")
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The number of watchpoints should be set and constrained per logical
partition device, not by the socket device.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support per-queue reset for GFX9. The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.
Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.
Only signal reset events to processes that have had a queue reset.
If queue reset fails, fall back to GPU reset.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the below if kernel config not enable HMM support
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:26: error:
implicit declaration of function 'svm_range_from_addr'
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:24: error:
assignment to 'struct svm_range *' from 'int' makes pointer from integer
without a cast [-Wint-conversion]
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:111:28: error:
invalid use of undefined type 'struct svm_range'
Fixes: b049504e21 ("drm/amdkfd: Validate user queue svm memory residency")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407252127.zvnxaKRA-lkp@intel.com/
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain GPUs have better copy performance over xGMI on specific
SDMA engines depending on the source and destination GPU.
Allow users to create SDMA queues on these recommended engines.
Close to 2x overall performance has been observed with this
optimization.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When creating KFD user compute queue, check if queue eop buffer size,
cwsr area size, ctl stack size equal to the size of KFD node
properities.
Check the entire cwsr area which may split into multiple svm ranges
aligned to granularity boundary.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the queue eop buffer size, cwsr area size, ctl stack size
calculation from Thunk, store the value to KFD node properties.
Those will be used to validate queue eop buffer size, cwsr area size,
ctl stack size when creating KFD user compute queue.
Those will be exposed to user space via sysfs KFD node properties, to
remove the duplicate calculation code from Thunk.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ensure update queue new ring buffer is mapped on GPU with correct size.
Decrease queue old ring_bo queue_refcount and increase new ring_bo
queue_refcount.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Queue CWSR area maybe registered to GPU as svm memory, create queue to
ensure svm mapped to GPU with KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.
Add queue_refcount to struct svm_range, to track queue CWSR area usage.
Because unmap mmu notifier callback return value is ignored, if
application unmap the CWSR area while queue is active, pr_warn message
in dmesg log. To be safe, evict user queue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.
Create queue to increase the bo_va queue_refcount, destroy queue to
decrease the bo_va queue_refcount, to ensure the queue buffers mapped on
the GPU when queue is active.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and
check BOs are mapped on the GPU with correct size and take the BO
reference.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.
Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.
Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
to avoid reading wrong WPTR from doorbell in sriov vf, set
CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add amdgpu ras POSION_CONSUMPTION event id support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable KFD setting SDMA info for SDMA 6.1.2.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Convert some pr_* to some dev_* APIs to identify the device.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We recently added locking to add_queue_mes() but this error path was
overlooked. Add an unlock to the error path.
Fixes: 1802b042a3 ("drm/amdgpu/kfd: remove is_hws_hang and is_resetting")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Per firmware's requirement, replace mode2 with mode1.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In commit fda812ebe3 ("drm/amdkfd: gfx12 context save/restore trap handler fixes")
the following fix was introduced but incorrectly restricted to gfx12.
The same issue and a corresponding fix apply to gfx10 and gfx11.
Do not overwrite TRAPSTS.{SAVECTX,HOST_TRAP} when restoring this
register. Both of these fields can assert while the wavefront is
running the trap handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to take the reset domain lock before talking to MES. While in
this case we can take the lock inside the mes helper. We can't do so for
most other mes helpers since they are used during reset. So for
consistency sake we add the lock here.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
is_hws_hang and is_resetting serves pretty much the same purpose and
they all duplicates the work of the reset_domain lock, just check that
directly instead. This also eliminate a few bugs listed below and get
rid of dqm->ops.pre_reset.
kfd_hws_hang did not need to avoid scheduling another reset. If the
on-going reset decided to skip GPU reset we have a bad time, otherwise
the extra reset will get cancelled anyway.
remove_queue_mes forgot to check is_resetting flag compared to the
pre-MES path unmap_queue_cpsch, so it did not block hw access during
reset correctly.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
idr_for_each_entry can ensure that mem is not empty during the loop.
So don't need check mem again.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
reset cause is requested by customer as additional
info for gpu reset smi event.
v2: integerate reset sources suggested by Lijo Lazar
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_create_vcrat_image_gpu itself checks the avail_size at the start.
So the value of avail_size is at least VCRAT_SIZE_FOR_GPU(16384),
minus struct crat_header(40UL) and struct crat_subtype_compute(40UL) it cannot be less than 0.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The expression caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED
and caps | HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED
are always 1/true regardless of the values of its operand.
Fixes: 9243240bed ("drm/amdkfd: enable single alu ops for gfx12")
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To fix the warning about unused value,
remove the use_static and use the parameter is_static directly.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires VGPRs. In this case the trap handler should
terminate the wavefront.
Fixes intermittent VM faults under context switching load.
V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The varible uncached set false, the condition uncached cannot be true.
So remove the dead code, mapping flags will set the flag AMDGPU_VM_MTYPE_UC in else.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix LDS size interpretation: 512 bytes (>= gfx12) vs 256 (< gfx12).
Ensure STATE_PRIV.BARRIER_COMPLETE cannot change after reading or
before writing. Other waves in the threadgroup may cause this field
to assert if they complete the barrier.
Do not overwrite EXCP_FLAG_PRIV.{SAVE_CONTEXT,HOST_TRAP} when
restoring this register. Both of these fields can assert while the
wavefront is running the trap handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Newer assemblers reject S_WAITCNT. All instances of S_WAITCNT can be
replaced by S_WAITCNT 0 (< gfx12) or S_WAIT_IDLE (>= gfx12) since
there is no concurrency of different memory instruction classes.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Source and binary have become mismatched during branch activity.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.
v2: clean up a few more places (Lang)
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 28ebbb4981.
Revert this commit as apparently the LLVM code to take advantage of
this never landed.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Feifei Xu <feifei.xu@amd.com>
It was an enablement vehicle for MES 11 and was never
productized. Remove it.
v2: drop additional checks in the GFX10 code.
v3: drop mes_api_def.h
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX1201 was missed in the commit below. Adding it in.
Fixes: 628e1ace23 ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC")
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gpu_id needs to be unique for user space to identify GPUs via KFD
interface. In the current implementation there is a very small
probability of having non unique gpu_ids.
v2: Add check to confirm if gpu_id is unique. If not unique, find one
Changed commit header to reflect the above
v3: Use crc16 as suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Ensure that gpu_id != 0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No functional change. This will help in moving gpu_id creation to next
step while still being able to identify the correct GPU
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We are incorrectly passing the first XCC's MQD when
updating CU masks for other XCCs in the partition. Fix
this by passing the MQD for the XCC currently being
updated with CU mask to update_cu_mask function.
Fixes: fc6efed2c7 ("drm/amdkfd: Update CU masking for GFX 9.4.3")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Analysis of code by Coverity, a static code analyser, has identified
a resource leak in the symbol hmm_range. This leak occurs when one of
the prior steps before it is released encounters an error.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On system with khugepaged enabled and user cases with THP buffer, the
hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary
timeout value is not accurate, cause memory allocation failure.
Remove the arbitrary timeout value, return EAGAIN to application if
hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call
ioctl again.
Change EAGAIN to debug message as this is not error.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently oem_id is defined as uint8_t[6] and casted to uint64_t*
in some use case. This would lead code scanner to complain about
access beyond. Re-define it in union to enforce 8-byte size and
alignment to avoid potential issue.
Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a827 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In interrupt context, write dbg_ev_file will be run by work queue. It
will cause write dbg_ev_file execution after debug_trap_disable, which
will cause NULL pointer access.
v2: cancel work "debug_event_workarea" before set dbg_ev_file as NULL.
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit c37ce764cd.
RCCL library is currently not treating spatial partitions differently,
hence this change is causing issues. Revert temporarily till RCCL
implementation is ready for spatial partitions.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.
We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:
"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"
Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.
The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.
v2: Report local_mem_size_private as 0. (Felix)
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Due to a HW bug, the system memory mappings and peer GPU mappings
on GFX12 need to be marked as MTYPE_NC.
Cc: Joe Greathouse <joseph.greathouse@amd.com>
Cc: David Belanger <david.belanger@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new GFX12 PTE flag AMDGPU_PTE_IS_PTE to svm_range_get_pte_flags
function. This resolves the issues related to SVM enablement in GFX12.
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable flag in KFD and set the atomic support bit in MQD.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
mqd_stride function in gfx v12 is not implemented, that
causes NULL ptr error. Add the generic func to fix it.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX12 debugging requires setting up precise ALU operation for catching
ALU exceptions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Similar to GFX11, always enable the setup of trap temporaries on GFX12.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Updated switch statement to use GFX12 trap handler.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When trap_ctrl.trap_after_inst is set, it is possible for a wave to
enter the trap handler, after single-stepping an instruction and a
save_context is raised, with only save_context set in excp_flag_priv.
Because excp_flag_priv.trap_after_inst is not reliably set, we need to
use the missed single-step workaround for gfx12 as well.
Also add wave_start and wave_end as exceptions that should be handled
by the 2nd level trap handler.
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support to save and restore the work group barrier state in gfx12
CWSR trap handler.
There is no support to directly restore the signal count of a barrier
state, so instead this patch repeatedly calls s_barrier_signal to
increment the signal count to the desired value.
In this patch, I have implemented the logic to restore the barrier at
the end of the block restoring the HWREGs. This process needs to be
done by exactly 1 wave per work group. To achieve this, the initial
value of s_restore_spi_init_hi (containing a FIRST_WAVE bit) needs to be
saved up until that point. An alternative could be restore the barrier
earlier in the process (around when LDS is restored, as the same wave
does both). Doing this would break the pattern that the restore
procedure follows the CWSR area layout.
Before restoring the barrier, this patch checks if the barrier was whose
state was saved has the "valid" bit set, even if I don't think this
barrier can be in an invalid state during context save. I expect this
test to always be true.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- HWREG changes since gfx11
- Save/restore barrier state
- get_wave_size is now reserved by assembler
v2: rebase (Alex)
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No functional change. Preparation for gfx12 support.
v2: drop unrelated change (Alex)
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initial implementation, based on GFX11.
v2: Removed functions not needed by cp scheduler.
v3: Fixed typos.
v4: squash in warning fix (Alex)
Signed-off-by: David Belanger <david.belanger@amd.com>
Acked-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initial implementation, based on GFX11.
v2: squash in include fix from David (Alex)
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initial implementation, based on GFX11.
v2: Removed dbg_wa code as not needed on GFX12.
v3: squash in SDMA queue fixes (Alex)
v4: rebase (Alex)
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Added cases for GFX12 in switch statement, code relying on GFX11
implementation until GFX12 implementation is complete.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Added GFX12 support to a few switch statements.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add gfx v9_4_4 ip block support
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add sdma v4_4_5 ip block support
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3). In this scenario:
- The process executes exec.
- This will eventually release the process's mm, which will cause the
kfd_process object associated with the process to be freed
(kfd_process_free_notifier decrements the reference count to the
kfd_process to 0). This causes kfd_process_ref_release to enqueue
kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
- When handling this request, KFD tries to re-create a kfd_process.
This eventually calls kfd_create_process and kobject_init_and_add.
At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.
This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object. This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Queue buffer, though it is in system memory, has to be created using the
correct amdgpu device. Enforce this as the BO needs to mapped to the
GART for MES Hardware scheduler to access it.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace tmz flag into buffer flag to make it easier to understand
and extend
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Do VRAM accounting when doing migrations to vram to make sure
there is enough available VRAM and migrating to VRAM doesn't evict
other possible non-unified memory BOs. If migrating to VRAM fails,
driver can fall back to using system memory seamlessly.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Handle the case that the restore worker was already scheduled by another
eviction while the restore was in progress.
Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not really an error since the devices don't support
the necessary hardware functionality.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3331
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYlapoeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGJLgH/RFrsXefpxAWACJv
wRmIx/YFjI3Z4nypBZPXysX/JOuf/cf4bR22jMHC1vhQDtxVgB7eUpEYx0C0Y5RD
ll7cunAZCpP+PB725/bOWbaF94eigl8xC4RrKPon4tG1CS9NAvCJFDyNgWpqh48s
sSKKae68a16zMtIX5Za8nvmsBQOeD67ZFBgI4m7wIDDhTXY3XlG84r8Rz/ZwcdE/
rExFkDMUqjpoJVAMBtETQNk4pvKZVMrW2B0f/xZSR+IKXw5l2FO0raXkIG/6TERs
CxJvCuUjKIosqFLIK2O/cU9G+5V2axZ/WD8YRwDr2vlnnOPht+dFQzyPcuSqezbG
ShIhQo8=
=GI5F
-----END PGP SIGNATURE-----
Backmerge tag 'v6.9-rc5' into drm-next
Linux 6.9-rc5
I've had a persistent msm failure on clang, and the fix is in fixes
so just pull it back to fix that.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Handle case that dma_fence_get_rcu_safe returns NULL.
If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.
Fixes: 9a1c1339ab ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Gang BA <Gang.Ba@amd.com>
Reviewed-by: Gang BA <Gang.Ba@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
mode-2 reset is the only reliable method that can get
GC/SDMA back when poison is consumed. mmhub requires
mode-1 reset.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix memory leak due to a leaked mmget reference on an error handling
code path that is triggered when attempting to create KFD processes
while a GPU reset is in progress.
Fixes: 0ab2d7532b ("drm/amdkfd: prepare per-process debug enable and disable")
CC: Xiaogang Chen <xiaogang.chen@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Fix memory leak due to a leaked mmget reference on an error handling
code path that is triggered when attempting to create KFD processes
while a GPU reset is in progress.
Fixes: 0ab2d7532b ("drm/amdkfd: prepare per-process debug enable and disable")
CC: Xiaogang Chen <xiaogang.chen@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, with F32 HWS GPU reset is only when unmap queue fails.
However, if compute queue doesn't repond to preemption request in time
unmap will return without any error. In this case, only preemption error
is logged and Reset is not triggered. Call GPU reset in this case also.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
TLB flush after unmap accidentially was removed on
gfx9.4.2. It is to add it back.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Check cgroup permissions when returning DMA-buf info and
based on cgroup info return the GPU id of the GPU that have
access to the BO.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, with F32 HWS GPU reset is only when unmap queue fails.
However, if compute queue doesn't repond to preemption request in time
unmap will return without any error. In this case, only preemption error
is logged and Reset is not triggered. Call GPU reset in this case also.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Destroy the high priority workqueue that handles interrupts
during KFD node cleanup.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
TLB flush after unmap accidentially was removed on
gfx9.4.2. It is to add it back.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check cgroup permissions when returning DMA-buf info and
based on cgroup info return the GPU id of the GPU that have
access to the BO.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Each RAS block has different requirement for gpu reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.
v2: remove the mmhub poison support for kfd int v10.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support the query for both gfxhub and mmhub, also replace
xcc_id with hub_inst.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the following functionality:
- Check the queue preemption status on all XCDs in a partition
for GFX 9.4.3.
- Update the queue preemption debug message to print the queue
doorbell id for which preemption failed.
- Change the signature of check preemption failed function to
return a bool instead of uint32_t and pass the MQD manager
as an argument.
Suggested-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rename read_doorbell_id function to a more meaningful name,
implying what it is used for. No functional change.
Suggested-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace it with related interface in gfxhub functions.
v2: replace node id with xcc id.
get node id for query_utcl2_poison_status
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the kfd_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is
raised. To work around this issue, use scalar stores instead of vector
stores when trapsts.xnack_error == 1.
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In a future commit, the cwsr trap handler code size for gfx10.1 will
increase to slightly above the one page mark. Since the TMA does not
need to be page aligned, and only 2 pointers are stored in it, push
the TMA offset by 2 KiB and keep the TBA+TMA reserved memory size
to two pages.
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
reference counted.
- introducing two new helper funcs for task_info lifecycle management
- amdgpu_vm_get_task_info: reference counts up task_info before
returning this info
- amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.
This patch also does logistical changes required for existing usage
of vm->task_info.
V2: Do not block all the prints when task_info not found (Felix)
V3: Fixed review comments from Felix
- Fix wrong indentation
- No debug message for -ENOMEM
- Add NULL check for task_info
- Do not duplicate the debug messages (ti vs no ti)
- Get first reference of task_info in vm_init(), put last
in vm_fini()
V4: Fixed review comments from Felix
- fix double reference increment in create_task_info
- change amdgpu_vm_get_task_info_pasid
- additional changes in amdgpu_gem.c while porting
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev),
and adev is also not used in the function
amdgpu_amdkfd_map_gtt_bo_to_gart().
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On devices which have multi-partition nodes, keep partition id in
location_id[31:28].
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If fatal error is detected, packet submission won't go through. Return
error in such cases. Also, avoid waiting for fence when fatal error is
detected.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prevent dropping the KFD process reference at the end of a debug
IOCTL call where the acquired process value is an error.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.
By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.
v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier
v3:
- Move VM max PFN calculation into AMDGPU_VA_RESERVED macros
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gfx11 debug flags mask is currently set with an implicit assumption that
no other mqd update flags exist. This needs to be fixed with newly
introduced flag UPDATE_FLAG_IS_GWS by the previous patch.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarions.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Suggested-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Its currently incorrectly multiplied by number of XCCs in the partition
Fixes: be457b2252 ("drm/amdkfd: Update cache info for GFX 9.4.3")
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Call the 2nd level trap handler if the cwsr handler is entered with any
one of wave_start, wave_end, or trap_after_inst exceptions.
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The debugger requires the control stack header to be filled in to
update_waves.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The KFD topology includes cache line size, but we have not been
filling that information out unless we are parsing a CRAT table.
Fill in this information for the devices where we have cache
information structs, and pipe this information to the topology
sysfs files.
v2: squash in fix from Joe (Alex)
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from
XCP manager when checking for cgroup permissions.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Partial migration to system memory should use migrate.addr, not
prange->start as virtual address to allocate system memory page.
Fixes: a546a27684 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer. On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer. On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:1024 kfd_dbg_trap_device_snapshot() warn: variable dereferenced before check 'entry_size' (see line 1021)
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.
This patch automatically validates and fences dymanic DMABuf imports when
they are added to a compute VM. Revalidation after evictions is handled
in the VM code.
v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
without KFD
v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
reservation with its fences is shared with the export, as long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.
v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Range interval [start, last] is ordered by rb_tree, rb_prev, rb_next
return value still needs NULL check, thus modified from "node" to "rb_node".
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2691 svm_range_get_range_boundaries() warn: can 'node' even be NULL?
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix err return value and reset pgmap->type after checking it.
Fixes: c83dee9b63 ("drm/amdkfd: add SPM support for SVM")
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly mark kfd_process->ef as __rcu and consistently use the right
accessor functions.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550
but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&svms->lock){+.+.}-{3:3}:
__mutex_lock+0x97/0xd30
kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
kfd_ioctl+0x1b2/0x5d0 [amdgpu]
__x64_sys_ioctl+0x86/0xc0
do_syscall_64+0x39/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #1 (&mm->mmap_lock){++++}-{3:3}:
down_read+0x42/0x160
svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
process_one_work+0x27a/0x540
worker_thread+0x53/0x3e0
kthread+0xeb/0x120
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x11/0x20
-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
__lock_acquire+0x1426/0x2200
lock_acquire+0xc1/0x2b0
__flush_work+0x80/0x550
__cancel_work_timer+0x109/0x190
svm_range_bo_release+0xdc/0x1c0 [amdgpu]
svm_range_free+0x175/0x180 [amdgpu]
svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
process_one_work+0x27a/0x540
worker_thread+0x53/0x3e0
kthread+0xeb/0x120
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x11/0x20
other info that might help us debug this:
Chain exists of:
(work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&svms->lock);
lock(&mm->mmap_lock);
lock(&svms->lock);
lock((work_completion)(&svm_bo->eviction_work));
I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.
To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.
v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
That commit causes NULL pointer dereferences in dmesgs when
running applications using ROCm, including clinfo, blender,
and PyTorch, since v6.6.1. Revert it to fix blender again.
This reverts commit 96c211f1f9.
Closes: https://github.com/ROCm/ROCm/issues/2596
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2991
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Kaibo Ma <ent3rm4n@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following about iterator use:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1456 kfd_add_peer_prop() warn: iterator used outside loop: 'iolink3'
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Before using list_first_entry, make sure to check that list is not
empty, if list is empty return -ENODATA.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1347 kfd_create_indirect_link_prop() warn: can 'gpu_link' even be NULL?
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1428 kfd_add_peer_prop() warn: can 'iolink1' even be NULL?
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1433 kfd_add_peer_prop() warn: can 'iolink2' even be NULL?
Fixes: 0f28cca87e ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dbg_flags looks to be defined with incorrect data type; to process
multiple debug flag options, and hence defined dbg_flags as u32.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:117 pm_map_process_aldebaran() warn: maybe use && instead of &
Fixes: 0de4ec9a03 ("drm/amdgpu: prepare map process for multi-process debug devices")
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Updates for v6.8:
Core:
- Add support for SDM670, SM8650
- Handle the CFG interconnect to fix the obscure hangs / timeouts
on register write
- Kconfig fix for QMP dependency
- DT schema fixes
DPU:
- Add support for SDM670, SM8650
- Enable SmartDMA on SM8350 and SM8450
- Correct UBWC settings for SC8280XP
- Fix catalog settings for SC8180X
- Actually make use of the version to switch between QSEED3/3LITE/4
scalers
- Use devres-managed and drm-managed allocations where appropriate
- misc other fixes
- Enabled YUV writeback on SC7280, SM8250
- Enabled writeback on SM8350, SM8450
- CRC fix when encoder is selected as the input source
- other misc fixes
MDP4:
- Use devres-managed and drm-managed allocations where appropriate
- flush vblank event on CRTC disable
MDP5:
- Use devres-managed and drm-managed allocations where appropriate
DP:
- Add support for SM8650
- Enable PM runtime support
- Merge msm-specific debugfs dir with the generic one
- Described DisplayPort on SM8150 in DeviceTree bindings
- Moved dp_display_get_next_bridge() to probe()
DSI:
- Add support for SM8650
- Enable PM runtime support
GPU/GEM:
- demote userspace triggerable warnings to debug
- add GEM object metadata UAPI
- move GPU devcoredumps to GPU device
- fix hangcheck to skip retired submits
- expose UBWC config to userspace
- fix a680 chip-id
- drm_exec conversion
- drm/ci: remove rebase-merge directory (to unblock CI)
[airlied: fix drm_exec/amd interaction]
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9auYqmo-7NSd9FsbNBCDf7aBevd=4xkcF3A5G_OGvMQ@mail.gmail.com
SVM uses hmm page walk to valid buffer before map to gpu vm. After have partial
migration/mapping do validation on same vm range as migration/map do instead of
whole svm range that can be very large. This change is expected to improve svm
code performance.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On gfx943 APU there is no VRAM and page migration, queue CWSR area, svm
range with always mapped flag, is not mapped to GPU correctly. This
works fine if retry fault on CWSR area can be recovered, but could cause
deadlock if there is another retry fault recover waiting for CWSR to
finish.
Fix this by mapping svm range with always mapped flag to GPU with ACCESS
attribute if XNACK ON.
There is side effect, because all GPUs have ACCESS attribute by default
on new svm range with XNACK on, the CWSR area will be mapped to all GPUs
after this change. This side effect will be fixed with Thunk change to
set CWSR svm range with ACCESS_IN_PLACE attribute on the GPU that user
queue is created.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix up on mes process context flush to prevent non-mes devices from
spamming error messages or running into undefined behaviour during
process termination.
Fixes: bd33bb1409 ("drm/amdkfd: fix mes set shader debugger process management")
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MES provides the driver a call to explicitly flush stale process memory
within the MES to avoid a race condition that results in a fatal
memory violation.
When SET_SHADER_DEBUGGER is called, the driver passes a memory address
that represents a process context address MES uses to keep track of
future per-process calls.
Normally, MES will purge its process context list when the last queue
has been removed. The driver, however, can call SET_SHADER_DEBUGGER
regardless of whether a queue has been added or not.
If SET_SHADER_DEBUGGER has been called with no queues as the last call
prior to process termination, the passed process context address will
still reside within MES.
On a new process call to SET_SHADER_DEBUGGER, the driver may end up
passing an identical process context address value (based on per-process
gpu memory address) to MES but is now pointing to a new allocated buffer
object during KFD process creation. Since the MES is unaware of this,
access of the passed address points to the stale object within MES and
triggers a fatal memory violation.
The solution is for KFD to explicitly flush the process context address
from MES on process termination.
Note that the flush call and the MES debugger calls use the same MES
interface but are separated as KFD calls to avoid conflicting with each
other.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Alice Wong <shiwei.wong@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This
ensures that a GEM handle is created on import and that obj->dma_buf
will be set and remain set as long as the object is imported into KFD.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Xiaogang.Chen <Xiaogang.Chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM
handles are created in a drm_client_dev context to avoid exposing them
in user mode contexts through a DMABuf import.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In cases where the # is known ahead of time, it is silly to do the table
resize dance.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Patchwork: https://patchwork.freedesktop.org/patch/568338/
This patch implements partial migration/mapping for gpu/cpu page faults in SVM
according to migration granularity(default 2MB). A svm range may include pages
from both system ram and vram of one gpu now. These chagnes are expected to
improve migration performance and reduce mmu callback and TLB flush workloads.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Memory leaks of gang_ctx_bo and wptr_bo.
[How]
Free gang_ctx_bo and wptr_bo in pqm_uninit.
v2: add a common function pqm_clean_queue_resource to
free queue's resources.
v3: reset pdd->pqd.num_gws when destorying GWS queue.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make restore workers freezable so we don't have to explicitly flush them
in suspend and GPU reset code paths, and we don't accidentally try to
restore BOs while the GPU is suspended. Not having to flush restore_work
also helps avoid lock/fence dependencies in the GPU reset case where we're
not allowed to wait for fences.
A side effect of this is, that we can now have multiple concurrent threads
trying to signal the same eviction fence. Rework eviction fence signaling
and replacement to account for that.
The GPU reset path can no longer rely on restore_process_worker to resume
queues because evict/restore workers can run independently of it. Instead
call a new restore_process_helper directly.
This is an RFC and request for testing.
v2:
- Reworked eviction fence signaling
- Introduced restore_process_helper
v3:
- Handle unsignaled eviction fences in restore_process_bos
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFD_GC_VERSION was recently updated to use a new function
for IP version checks. As a result, use KFD_GC_VERSION as
the common function for all IP version checks in KFD.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not
have valid data
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The trap handler could be entered with pending VALU exceptions, so
clear the exception state before issuing vector instructions.
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Tested-by: Lancelot Six <lancelot.six@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will make it possible for amdgpu GEM ioctls to flush TLBs on compute
VMs.
This removes VMID-based TLB flushing and always uses PASID-based
flushing. This still works because it scans the VMID-PASID mapping
registers to find the right VMID. It's only slightly less efficient. This
is not a production use case.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- big pile of amd fixes, but mostly hw support newly added in 6.7
- i915 fixes, mostly minor things
- qxl memory leak fix
- vc4 uaf fix in mock helpers
- syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmVOlEQACgkQTA9ye/CY
qnGawQ//d7M2CZpd9LvkRnaV2+fG9yGONOwZsG5fVtXrfT4RDQmITC9KMEh2TxJb
s1W6HA+UijMMx6RtQN6cTYNHeYDaX2b55/g3lMnreXydii0COJwkWe52iFn0Dpcm
RpsT5cYLEiRtiTvEzKbrkxS+rrQMu9jwxcA23b+lMkmybVgqQe1m9hYtRxZCFqr7
6BMKOgrCRoY1mYZrNaccXBHvvgOtcpWPOsuNNgjW3MDKB663BmpABT1iDg3xxPdX
BI8SAl6PX6ju2Jwi3WPlscmI199fhcUXDuqb8LXhJsuqynhwU940aUlxcbI7Hz6Y
LaFwSK4OiaIsIC8yisa7cZ1z2mqnMIiGXasP6mfYVmYqpGYMW+AZcOmzui/MLiGd
duOcvK/bLxN7moSqcKgz+mmrLfZzJkPLV8pGEVk0IbTn239AnR76bqrEQJ+Iqukx
d4yLGE4OJchD4zzw5RlVmQIhwA8M/5croJBIo6yNyQB5xgN/Krd4QUc6KjjgzNo8
e402NjaW2/PqWIiPtsL5tK4XdkVtvMvVUq+bJSch6Wfn3j9u06/wgFl1FBRyX7zS
3QYvMtF/QM5/QTpbdl82hSSsJ78iO62tPYSOhycIxgB/BHoc/fap+IOFtKKnT3RZ
xr7UiwAQA043gAMvS+TkZAc6bFW8U8Dzxu5XxEPE1L+WU6XCSbs=
=l45e
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Dave's VPN to the big machine died, so it's on me to do fixes pr this
and next week while everyone else is at plumbers.
- big pile of amd fixes, but mostly for hw support newly added in 6.7
- i915 fixes, mostly minor things
- qxl memory leak fix
- vc4 uaf fix in mock helpers
- syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE"
* tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits)
drm/amdgpu: fix error handling in amdgpu_vm_init
drm/amdgpu: Fix possible null pointer dereference
drm/amdgpu: move UVD and VCE sched entity init after sched init
drm/amdgpu: move kfd_resume before the ip late init
drm/amd: Explicitly check for GFXOFF to be enabled for s0ix
drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)
drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
drm/amdgpu: fix software pci_unplug on some chips
drm/amd/display: remove duplicated argument
drm/amdgpu: correct mca debugfs dump reg list
drm/amdgpu: correct acclerator check architecutre dump
drm/amdgpu: add pcs xgmi v6.4.0 ras support
drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
drm/amdgpu: disable smu v13.0.6 mca debug mode by default
drm/amdgpu: Support multiple error query modes
drm/amdgpu: refine smu v13.0.6 mca dump driver
drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)
drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV
drm: amd: Resolve Sphinx unexpected indentation warning
...
Change local memory type to MTYPE_UC on revision id 0
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The purpose of this patch is to disable XNACK or set XNACK OFF mode
on SRIOV platform which doesn't support it.
This will prevent user-space application to fail or result into
unexpected behaviour whenever the application need to run test-case
in XNACK ON mode.
Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update cache info reporting based on compute and
memory partitioning modes.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX 9.4.3 uses a new version of the GC info table which
contains the cache info. This patch adds a new function
to populate the cache info from IP discovery for GFX 9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVAgzYACgkQDHTzWXnE
hr7ZEQ//UXne3tyGOsU3X8r+lstLFDMa90a3hvTg6hX+Q0MjHd/clwkKFkLpkipL
n7gIZlaHl11dRs0FzrIZA5EVAAgjMLKmIl10NBDFec6ZFA3VERcggx8y61uifI15
VviMR1VbLHYZaCdyrQOK0A4wcktWnKXyoXp7cwy9crdc2GOBMUZkdIqtvD7jHxQx
UMIFnzi1CyKUX/Fjt/JceYcNk9y2ZGkzakYO3sHcUdv4DPu9qX4kNzpjF691AZBP
UeKWvCswTRVg2M0kuo/RYIBzqaTmOlk6dHLWBognIeZPyuyhCcaGC2d64c6tShwQ
dtHdi+IgyQ8s2qb350ymKTQUP7xA/DfZBwH7LvrZALBxeQGYQN1CnsgDMOS2wcUc
XrRFiS7PxEOtMMBctcPBnnoV5ttnsLLlPpzM9puh9sUFMn6CgLzcAMqXdqxzMajH
+dz2aD1N0vMqq4varozOg9SC2QamgUiPN/TQfrulhCTCfQaXczy5x1OYiIz65+Sl
mKoe2WASuP9Ve8do4N/wEwH5SZY2ItipBdUTRxttY9NTanmV0X5DjZBXH5b9XGci
Zl5Ar613f9zwm5T5BVA5k6s3ZbGY6QcP5pDNTCPaSgitfFXIdReBZ2CaYzK3MPg/
Wit/TXrud9yT6VPpI1igboMyasf5QubV1MY1K83kOCWr9u8R2CM=
=l79u
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
- Limit the hardcoded topology quirk for Hygon CPUs to those which have a
model ID less than 4. The newer models have the topology CPUID leaf 0xB
correctly implemented and are not affected.
- Make SMT control more robust against enumeration failures
SMT control was added to allow controlling SMT at boottime or
runtime. The primary purpose was to provide a simple mechanism to
disable SMT in the light of speculation attack vectors.
It turned out that the code is sensible to enumeration failures and
worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
which means the primary thread mask is not set up correctly. By chance
a XEN/PV boot ends up with smp_num_siblings == 2, which makes the
hotplug control stay at its default value "enabled". So the mask is
never evaluated.
The ongoing rework of the topology evaluation caused XEN/PV to end up
with smp_num_siblings == 1, which sets the SMT control to "not
supported" and the empty primary thread mask causes the hotplug core to
deny the bringup of the APS.
Make the decision logic more robust and take 'not supported' and 'not
implemented' into account for the decision whether a CPU should be
booted or not.
- Fake primary thread mask for XEN/PV
Pretend that all XEN/PV vCPUs are primary threads, which makes the
usage of the primary thread mask valid on XEN/PV. That is consistent
with because all of the topology information on XEN/PV is fake or even
non-existent.
- Encapsulate topology information in cpuinfo_x86
Move the randomly scattered topology data into a separate data
structure for readability and as a preparatory step for the topology
evaluation overhaul.
- Consolidate APIC ID data type to u32
It's fixed width hardware data and not randomly u16, int, unsigned long
or whatever developers decided to use.
- Cure the abuse of cpuinfo for persisting logical IDs.
Per CPU cpuinfo is used to persist the logical package and die
IDs. That's really not the right place simply because cpuinfo is
subject to be reinitialized when a CPU goes through an offline/online
cycle.
Use separate per CPU data for the persisting to enable the further
topology management rework. It will be removed once the new topology
management is in place.
- Provide a debug interface for inspecting topology information
Useful in general and extremly helpful for validating the topology
management rework in terms of correctness or "bug" compatibility.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+yX0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoROUD/4vlvKEcpm9rbI5DzLcaq4DFHKbyEZF
cQtzuOSM/9vTc9DHnuoNNLl9TWSYxiVYnejf3E21evfsqspYlzbTH8bId9XBCUid
6B68AJW842M2erNuwj0b0HwF1z++zpDmBDyhGOty/KQhoM8pYOHMvntAmbzJbuso
Dgx6BLVFcboTy6RwlfRa0EE8f9W5V+JbmG/VBDpdyCInal7VrudoVFZmWQnPIft7
zwOJpAoehkp8OKq7geKDf79yWxu9a1sNPd62HtaVEvfHwehHqE6OaMLss1us+0vT
SJ/D6gmRQBOwcXaZL0wL1dG7Km9Et4AisOvzhXGvTa5b2D5oljVoqJ7V7FTf5g3u
y3aqWbeUJzERUbeJt1HoGVAKyA4GtZOvg+TNIysf6F1Z4khl9alfa9jiqjj4g1au
zgItq/ZMBEBmJ7X4FxQUEUVBG2CDsEidyNBDRcimWQUDfBakV/iCs0suD8uu8ZOD
K5jMx8Hi2+xFx7r1YqsfsyMBYOf/zUZw65RbNe+kI992JbJ9nhcODbnbo5MlAsyv
vcqlK5FwXgZ4YAC8dZHU/tyTiqAW7oaOSkqKwTP5gcyNEqsjQHV//q6v+uqtjfYn
1C4oUsRHT2vJiV9ktNJTA4GQHIYF4geGgpG8Ih2SjXsSzdGtUd3DtX1iq0YiLEOk
eHhYsnniqsYB5g==
=xrz8
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Thomas Gleixner:
- Limit the hardcoded topology quirk for Hygon CPUs to those which have
a model ID less than 4.
The newer models have the topology CPUID leaf 0xB correctly
implemented and are not affected.
- Make SMT control more robust against enumeration failures
SMT control was added to allow controlling SMT at boottime or
runtime. The primary purpose was to provide a simple mechanism to
disable SMT in the light of speculation attack vectors.
It turned out that the code is sensible to enumeration failures and
worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
which means the primary thread mask is not set up correctly. By
chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
the hotplug control stay at its default value "enabled". So the mask
is never evaluated.
The ongoing rework of the topology evaluation caused XEN/PV to end up
with smp_num_siblings == 1, which sets the SMT control to "not
supported" and the empty primary thread mask causes the hotplug core
to deny the bringup of the APS.
Make the decision logic more robust and take 'not supported' and 'not
implemented' into account for the decision whether a CPU should be
booted or not.
- Fake primary thread mask for XEN/PV
Pretend that all XEN/PV vCPUs are primary threads, which makes the
usage of the primary thread mask valid on XEN/PV. That is consistent
with because all of the topology information on XEN/PV is fake or
even non-existent.
- Encapsulate topology information in cpuinfo_x86
Move the randomly scattered topology data into a separate data
structure for readability and as a preparatory step for the topology
evaluation overhaul.
- Consolidate APIC ID data type to u32
It's fixed width hardware data and not randomly u16, int, unsigned
long or whatever developers decided to use.
- Cure the abuse of cpuinfo for persisting logical IDs.
Per CPU cpuinfo is used to persist the logical package and die IDs.
That's really not the right place simply because cpuinfo is subject
to be reinitialized when a CPU goes through an offline/online cycle.
Use separate per CPU data for the persisting to enable the further
topology management rework. It will be removed once the new topology
management is in place.
- Provide a debug interface for inspecting topology information
Useful in general and extremly helpful for validating the topology
management rework in terms of correctness or "bug" compatibility.
* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
x86/cpu: Provide debug interface
x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
x86/apic: Use u32 for [gs]et_apic_id()
x86/apic: Use u32 for phys_pkg_id()
x86/apic: Use u32 for cpu_present_to_apicid()
x86/apic: Use u32 for check_apicid_used()
x86/apic: Use u32 for APIC IDs in global data
x86/apic: Use BAD_APICID consistently
x86/cpu: Move cpu_l[l2]c_id into topology info
x86/cpu: Move logical package and die IDs into topology info
x86/cpu: Remove pointless evaluation of x86_coreid_bits
x86/cpu: Move cu_id into topology info
x86/cpu: Move cpu_core_id into topology info
hwmon: (fam15h_power) Use topology_core_id()
scsi: lpfc: Use topology_core_id()
x86/cpu: Move cpu_die_id into topology info
x86/cpu: Move phys_proc_id into topology info
x86/cpu: Encapsulate topology information in cpuinfo_x86
...
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and
MTYPE_UC for nonlocal memory.
On NUMA systems, local memory gets the local mtype, set by an
override callback. If EXT_COHERENT is set, memory will be set as
MTYPE_UC by default, with local memory MTYPE_CC.
Add an option in the override function for this case, and
add a check to ensure it is not used on UNCACHED memory.
V2: Combined APU and NUMA code into one patch
V3: Fixed a potential nullptr in amdgpu_vm_bo_update
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit dc427a473e.
The change prevents migrating the entire range to VRAM because retry
fault restore_pages map the remaining system memory range to GPUs. It
will work correctly to submit together with partial mapping to GPU
patch later.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit f9caf6cdd5.
Needed for the next revert patch.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2073: warning: Function parameter or member 'remap_list' not described in 'svm_range_add'
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split SVM ranges that have been mapped into 2MB page table entries,
require to be remap in case the split has happened in a non-aligned
VA.
[WHY]:
This condition causes the 2MB page table entries be split into 4KB
PTEs.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Function svm_range_split_by_grinity is not used,
so it is removed.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The topology related information is randomly scattered across cpuinfo_x86.
Create a new structure cpuinfo_topo and move in a first step initial_apicid
and apicid into it.
Aside of being better readable this is in preparation for replacing the
horribly fragile CPU topology evaluation code further down the road.
Consolidate APIC ID fields to u32 as that represents the hardware type.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.269787744@linutronix.de
Here, Adding db_size in byte to find the doorbell's
absolute offset for both 32-bit and 64-bit doorbell sizes.
So that doorbell offset will be aligned based on the doorbell
size.
v2:
- Addressed the review comment from Felix.
v3:
- Adding doorbell_size as parameter to get db absolute offset.
v4:
Squash the two patches into one.
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch implements partial migration in gpu page fault according to migration
granularity(default 2MB) and not split svm range in cpu page fault handling.
A svm range may include pages from both system ram and vram of one gpu now.
These chagnes are expected to improve migration performance and reduce mmu
callback and TLB flush workloads.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If there is no VRAM domain, bo_node is NULL and this causes crash.
Refactor the change, and use the module parameter as higher privilege.
Need another patch to support override PTE flag on APU.
Fixes: 5f248462c6 ("drm/amdgpu: Add EXT_COHERENT memory allocation flags")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I think this was an abstraction back from when
kfd supported both radeon and amdgpu. Since we just
support amdgpu now, there is no more need for this and
we can use the amdgpu structures directly.
This also avoids having the kfd_cu_info structures on
the stack when inlining which can blow up the stack.
Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_topology.c:2082:1: warning: the frame size of 1440 bytes is larger than 1024 bytes
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2866
Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
prange->svm_bo unref can happen in both mmu callback and a callback after
migrate to system ram. Both are async call in different tasks. Sync svm_bo
unref operation to avoid random "use-after-free".
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If new range is splited to multiple pranges with max_svm_range_pages
alignment and added to update_list, svm validate and map should keep
going after error to make sure prange->mapped_to_gpu flag is up to date
for the whole range.
svm validate and map update set prange->mapped_to_gpu after mapping to
GPUs successfully, otherwise clear prange->mapped_to_gpu flag (for
update mapping case) instead of setting error flag, we can remove
the redundant error flag to simpliy code.
Refactor to remove goto and update prange->mapped_to_gpu flag inside
svm_range_lock, to guarant we always evict queues or unmap from GPUs if
there are invalid ranges.
After svm validate and map return error -EAGIN, the caller retry will
update the mapping for the whole range again.
Fixes: c22b044070 ("drm/amdkfd: flag added to handle errors from svm validate and map")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: James Zhu <james.zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch fixes:
1: ref number of prange's svm_bo got decreased by an async call from hmm. When
wait svm_bo of prange got released we shoul also wait prang->svm_bo become NULL,
otherwise prange->svm_bo may be set to null after allocate new vram buffer.
2: During waiting svm_bo of prange got released in a while loop should reschedule
current task to give other tasks oppotunity to run, specially the the workque
task that handles svm_bo ref release, otherwise we may enter to softlock.
Signed-off-by: Xiaogang.Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise GPU may access the stale mapping and generate IOMMU
IO_PAGE_FAULT.
Move this to inside p->mutex to prevent multiple threads mapping and
unmapping concurrently race condition.
After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm,
kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and
before free the mem attachment in case failed to unmap from GPUs.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Directly use tbo's start address will miss the domain start offset. Need
to use gpu_offset instead.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The validated_once flag is not used after the prefault was removed, The
prefault was needed to ensure validate all system memory pages at least
once before mapping or migrating the range to GPU.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We do not need free dma address array of svm_range each time we do dma unmap
for pages in svm_range as we can reuse the same array. Only free it when free
svm_range. Separate these two operations and use them accordingly.
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Directly use tbo's start address will miss the domain start offset. Need
to use gpu_offset instead.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These flags (for GEM and SVM allocations) allocate
memory that allows for system-scope atomic semantics.
On GFX943 these flags cause caches to be avoided on
non-local memory.
On all other ASICs they are identical in functionality to the
equivalent COHERENT flags.
Corresponding Thunk patch is at
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are cases where HSA runtime is not enabled through the
AMDKFD_IOC_RUNTIME_ENABLE call when adding queues and the MES ADD_QUEUE
API should clear the MES process context instead of SET_SHADER_DEBUGGER.
Such examples are legacy HSA runtime builds that do not support the
current exception handling and running KFD tests.
The only time ADD_QUEUE.skip_process_ctx_clear is required is for
debugger use cases where a debugged process is always runtime enabled
when adding a queue.
Tested-by: Shikai Guo <shikai.guo@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use an inline function for version check. Gives more flexibility to
handle any format changes.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Heavy-weight TLB flush is required after unmap on all GPUs for
correctness and security.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Heavy-weight TLB flush is required after unmap on all GPUs for
correctness and security.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The code in kfd_mqd_manager_v11.c to support criu dump and
restore of queue state was missing.
Added it; should be equivalent to kfd_mqd_manager_v10.c.
CC: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The CU mask passed from user-space will change based on
different spatial partitioning mode. As a result, update
CU masking code for GFX9.4.3 to work for all partitioning
modes.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update cache info reporting in sysfs to report the correct
number of CUs and associated cache information based on
different spatial partitioning modes.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, we store CU info only for a single XCC assuming
that it is the same for all XCCs. However, that may not be
true. As a result, store CU info for all XCCs. This info is
later used for CU masking.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch fixes the case where the code currently passes
absolute register address and not the reg offset, which HWS
expects, when sending the PM4 packet to set/update CWSR grace
period. Additionally, cleanup the signature of
build_grace_period_packet_info function as it no longer needs
the inst parameter.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.
Drop the obsolete module options in favor of the new ones.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The code in kfd_mqd_manager_v11.c to support criu dump and
restore of queue state was missing.
Added it; should be equivalent to kfd_mqd_manager_v10.c.
CC: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with
the naming convention followed in amdgpu_gfx.h. No functional
change.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The CU mask passed from user-space will change based on
different spatial partitioning mode. As a result, update
CU masking code for GFX9.4.3 to work for all partitioning
modes.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>