Add Rocm execution provider for AMD GPUs #1110
base: master
Sure. Log:
➜ build git:(rocm) ./bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=rocm \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
terminate called after throwing an instance of 'Ort::Exception'
what(): /shared/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_rocm.so with error: libMIOpen.so.1: cannot open shared object file: No such file or directory
[1] 16225 IOT instruction (core dumped) ./bin/offline-tts-c-api --vits-model=./vits-ljs.onnx --sid=0 --provider=roc
➜ build git:(rocm) find / -name libMIOpen.so.1 2>/dev/null
/opt/rocm-6.0.0/lib/libMIOpen.so.1
➜ build git:(rocm) export LD_LIBRARY_PATH="/opt/rocm-6"
➜ build git:(rocm) export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm-6.0.0/lib"
➜ build git:(rocm) ./bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=rocm \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
terminate called after throwing an instance of 'Ort::Exception'
what(): /shared/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_rocm.so with error: libroctx64.so.4: cannot open shared object file: No such file or directory
[1] 16506 IOT instruction (core dumped) ./bin/offline-tts-c-api --vits-model=./vits-ljs.onnx --sid=0 --provider=roc
➜ build git:(rocm) find / -name libroctx64.so.4 2>/dev/null
It looks like roctracer is missing, even though I installed ROCm from the AMD website. See AUTOMATIC1111/stable-diffusion-webui#10435.
I tested on Ubuntu 22.04.4 LTS with an AMD Ryzen 5 4500U (APU).
When I try to compile and install roctracer (because it is not included in my ROCm installer), I get this log:
~/roctracer/build ~/roctracer
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at CMakeLists.txt:54 (find_package):
Could not find a package configuration file provided by "HIP" with any of
the following names:
HIPConfig.cmake
hip-config.cmake
Add the installation prefix of "HIP" to CMAKE_PREFIX_PATH or set "HIP_DIR"
to a directory containing one of the above files. If "HIP" provides a
separate development package or SDK, be sure it has been installed.
-- Configuring incomplete, errors occurred!
See also "/home/user/roctracer/build/CMakeFiles/CMakeOutput.log".
Trying further. Log:
➜ roctracer git:(amd-master) find /opt -name "hip-config.cmake"
/opt/rocm-6.0.0/lib/cmake/hip/hip-config.cmake
➜ roctracer git:(amd-master) export CMAKE_PREFIX_PATH=/opt/rocm-6.0.0
➜ roctracer git:(amd-master) ./build.sh
~/roctracer/build ~/roctracer
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64Targets.cmake:80 (message):
The imported target "hsa-runtime64::hsa-runtime64" references the file
"/opt/rocm-6.0.0/lib/libhsa-runtime64.so.1.12.60000"
but this file does not exist. Possible reasons include:
* The file was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and contained
"/opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64Targets.cmake"
but not all the files it references.
Call Stack (most recent call first):
/opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64-config.cmake:82 (include)
CMakeLists.txt:53 (find_package)
-- Configuring incomplete, errors occurred!
|
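Both crashes above are loader failures: `libonnxruntime_providers_rocm.so` depends on ROCm libraries that the dynamic linker cannot resolve. The following is a minimal sketch, assuming a Linux system where `ldd` is available, of how all unresolved dependencies could be listed in one pass instead of one crash at a time. The helper names `missing_libs` and `append_path` are hypothetical, and the location of the provider library in the usage comment is an assumption that depends on the build layout.

```shell
# Print the names of shared libraries that ldd reports as unresolved.
# Reads ldd output on stdin.
missing_libs() {
  awk '/=> not found/ { print $1 }' | sort -u
}

# Append a directory to a colon-separated search path. An empty entry in
# LD_LIBRARY_PATH is treated as the current directory, so avoid creating
# one when the variable starts out empty.
append_path() {
  local current="$1" dir="$2"
  if [ -z "$current" ]; then
    printf '%s\n' "$dir"
  else
    printf '%s:%s\n' "$current" "$dir"
  fi
}

# Usage (library location is an assumption; adjust to your build):
#   ldd ./lib/libonnxruntime_providers_rocm.so | missing_libs
#   export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" /opt/rocm-6.0.0/lib)"
```

A library that `find` cannot locate anywhere, such as `libroctx64.so.4` in the log above, has to be installed first; changing the search path only helps for libraries that already exist on disk.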
By the way, the onnxruntime library with ROCm support that we are using was built with ROCm 6.1: https://github.com/csukuangfj/onnxruntime-libs/actions/runs/9886930772/job/27307628445#step:11:339
Could you try ROCm 6.1 instead? |
I tried, but it seems that amdgpu-install is broken or something like that:
➜ ~ amdgpu-install --rocmrelease=6.1.0
Hit:1 https://packages.microsoft.com/repos/code stable InRelease
Hit:3 https://brave-browser-apt-release.s3.brave.com stable InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 http://il.archive.ubuntu.com/ubuntu jammy InRelease
Hit:6 http://il.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:2 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:7 https://repo.radeon.com/amdgpu/6.1.2/ubuntu jammy InRelease
Hit:8 http://il.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:9 https://repo.radeon.com/rocm/apt/6.1.2 jammy InRelease
Hit:10 https://repo.radeon.com/rocm/apt/6.1.1 jammy InRelease
Reading package lists... Done
N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'https://brave-browser-apt-release.s3.brave.com stable InRelease' doesn't support architecture 'i386'
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en_IL) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-amd64.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-all.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-small (main/dep11/icons-48x48.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons (main/dep11/icons-64x64.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-hidpi (main/dep11/icons-64x64@2.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-amd64) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-all) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en_IL) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-amd64.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-all.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-small (main/dep11/icons-48x48.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons (main/dep11/icons-64x64.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-hidpi (main/dep11/icons-64x64@2.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-amd64) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-all) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package rocm-opencl-runtime6.1.0
E: Couldn't find any package by glob 'rocm-opencl-runtime6.1.0'
E: Couldn't find any package by regex 'rocm-opencl-runtime6.1.0'
E: Unable to locate package rocm-hip-runtime6.1.0
E: Couldn't find any package by glob 'rocm-hip-runtime6.1.0'
E: Couldn't find any package by regex 'rocm-hip-runtime6.1.0'
I can use 6.1.2 instead. |
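The `Hit:` lines in the apt output above mention the ROCm repositories for 6.1.2 and 6.1.1 but none for 6.1.0, which would be consistent with apt not finding the `6.1.0` packages. The following is a minimal sketch, assuming the source entries use `repo.radeon.com/rocm/apt/<version>` URLs as in the log, of how the configured release versions could be listed. The function name `configured_rocm_versions` is hypothetical.

```shell
# List the ROCm release versions that appear in apt source lines.
# Reads text on stdin, for example the contents of rocm.list.
configured_rocm_versions() {
  grep -oE 'repo\.radeon\.com/rocm/apt/[0-9]+(\.[0-9]+)*' \
    | sed 's#.*/##' \
    | sort -u
}

# Usage (file path taken from the warnings in the log above):
#   configured_rocm_versions < /etc/apt/sources.list.d/rocm.list
```

If more than one version is printed, the file holds several ROCm entries, which may also be the reason for the repeated "configured multiple times" warnings.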
Does rocm 6.1.2 work for you? |
I compiled it with ROCm 6.1.2 and ran it. Here's the log:
➜ sherpa-onnx git:(rocm) ./build/bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=rocm \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
2024-07-12 15:09:47.142736969 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 15:09:47.142779433 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV liliana. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV the. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV most. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV beautiful. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV and. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV lovely. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV assistant. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV of. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV our. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV team. Ignore it!
2024-07-12 15:09:47.491127847 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
terminate called after throwing an instance of 'Ort::Exception'
what(): Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
[1] 87518 IOT instruction (core dumped) ./build/bin/offline-tts-c-api --vits-model=./vits-ljs.onnx --sid=0
Maybe I need to set the GPU architecture as in ROCm/ROCm#2536 (comment), but I couldn't find the specific architecture info. |
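`hipErrorInvalidDeviceFunction` commonly means that the loaded binary contains no kernel compiled for the GPU that is actually present, so knowing the GPU's `gfx` target is a useful first step. The following is a minimal sketch, assuming the `rocminfo` tool from the ROCm installation is on `PATH` and prints agent names such as `gfx90c`. The function name `gfx_targets` is hypothetical.

```shell
# Extract the distinct gfx target names from text on stdin,
# for example from the output of rocminfo.
gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage:
#   rocminfo | gfx_targets
```

The printed target can then be compared with the list of architectures the onnxruntime ROCm library was built for.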
It seems to be working. Please re-check your lexicon.txt, tokens.txt, and the ONNX model. Make sure you don't mix them. That is, don't use lexicon.txt from model1 with tokens.txt from model2 and the ONNX file from model3. |
The same command with the cpu provider works:
./build/bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=cpu \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV liliana. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV the. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV most. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV beautiful. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV and. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV lovely. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV assistant. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV of. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV our. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV team. Ignore it!
Input text is: liliana, the most beautiful and lovely assistant of our team!
Speaker ID is is: 0
Saved to: ./generated.wav
|
Please don't ignore the OOVs. You can listen to the generated.wav. Please post your lexicon.txt and tokens.txt. |
There must be something wrong with your lexicon.txt and vits-ljs.onnx. |
You're right. Something was wrong with the tokens / lexicon, and the generated wav was invalid.
./build/bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=cpu \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
Input text is: liliana, the most beautiful and lovely assistant of our team!
Speaker ID is is: 0
Saved to: ./generated.wav
The generated file is valid and sounds good. Now with rocm:
./build/bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=rocm \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!'
here
2024-07-12 16:05:19.861435222 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 16:05:19.861482854 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-07-12 16:05:20.499969272 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
terminate called after throwing an instance of 'Ort::Exception'
what(): Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
[1] 98649 IOT instruction (core dumped) ./build/bin/offline-tts-c-api --vits-model=./vits-ljs.onnx --sid=0 |
New lexicon & tokens
sha256sum vits-ljs.onnx
5bbd273797a9ecf8d94bd6ec02ad16cb41cbb85f055ad98d528ced3e44c9b31a vits-ljs.onnx |
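The checksum above can be compared against a reference value to rule out a corrupted or mismatched model file. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is available. The function name `verify_sha256` is hypothetical, and the hash in the usage comment is simply the one printed above, not an officially published value.

```shell
# Return success if the SHA-256 digest of a file equals the expected hex string.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}

# Usage (hash copied from the comment above):
#   verify_sha256 vits-ljs.onnx 5bbd273797a9ecf8d94bd6ec02ad16cb41cbb85f055ad98d528ced3e44c9b31a \
#     && echo "model matches" || echo "model differs"
```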
Could you have a look? If you can run on CPU and the generated wav sounds normal, then there is no need to send lexicon.txt and tokens.txt. Thanks! |
I think that it works and that there is just another bug in ROCm itself.
export ROCM_PATH=/opt/rocm-6.1.2
export HIP_VISIBLE_DEVICES=0
export ROCM_ARCH="gfx902'"
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./build/bin/offline-tts-c-api \
--vits-model=./vits-ljs.onnx \
--vits-lexicon=./lexicon.txt \
--vits-tokens=./tokens.txt \
--sid=0 \
--provider=rocm \
--output-filename=./generated.wav \
'liliana, the most beautiful and lovely assistant of our team!' 2>&1 | tee gpu.txt
The system hangs, the screen flickers, and the display server goes down. Log:
cat gpu.txt
here
:3:rocdevice.cpp :468 : 0220036292 us: [pid:4692 tid:0x73c526d91b00] Initializing HSA stack.
:3:rocdevice.cpp :528 : 0220041500 us: [pid:4692 tid:0x73c526d91b00] Enumerated GPU agents = 1
:3:rocdevice.cpp :232 : 0220041581 us: [pid:4692 tid:0x73c526d91b00] Numa selects cpu agent[0]=0x5fb6a8aa3bd0(fine=0x5fb6a8acd530,coarse=0x5fb6a9a4ae40) for gpu agent=0x5fb6a9a500b0 CPU<->GPU XGMI=0
:3:comgrctx.cpp :33 : 0220041588 us: [pid:4692 tid:0x73c526d91b00] Loading COMGR library.
:3:rocdevice.cpp :1785: 0220041963 us: [pid:4692 tid:0x73c526d91b00] Gfx Major/Minor/Stepping: 11/0/0
:3:rocdevice.cpp :1787: 0220041971 us: [pid:4692 tid:0x73c526d91b00] HMM support: 1, XNACK: 0, Direct host access: 0
:3:rocdevice.cpp :1789: 0220041972 us: [pid:4692 tid:0x73c526d91b00] Max SDMA Read Mask: 0x1, Max SDMA Write Mask: 0x1
:3:hip_context.cpp :49 : 0220043222 us: [pid:4692 tid:0x73c526d91b00] Direct Dispatch: 1
:3:hip_device.cpp :471 : 0220058354 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x7ffff6e01f70, 0 )
:3:hip_device.cpp :473 : 0220058379 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_device_runtime.cpp :653 : 0220398012 us: [pid:4692 tid:0x73c526d91b00] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp :657 : 0220398034 us: [pid:4692 tid:0x73c526d91b00] hipSetDevice: Returned hipSuccess :
:3:hip_device_runtime.cpp :608 : 0220398038 us: [pid:4692 tid:0x73c526d91b00] hipDeviceSynchronize ( )
:3:hip_device_runtime.cpp :611 : 0220398043 us: [pid:4692 tid:0x73c526d91b00] hipDeviceSynchronize: Returned hipSuccess :
:3:hip_device.cpp :471 : 0220398047 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x5fb6b19ffb98, 0 )
:3:hip_device.cpp :473 : 0220398052 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_memory.cpp :777 : 0220398064 us: [pid:4692 tid:0x73c526d91b00] hipMemGetInfo ( 0x7ffff6e03158, 0x7ffff6e03160 )
:3:hip_memory.cpp :801 : 0220398074 us: [pid:4692 tid:0x73c526d91b00] hipMemGetInfo: Returned hipSuccess :
2024-07-12 16:41:38.625224858 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 16:41:38.625262851 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
:3:hip_device_runtime.cpp :653 : 0220967630 us: [pid:4692 tid:0x73c526d91b00] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp :657 : 0220967647 us: [pid:4692 tid:0x73c526d91b00] hipSetDevice: Returned hipSuccess :
:3:hip_device_runtime.cpp :623 : 0220967654 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice ( 0x7ffff6e01ed8 )
:3:hip_device_runtime.cpp :631 : 0220967657 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess :
:3:hip_device.cpp :471 : 0220967661 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x7ffff6e01ed8, 0 )
:3:hip_device.cpp :473 : 0220967665 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_memory.cpp :599 : 0220967689 us: [pid:4692 tid:0x73c526d91b00] hipMalloc ( 0x5fb6b193feb8, 33554432 )
:3:rocdevice.cpp :2363: 0220967805 us: [pid:4692 tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1e000000
:3:hip_memory.cpp :601 : 0220967811 us: [pid:4692 tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3b1800000: duration: 122 us
:3:hip_context.cpp :137 : 0220967819 us: [pid:4692 tid:0x73c526d91b00] hipInit ( 0 )
:3:hip_context.cpp :143 : 0220967823 us: [pid:4692 tid:0x73c526d91b00] hipInit: Returned hipSuccess :
:3:hip_device_runtime.cpp :623 : 0220967827 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice ( 0x7ffff6e022b8 )
:3:hip_device_runtime.cpp :631 : 0220967829 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess :
:3:hip_device_runtime.cpp :623 : 0220967832 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice ( 0x7ffff6e01c98 )
:3:hip_device_runtime.cpp :631 : 0220967834 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess :
:3:hip_device.cpp :471 : 0220967837 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x7ffff6e01c98, 0 )
:3:hip_device.cpp :473 : 0220967840 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_memory.cpp :599 : 0220967844 us: [pid:4692 tid:0x73c526d91b00] hipMalloc ( 0x5fb6b19c6688, 33554432 )
:3:rocdevice.cpp :2363: 0220971114 us: [pid:4692 tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1c000000
:3:hip_memory.cpp :601 : 0220971136 us: [pid:4692 tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3a9e00000: duration: 3292 us
:3:hip_device.cpp :471 : 0220971149 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x7ffff6e018e8, 0 )
:3:hip_device.cpp :473 : 0220971155 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_device.cpp :471 : 0220971188 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600 ( 0x7ffff6e018e8, 0 )
:3:hip_device.cpp :473 : 0220971191 us: [pid:4692 tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess :
:3:hip_device_runtime.cpp :623 : 0221013862 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice ( 0x7ffff6e01c00 )
:3:hip_device_runtime.cpp :631 : 0221013878 us: [pid:4692 tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess :
:3:hip_memory.cpp :599 : 0221013885 us: [pid:4692 tid:0x73c526d91b00] hipMalloc ( 0x7ffff6e01c00, 1048576 )
:3:rocdevice.cpp :2363: 0221013985 us: [pid:4692 tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1bf00000
:3:hip_memory.cpp :601 : 0221013993 us: [pid:4692 tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3b8200000: duration: 108 us
:3:hip_memory.cpp :674 : 0221014025 us: [pid:4692 tid:0x73c526d91b00] hipMemcpy ( 0x73c3b8200000, 0x5fb6b1c64340, 768, hipMemcpyHostToDevice )
:3:rocdevice.cpp :2935: 0221014031 us: [pid:4692 tid:0x73c526d91b00] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp :3013: 0221016894 us: [pid:4692 tid:0x73c526d91b00] created hardware queue 0x73c526d7e000 with size 16384 with priority 1, cooperative: 0
:3:rocdevice.cpp :3105: 0221016909 us: [pid:4692 tid:0x73c526d91b00] acquireQueue refCount: 0x73c526d7e000 (1)
:3:devprogram.cpp :2679: 0221225178 us: [pid:4692 tid:0x73c526d91b00] Using Code Object V5. |
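The line `Gfx Major/Minor/Stepping: 11/0/0` in the log above shows that the runtime picked up `HSA_OVERRIDE_GFX_VERSION=11.0.0`. The following is a minimal sketch, assuming the usual `gfx<major><minor><stepping>` naming in which minor and stepping are written as single hex digits, of how an override value maps to a target name. The function name `version_to_gfx` is hypothetical.

```shell
# Convert a "major.minor.stepping" version string into a gfx target name.
# Assumes minor and stepping each fit in one hex digit.
version_to_gfx() {
  local major minor step rest
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"
  step="${rest#*.}"
  printf 'gfx%d%x%x\n' "$major" "$minor" "$step"
}

# Usage:
#   version_to_gfx "$HSA_OVERRIDE_GFX_VERSION"
```

Under that assumption, `11.0.0` corresponds to `gfx1100`, an RDNA 3 target. The Ryzen 5 4500U has Vega-based integrated graphics, so this override may make the runtime load kernels built for a different instruction set, which could be related to the hang. This is a guess, not a confirmed diagnosis.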
I will try to build a version of onnxruntime with rocm support for Windows. |
Fixes #196
Usage
(Please make sure you have installed ROCm on your computer and you have a discrete AMD GPU)
@thewh1teagle
Could you help test it? I don't have an AMD GPU and cannot test it.
Note that it supports only Linux x64 at present.