Profile rocFFT kernels

tingxingdong edited this page Jan 17, 2018 · 11 revisions

On AMD GPU

By environment variable

In bash, run "export HIP_TRACE_API=1" to enable tracing (set it back with =0 to disable).

Launch your application; every HIP API call is then profiled, including rocFFT kernels, memory copies, and memory allocation/deallocation.
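If the variable should apply to a single run rather than the whole shell session, a small wrapper can help. The following is a minimal sketch; run_traced and the log file name are hypothetical helpers, not part of HIP, and both output streams are captured so that the trace is kept whichever stream it is written to.

```shell
# Hypothetical helper: run one command with HIP API tracing enabled,
# without exporting HIP_TRACE_API into the rest of the session.
run_traced() {
    local log_file="$1"
    shift
    # The VAR=value prefix applies only to this one invocation.
    # stdout and stderr both go to the log file.
    HIP_TRACE_API=1 "$@" > "$log_file" 2>&1
}

# Example (replace ./your_executable with your rocFFT application):
# run_traced hip_trace.log ./your_executable
```

Because the variable is never exported, later commands in the same shell run untraced and no reset with =0 is needed.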

For more profiling tools, see Profiling and Debugging HIP Code

The IR and ISA can be dumped by setting the following environment variables before building and running the app.

export KMDUMPISA=1

export KMDUMPLLVM=1

export KMDUMPDIR=/path/to/dump
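As a concrete variant of the three exports above, the sketch below uses a scratch directory under /tmp. The directory name is an assumption chosen for illustration, and creating it up front is a precaution rather than a documented requirement.

```shell
# Assumption: /tmp/rocfft_kernel_dump is a hypothetical dump location;
# any writable path should work. An already-set KMDUMPDIR is respected.
export KMDUMPDIR="${KMDUMPDIR:-/tmp/rocfft_kernel_dump}"
mkdir -p "$KMDUMPDIR"

# Request both the ISA and the LLVM IR dumps.
export KMDUMPISA=1
export KMDUMPLLVM=1

# Build and run the application from this same shell, then inspect the dump:
# ls -l "$KMDUMPDIR"
```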

By rcprof and CodeXL

rcprof is a command-line tool for profiling HIP kernels, very similar to nvprof. It is located in /opt/rocm/profiler/bin.

Example usage:

/opt/rocm/profiler/bin/rcprof -A ./your_executable

The dumped output, apitrace.atp, will then be in your home directory.

View it with the CodeXL GUI. Download and install CodeXL.

Open CodeXL and create a project, then import the *.atp file into the session. Note: switch to profile mode and select HSA mode (the default is OpenCL mode) before importing the *.atp file.

Run "/opt/rocm/profiler/bin/rcprof --help" for more options.
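Since the trace lands in the home directory under a fixed name, it can be convenient to move it somewhere run-specific before importing it into CodeXL. The helper below is a hypothetical sketch, not part of rcprof; it assumes the default output location described above ($HOME/apitrace.atp).

```shell
# Hypothetical helper: archive the trace written by rcprof.
# Assumes the trace is at $HOME/apitrace.atp, as described above.
collect_atp() {
    local dest_dir="$1"
    local src="$HOME/apitrace.atp"
    if [ ! -f "$src" ]; then
        echo "no trace found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # Timestamp the file name so successive runs stay distinguishable.
    mv "$src" "$dest_dir/apitrace-$(date +%Y%m%d-%H%M%S).atp"
}

# Example:
# /opt/rocm/profiler/bin/rcprof -A ./your_executable && collect_atp ./traces
```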

On NVIDIA GPU

Run "nvprof ./your_executable" to profile every CUDA runtime invocation, including kernels and memory copies.