replace rocprof with rocprofv2 for the tune gemm script #613
scripts/amd/gemm/tune_gemm.py
@@ -392,9 +392,14 @@ def main():
def extract_kernel_time(M, N, K, config, df, bias_size):
    # Correct the header by removing 'sig' and 'obj' to reduce number from 21 to 19
    # once the bug is fixed, we should not need below two lines
Can you add a link to the rocprof issue here so that people know what you are referring to?
ROCm/rocprofiler#144
done
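The workaround in the diff comment can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes that rocprof (ROCm/rocprofiler#144) emits header names 'sig' and 'obj' that have no matching data fields, so dropping them by name realigns the header with each row. The helper name `read_rocprof_csv` and the sample column names are hypothetical.

```python
import csv
import io


def read_rocprof_csv(csv_text, extra_columns=("sig", "obj")):
    """Hypothetical helper: parse rocprof CSV text whose header lists
    spurious 'sig'/'obj' names with no corresponding data fields."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Assumption: the extra names simply have no data, so removing them
    # leaves a header with the same width as every data row (21 -> 19).
    fixed_header = [name for name in header if name not in extra_columns]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(fixed_header):
            raise ValueError(
                f"expected {len(fixed_header)} fields, got {len(row)}: {row}"
            )
        records.append(dict(zip(fixed_header, row)))
    return records


sample = "KernelName,sig,obj,DurationNs\nmatmul_kernel,1500\n"
print(read_rocprof_csv(sample))
```

The length check makes the workaround fail loudly if a fixed rocprof release stops emitting the extra header names, which is the point at which the comment says these lines should be removed.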
@@ -409,7 +414,7 @@ def profile_batch_kernels(M, N, K, gpuid, gpus, jobs, verbose):
 kernel_name = generated_kernel_name(M, N, K, jobId)
 if verbose:
     print(f"profiling {kernel_name} on GPU {gpuid}")
-run_bash_command_wrapper(f"rocprof --stats -o results-{jobId}.csv python {kernel_name}", capture=(verbose < 2))
+run_bash_command_wrapper(f"rocprofv2 --plugin file --plugin-version 1 --kernel-trace -o {jobId} python {generated_kernel_name(M, N, K, jobId)}", capture=(verbose < 2))
generated_kernel_name(M, N, K, jobId) is the same as kernel_name; the latter is simpler.
And why did you change the output from results-{jobId}.csv to {jobId}? Is it automatically expanded to results_{jobId}.csv?
Yes, it is automatically expanded.
Have you checked performance on a few shapes to confirm that the results before and after the change are the same?
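Following the review suggestion, the profiling call can reuse `kernel_name` instead of calling `generated_kernel_name` a second time. The sketch below is illustrative only: the body of `generated_kernel_name` is a hypothetical stand-in (the real naming scheme in tune_gemm.py may differ), the rocprofv2 flags are copied verbatim from the PR, and the `results_{jobId}.csv` output name is an assumption taken from the exchange in this thread, not from rocprofv2 documentation.

```python
def generated_kernel_name(M, N, K, jobId):
    # Hypothetical stand-in for the script's helper; the real naming
    # scheme in tune_gemm.py may differ.
    return f"generated_kernel{M}-{N}-{K}-{jobId}.py"


def build_profile_command(M, N, K, jobId):
    # Compute the kernel file name once and reuse it, as suggested in review.
    kernel_name = generated_kernel_name(M, N, K, jobId)
    # Flags copied verbatim from the PR's rocprofv2 invocation.
    return (
        "rocprofv2 --plugin file --plugin-version 1 --kernel-trace "
        f"-o {jobId} python {kernel_name}"
    )


def expected_trace_file(jobId):
    # Assumption from the review thread: "-o {jobId}" is expanded by
    # rocprofv2 to results_{jobId}.csv; check the actual output directory.
    return f"results_{jobId}.csv"


print(build_profile_command(8192, 8192, 8192, 3))
print(expected_trace_file(3))
```

Keeping the expected file name in one place means the parsing side of the tuner only needs updating once if the expansion rule turns out to differ.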
Yes, I did, @vgokhale. Here are the comparisons:
[Screenshot: tune_streamk using rocprof]
[Screenshot: tune_streamk using rocprofv2]
[Screenshot: tune_gemm with rocprof on smc300x]
[Screenshot: tune_gemm with rocprofv2 on smc300x]
rocprofv2 is much faster than rocprof.
rocprofv2 natively supports Python.
For 8192x8192x8192, it reduces the tuning time over the full tuning space from more than an hour to 5.22 minutes.
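Since the rocprof run took more than an hour, the reported timings give a lower bound on the end-to-end tuning speedup for the 8192x8192x8192 shape:

$$
\text{speedup} > \frac{60\ \text{min}}{5.22\ \text{min}} \approx 11.5
$$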