what is option2 #19
Comments
Hi, FYI: @ahmadki Right on, we probably did not comment our examples perfectly ...
This message means that the delay before profiling starts is longer than the time your program takes to finish. If you look here: https://github.com/NVIDIA/nvtx-plugins/blob/master/examples/run_tf_session.sh
nsys profile \
-d 60 \
-w true \
--force-overwrite=true \
--sample=cpu \
-t 'nvtx,cuda' \
--stop-on-exit=true \
--kill=sigkill \
-o examples/tf_session_example \
python examples/tf_session_example.py
Adjust these settings and you'll be fine ;)
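To make the delay-versus-runtime point concrete, here is a minimal Python sketch that assembles an `nsys profile` command and refuses to build one whose start delay is not shorter than the expected runtime. The helper name `build_nsys_command` and its parameters are hypothetical, not part of nvtx-plugins or Nsight Systems. It assumes `-y` is the collection delay in seconds and reuses the `-d`, `-t`, `-o` and `--force-overwrite` flags from the script above.

```python
def build_nsys_command(script_args, output, delay_s=0, duration_s=60,
                       expected_runtime_s=None):
    """Return an `nsys profile` command line as a list of arguments.

    Hypothetical helper for illustration only. `expected_runtime_s` is a
    rough estimate supplied by the caller, not something nsys reports.
    """
    # A delay that outlasts the program leaves nothing to collect,
    # which is the situation described in the error message.
    if expected_runtime_s is not None and delay_s >= expected_runtime_s:
        raise ValueError(
            f"delay ({delay_s}s) must be shorter than the expected "
            f"runtime ({expected_runtime_s}s)"
        )

    cmd = ["nsys", "profile"]
    if delay_s > 0:
        cmd += ["-y", str(delay_s)]      # assumed: delay before collection starts
    cmd += [
        "-d", str(duration_s),           # collection duration in seconds
        "--force-overwrite=true",
        "-t", "nvtx,cuda",
        "-o", output,
    ]
    return cmd + list(script_args)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; the values used for the delay and the runtime estimate are up to you.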
|
I am looking more at this one:
I tried 1) and 2) individually, as well as in combination, but I still cannot get it to run. The code I use to call the profiler is below:
I am interested in profiling specific places in the code rather than a specific period of time. Thanks |
Try the following:
nsys profile \
-d 60 \
-w true \
--force-overwrite=true \
--sample=cpu \
-t 'nvtx,cuda' \
--stop-on-exit=true \
--kill=sigkill \
-o examples/tf_session_example \
python main.py \
--arch resnet50 \
--mode train \
--data_dir /raid/ethem/tfr_small \
--export_dir /raid/ethem/results \
--batch_size 128 \
--num_iter 1 \
--iter_unit epoch \
--results_dir /raid/ethem/results \
--display_every 10 \
--lr_init 0.01 \
--seed 12345
You don't need to combine Option 1 & 2 & 3. They are completely independent.
You don't want to profile the whole training run: it doesn't make sense and it will hurt your performance. Profiling is designed to be run on a short script that is representative of the normal script. You can use some delay to account for warmup and library loading, but a profiling script doesn't need more than 50 good iterations to be useful. |
@DEKHTIARJonathan thanks for the suggestion, I will try it. You touched on a great point: I use pyprof with PyTorch, where I can control how many iterations to profile. I usually profile 1 or 2 iterations (e.g., the 10th iteration), which gives me what I want. That is why I tried the start and end nvtx plugins: to do the same thing with TensorFlow. Is there a way to control this, or does it rely only on time-related parameters? |
Here is the verdict: when I add "-c cudaProfilerApi" I get the message |
So it seems that -c cudaProfilerApi is not working. As far as I understand, using start and end to limit the part of the code being profiled depends on this parameter (along with --stop-on-range-end true), so that does NOT work either. Please correct me if I am wrong. |
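For reference, the per-iteration control discussed above can be sketched independently of any framework. The helper `profile_iterations` below is hypothetical: it only decides when to call a start and a stop callback around a chosen window of iterations. Which callbacks to pass is an assumption left to the reader. In PyTorch, `torch.cuda.profiler.start` and `torch.cuda.profiler.stop` are candidates; for TensorFlow, something that triggers `cudaProfilerStart`/`cudaProfilerStop` would be needed, and whether nsys honors it in a given setup is exactly what is in question in this thread.

```python
def profile_iterations(num_iters, step_fn, first, count, start_fn, stop_fn):
    """Run `step_fn(i)` for i in range(num_iters), bracketing iterations
    [first, first + count) with `start_fn()` and `stop_fn()`.

    Hypothetical helper: `start_fn` / `stop_fn` are whatever callables
    open and close the capture range in your environment.
    """
    last = first + count - 1
    started = False
    for i in range(num_iters):
        if i == first:
            start_fn()          # open the capture range just before the window
            started = True
        step_fn(i)              # one training iteration
        if started and i == last:
            stop_fn()           # close it right after the last profiled step
            started = False
    if started:
        # The loop ended inside the window; still close the range.
        stop_fn()
```

Called with `first=10, count=2`, this brackets only the 11th and 12th steps (0-based indices 10 and 11), which mirrors the "profile 1 or 2 iterations" workflow described for pyprof.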
Using Instead, you can try using Of course, you can disable the capture range limit entirely by removing |
@rrforte I am trying to profile only 1 iteration. In PyTorch, I use pyprof and start and stop profiling at a certain iteration. With start and stop I use |
In this example, I am seeing "option 1". What is option2?
Is there a clear example that shows how to use nvtx plugins:
I am trying to get the nvtx plugins working, but I keep getting "The application terminated before the collection started. No report was generated". I am definitely doing something wrong, but where?