prod_env_mat: allocate GPU memory out of frame loop #2832

Merged 7 commits on Sep 20, 2023

@njzjz (Member) commented Sep 16, 2023

Allocating GPU memory is not a cheap operation. This PR moves the allocation of int_temp, uint64_temp, and tensor_list[0, 1, 3, 4, 5, 6] out of the frame loop, so each buffer is allocated once and reused in every iteration instead of being allocated again for each frame.
In the original code, tensor_list[3], tensor_list[4], and tensor_list[6] may be reallocated if the existing memory is not large enough. This behavior is unchanged.
The shape of tensor_list[2] is dynamic, so it is not refactored in this PR.
With CUDA enabled, the C++ and Python unit tests pass and the examples run.
A speedup can be observed when the number of frames (samples) in a batch is not small.
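
The pattern described above can be illustrated with a minimal host-only sketch. It is not the code from this PR: `ScratchBuffer`, `allocs_inside_loop`, and `allocs_outside_loop` are hypothetical names, and `std::malloc`/`std::free` stand in for the GPU allocator so the example runs without CUDA. The `ensure` method mirrors the "reallocate only if the memory is not enough" behavior mentioned for tensor_list[3], tensor_list[4], and tensor_list[6].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a device scratch buffer. std::malloc/std::free
// play the role of the GPU allocator so the sketch runs on any host.
class ScratchBuffer {
 public:
  ScratchBuffer() = default;
  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;
  ~ScratchBuffer() { std::free(data_); }

  // Grow only when the current capacity is insufficient; otherwise reuse.
  void* ensure(std::size_t nbytes) {
    if (nbytes > capacity_) {
      std::free(data_);
      data_ = std::malloc(nbytes);
      capacity_ = data_ ? nbytes : 0;
      ++num_allocs_;
    }
    assert(capacity_ >= nbytes);
    return data_;
  }

  // Number of times the underlying allocator was called.
  std::size_t num_allocs() const { return num_allocs_; }

 private:
  void* data_ = nullptr;
  std::size_t capacity_ = 0;
  std::size_t num_allocs_ = 0;
};

// Before: a fresh buffer per frame, so one allocation per frame.
std::size_t allocs_inside_loop(int nframes, std::size_t nbytes) {
  std::size_t total = 0;
  for (int ff = 0; ff < nframes; ++ff) {
    ScratchBuffer buf;
    buf.ensure(nbytes);
    total += buf.num_allocs();
  }
  return total;
}

// After: one buffer hoisted out of the loop and reused by every frame.
std::size_t allocs_outside_loop(int nframes, std::size_t nbytes) {
  ScratchBuffer buf;
  for (int ff = 0; ff < nframes; ++ff) {
    buf.ensure(nbytes);
  }
  return buf.num_allocs();
}
```

With 8 frames of equal size, the in-loop variant calls the allocator 8 times and the hoisted variant calls it once. The saving grows with the number of frames, which is consistent with the note that the speedup shows up when a batch holds many frames.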

Signed-off-by: Jinzhe Zeng <[email protected]>
@github-actions github-actions bot added the OP label Sep 16, 2023
codecov bot commented Sep 16, 2023

Codecov Report

Patch coverage: 9.37% and project coverage change: -0.26% ⚠️

Comparison is base (339ce47) 75.52% compared to head (c2a6aa6) 75.26%.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #2832      +/-   ##
==========================================
- Coverage   75.52%   75.26%   -0.26%     
==========================================
  Files         242      242              
  Lines       24370    24466      +96     
  Branches     1571     1580       +9     
==========================================
+ Hits        18405    18414       +9     
- Misses       5037     5121      +84     
- Partials      928      931       +3     
Files Changed Coverage Δ
source/op/prod_env_mat_multi_device.cc 60.42% <9.37%> (-10.54%) ⬇️


Signed-off-by: Jinzhe Zeng <[email protected]>
@denghuilu (Member) left a comment

LGTM

@wanghan-iapcm wanghan-iapcm merged commit 7fb1d11 into deepmodeling:devel Sep 20, 2023
34 checks passed