prod_env_mat: allocate GPU memory out of frame loop #2832
Merged
Allocating GPU memory is not a cheap operation. This PR moves the allocations for `int_temp`, `uint64_temp`, and `tensor_list[0, 1, 3, 4, 5, 6]` out of the frame loop, so the buffers are reused in each iteration instead of being allocated many times.

In the original code, `tensor_list[3]`, `tensor_list[4]`, and `tensor_list[6]` may need to be reallocated when the existing memory is not large enough; this behavior is kept.

The shape of `tensor_list[2]` is dynamic, so it is not refactored in this PR.

With CUDA enabled, the C++ and Python unit tests pass, and the examples run correctly. The speedup is observable when the number of frames (samples) in a batch is not small.
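As a minimal sketch of the pattern this PR applies: buffers are allocated once before the frame loop and reused, with a grow-on-demand reallocation only when a frame needs more space than the current buffer holds. The example below uses NumPy as a stand-in for the GPU allocations, and the function and variable names (`process_batch`, `buf_size`) are hypothetical, not taken from the actual `prod_env_mat` code.

```python
import numpy as np

def process_batch(frames, buf_size=1024):
    # Pre-allocate reusable buffers once, outside the frame loop
    # (analogous to int_temp / uint64_temp in the PR).
    int_temp = np.empty(buf_size, dtype=np.int64)

    results = []
    for frame in frames:
        n = frame.shape[0]
        # Grow-on-demand: reallocate only when the existing buffer is too
        # small, mirroring the retained behavior of tensor_list[3], [4], [6].
        if n > int_temp.shape[0]:
            int_temp = np.empty(n, dtype=np.int64)
        int_temp[:n] = frame          # reuse the buffer in place
        results.append(int(int_temp[:n].sum()))
    return results
```

The key point is that the common case performs zero allocations per frame; a reallocation happens only when a frame exceeds the largest size seen so far.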