prod_env_mat: allocate GPU memory out of frame loop (#2832)
Allocating GPU memory is not a cheap operation. This PR allocates the memory for `int_temp`, `uint64_temp`, and `tensor_list[0, 1, 3, 4, 5, 6]` outside the frame loop, so these buffers are reused in every iteration instead of being allocated again for each frame.

In the original code, `tensor_list[3]`, `tensor_list[4]`, and `tensor_list[6]` may need to be reallocated if the existing memory is not large enough; this behavior is unchanged. The shape of `tensor_list[2]` is dynamic, so it is not refactored in this PR.

With CUDA enabled, the C++ and Python unit tests pass and the examples run. The speedup becomes noticeable when the number of frames (samples) in a batch is not small.

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
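The pattern described above (allocate once before the frame loop, reuse the buffer in each iteration, and reallocate only when a frame needs more memory than is available) can be sketched in plain C++. This is a minimal illustration, not the prod_env_mat code: `GrowOnlyBuffer`, `allocations_inside_loop`, and `allocations_outside_loop` are hypothetical names, and host memory from `new[]` stands in for GPU memory so the sketch runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device buffer. Host memory is used so the
// sketch runs without a GPU; it counts how often storage is allocated.
class GrowOnlyBuffer {
 public:
  // Return a buffer of at least `bytes`, allocating only when the
  // current capacity is too small (grow-on-demand reallocation).
  char* get(std::size_t bytes) {
    if (bytes > capacity_) {
      data_.reset(new char[bytes]);  // in CUDA code this would be a device allocation
      capacity_ = bytes;
      ++allocations_;
    }
    return data_.get();
  }
  int allocations() const { return allocations_; }

 private:
  std::unique_ptr<char[]> data_;
  std::size_t capacity_ = 0;
  int allocations_ = 0;
};

// Allocation inside the loop: a fresh buffer is created and freed every frame.
int allocations_inside_loop(const std::vector<std::size_t>& frame_bytes) {
  int count = 0;
  for (std::size_t bytes : frame_bytes) {
    GrowOnlyBuffer tmp;  // constructed and destroyed once per frame
    tmp.get(bytes);
    count += tmp.allocations();
  }
  return count;
}

// Allocation hoisted out of the loop: one buffer is reused across frames
// and grows only when a frame needs more memory than it already holds.
int allocations_outside_loop(const std::vector<std::size_t>& frame_bytes) {
  GrowOnlyBuffer buf;
  for (std::size_t bytes : frame_bytes) {
    char* p = buf.get(bytes);
    assert(p != nullptr);  // per-frame work would write into p here
  }
  return buf.allocations();
}
```

For five frames needing 1024, 1024, 2048, 1024, and 2048 bytes, the per-frame version allocates five times, while the hoisted version allocates twice: once for the first frame and once when the first 2048-byte frame exceeds the current capacity. The grow-only check roughly corresponds to the reallocation the PR keeps for `tensor_list[3]`, `tensor_list[4]`, and `tensor_list[6]`.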