Replace thrust::device_vector with torch::Tensor #17

mthrok · 2025-01-15T21:53:55Z

thrust::device_vector owns its underlying CUDA memory, so it allocates and frees device memory on every invocation of `solve_cuda_batch`.

Since this code is written for Torch, it is more efficient to rely on Torch's CUDA caching allocator. The caching allocator reuses previously freed blocks, which virtually eliminates memory allocation/deallocation in the TLA code.
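To illustrate why a caching allocator helps, here is a toy, single-threaded sketch in C++. It is not PyTorch's CUDA caching allocator. `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree`, blocks are cached by exact size, and `ToyCachingAllocator`, `solve_once` and `run_demo` are hypothetical names. `solve_once` mimics one solve call that needs three scratch buffers.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Toy caching allocator (illustration only, not PyTorch's implementation).
// std::malloc/std::free stand in for cudaMalloc/cudaFree.
class ToyCachingAllocator {
 public:
  ~ToyCachingAllocator() {
    // Cached blocks are returned to the backend only at teardown.
    for (auto& entry : free_blocks_) {
      for (void* p : entry.second) std::free(p);
    }
  }

  void* allocate(std::size_t bytes) {
    auto it = free_blocks_.find(bytes);
    if (it != free_blocks_.end() && !it->second.empty()) {
      void* p = it->second.back();  // cache hit: reuse a block of the same size
      it->second.pop_back();
      return p;
    }
    ++backend_allocs_;  // cache miss: go to the (expensive) backend allocator
    return std::malloc(bytes);
  }

  void deallocate(void* p, std::size_t bytes) {
    free_blocks_[bytes].push_back(p);  // keep the block for later reuse
  }

  std::size_t backend_allocs() const { return backend_allocs_; }

 private:
  std::map<std::size_t, std::vector<void*>> free_blocks_;
  std::size_t backend_allocs_ = 0;
};

// Hypothetical stand-in for one solve call that needs three scratch buffers.
void solve_once(ToyCachingAllocator& alloc, std::size_t n) {
  const std::size_t bytes = n * sizeof(float);
  void* u = alloc.allocate(bytes);
  void* v = alloc.allocate(bytes);
  void* costs = alloc.allocate(bytes);
  // ... kernel work would use u, v and costs here ...
  alloc.deallocate(costs, bytes);
  alloc.deallocate(v, bytes);
  alloc.deallocate(u, bytes);
}

// Runs `calls` solves of size n and reports how many backend allocations occurred.
std::size_t run_demo(int calls, std::size_t n) {
  ToyCachingAllocator alloc;
  for (int i = 0; i < calls; ++i) solve_once(alloc, n);
  return alloc.backend_allocs();
}
```

For example, `run_demo(1000, 256)` returns 3. Only the first call's three buffers reach `malloc`, and the remaining 999 calls are served from the cache. An owning container that frees on destruction would instead hit the backend on every call.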

Before:
- `batch_linear_assignment` took 7 ms (3 ms of which was the device query, addressed in "Query device only once" #16).
- Most of the remaining time was spent in `cudaFree` (2.6 ms).

After:
- `batch_linear_assignment` takes 0.37 ms.


mthrok · 2025-01-15T21:55:40Z

src/torch_linear_assignment_cuda_kernel.cu

```diff
@@ -265,6 +271,7 @@ std::vector<torch::Tensor> batch_linear_assignment_cuda(torch::Tensor cost) {

  AT_DISPATCH_FLOATING_TYPES(cost.scalar_type(), "solve_cuda_batch", [&] {
    solve_cuda_batch<scalar_t>(
+        cost.scalar_type(),
```


Note: if we allocate `u`, `v` and `shortestPathCosts` before invoking `solve_cuda_batch`, we don't have to pass `scalar_type` as an argument. I took this approach because it keeps the diff easier to read.
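The alternative described in the note can be sketched in plain C++. The scratch buffers are allocated at the dispatch site, where `scalar_t` is already a compile-time type, and the solver template only receives typed pointers. This is an illustration under stated assumptions. `ScalarType`, `solve_sketch`, `run_typed` and `dispatch` are hypothetical names. `std::vector` stands in for `torch::Tensor`, and the `switch` only mimics what `AT_DISPATCH_FLOATING_TYPES` does. The solver body is a placeholder, not the linear-assignment algorithm.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical runtime dtype tag, standing in for torch::ScalarType.
enum class ScalarType { Float, Double };

// Callee: receives typed pointers to caller-owned buffers, so it never needs
// the runtime dtype. Placeholder body: zero the potentials, copy the costs,
// and return a checksum (NOT the assignment algorithm).
template <typename scalar_t>
double solve_sketch(const scalar_t* cost, std::size_t n,
                    scalar_t* u, scalar_t* v, scalar_t* shortestPathCosts) {
  double checksum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    u[i] = scalar_t(0);
    v[i] = scalar_t(0);
    shortestPathCosts[i] = cost[i];
    checksum += static_cast<double>(shortestPathCosts[i]);
  }
  return checksum;
}

// Caller side: scalar_t is known here, so the buffers can be allocated with
// the right element type before the solver runs.
template <typename scalar_t>
double run_typed(const void* cost, std::size_t n) {
  std::vector<scalar_t> u(n), v(n), shortestPathCosts(n);
  return solve_sketch<scalar_t>(static_cast<const scalar_t*>(cost), n,
                                u.data(), v.data(), shortestPathCosts.data());
}

// Maps the runtime tag to a compile-time type, as a dispatch macro would.
double dispatch(ScalarType type, const void* cost, std::size_t n) {
  switch (type) {
    case ScalarType::Float:  return run_typed<float>(cost, n);
    case ScalarType::Double: return run_typed<double>(cost, n);
  }
  throw std::invalid_argument("unsupported scalar type");
}
```

The trade-off matches the note. Passing `scalar_type` into the solver keeps the allocation next to the code that previously used `thrust::device_vector`, which makes the diff small. Allocating in the caller removes the extra argument.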

ivan-chai · 2025-01-16T07:39:27Z

Thank you for your excellent work profiling the code and identifying the bottleneck! I will make a new release on PyPI soon.

ivan-chai · 2025-01-16T08:42:32Z

The updated version 0.0.3 is on PyPI.


ivan-chai merged commit 346784e into ivan-chai:main Jan 16, 2025
2 checks passed

mthrok deleted the torch branch January 16, 2025 12:55
