support neighbor stat on GPUs #2897

njzjz · 2023-10-04T06:11:52Z

The GPU implementation in this PR is usually faster than the single-threaded CPU implementation (i.e., without the feature implemented in #1624). It still lacks parallelism over the batch dimension, which is blocked by #2618 (building the neighbor list). GPU utilization is below 10% for the water system. It should improve once #2618 makes progress.
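For readers unfamiliar with the statistic, here is a minimal NumPy sketch of what a neighbor stat over a batch of frames might compute: the minimum neighbor distance and the maximum number of neighbors within a cutoff. This is not the PR's CUDA kernel or the code in `deepmd/utils/neighbor_stat.py`. The function name `neighbor_stat`, the array layout `(nframes, natoms, 3)`, the open-boundary treatment (no periodic images), and the absence of per-type counts are all assumptions made for illustration. Broadcasting over the leading axis stands in for the batch-dimension parallelism mentioned above.

```python
import numpy as np


def neighbor_stat(coords, rcut):
    """Return (min_distance, max_neighbor_count) over a batch of frames.

    Hypothetical helper for illustration only.
    coords: array of shape (nframes, natoms, 3); open boundaries (no PBC).
    rcut: cutoff radius in the same length unit as coords.
    """
    coords = np.asarray(coords, dtype=float)
    natoms = coords.shape[1]
    # Pairwise displacement vectors for every frame at once:
    # shape (nframes, natoms, natoms, 3).
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude self-pairs by setting the diagonal of each frame to infinity.
    dist[:, np.arange(natoms), np.arange(natoms)] = np.inf
    min_dist = float(dist.min())
    # Count neighbors within the cutoff for each atom, then take the maximum.
    max_nnei = int((dist < rcut).sum(axis=-1).max())
    return min_dist, max_nnei


frames = np.array(
    [
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [5.0, 0.0, 0.0]],
    ]
)
min_dist, max_nnei = neighbor_stat(frames, rcut=2.1)
```

The all-pairs distance matrix costs O(natoms²) memory per frame, so this form only suits small systems; it is meant to show the quantities involved, not how to compute them efficiently.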


codecov · 2023-10-04T06:22:14Z

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (f256dff) 75.46% compared to head (0ce3376) 75.87%.
Report is 3 commits behind head on devel.

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #2897      +/-   ##
==========================================
+ Coverage   75.46%   75.87%   +0.41%     
==========================================
  Files         244      245       +1     
  Lines       24522    24929     +407     
  Branches     1580     1615      +35     
==========================================
+ Hits        18505    18916     +411     
+ Misses       5086     5049      -37     
- Partials      931      964      +33

Files	Coverage Δ
deepmd/utils/neighbor_stat.py	`96.49% <100.00%> (+1.75%)`	⬆️
source/op/custom_op.h	`100.00% <ø> (ø)`
source/op/prod_env_mat_multi_device.cc	`73.35% <100.00%> (+12.92%)`	⬆️
source/op/neighbor_stat.cc	`72.61% <78.09%> (+6.26%)`	⬆️

... and 16 files with indirect coverage changes


njzjz · 2023-10-04T18:38:22Z

Let's run Test CUDA after #2892 is merged.

wanghan-iapcm

I am not an expert on this part of the code. I will listen to Denghui's review.

denghuilu

Looks good to me! If we can implement batched parallelism for this function, we should see a significant improvement in performance.

njzjz added 10 commits October 3, 2023 22:11

add the GPU kernel for NeighborStat (first working version)

70f87ed

Signed-off-by: Jinzhe Zeng <[email protected]>

we don't need to calculate the maximum neighbor number of each frame;…

fd4f62d

… using a constant is fine Signed-off-by: Jinzhe Zeng <[email protected]>

make neighbor atoms in parallel

ce92bac

Signed-off-by: Jinzhe Zeng <[email protected]>

use tf.reduce_min instead of np.min

36fb319

Signed-off-by: Jinzhe Zeng <[email protected]>

use GPU to max

1f44ebc

Signed-off-by: Jinzhe Zeng <[email protected]>

use mem_nnei; make several variable to be class attr

a730226

Signed-off-by: Jinzhe Zeng <[email protected]>

use shared memory

b92f0b6

Signed-off-by: Jinzhe Zeng <[email protected]>

do not sqrt for every dist; instead do it in the final

5bbc62e

Signed-off-by: Jinzhe Zeng <[email protected]>

revert gpu_cuda/rocm.h

685e34c

Signed-off-by: Jinzhe Zeng <[email protected]>

clean unused codes

0ce3376

Signed-off-by: Jinzhe Zeng <[email protected]>
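The commit "do not sqrt for every dist; instead do it in the final" names a common optimization that can be sketched in a few lines. The version below is an assumption-laden NumPy illustration, not the PR's kernel: the function name `min_distance_deferred_sqrt`, the single-frame layout `(natoms, 3)`, and the open boundaries are hypothetical. It reduces over squared distances and applies one square root to the result.

```python
import numpy as np


def min_distance_deferred_sqrt(coords):
    """Minimum pairwise distance in one frame, with a single sqrt at the end.

    Hypothetical helper for illustration only.
    coords: array of shape (natoms, 3) with natoms >= 2; no periodic images.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    # Squared distances need no sqrt. Because sqrt is monotonic for x >= 0,
    # the pair with the smallest squared distance is also the closest pair.
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)  # ignore self-pairs
    return float(np.sqrt(dist2.min()))


points = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]
closest = min_distance_deferred_sqrt(points)
```

The same idea applies to a cutoff test: comparing squared distances against `rcut**2` avoids the square root entirely.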

njzjz requested a review from wanghan-iapcm October 4, 2023 06:11

github-actions bot added the Python, Core, CUDA, ROCM, and OP labels Oct 4, 2023

njzjz linked an issue Oct 4, 2023 that may be closed by this pull request

[Feature Request] support neighbor stat on GPUs #2619

Closed

njzjz requested a review from denghuilu October 4, 2023 06:16

wanghan-iapcm reviewed Oct 5, 2023

wanghan-iapcm added the Test CUDA label (Trigger test CUDA workflow) Oct 5, 2023

github-actions bot removed the Test CUDA label (Trigger test CUDA workflow) Oct 5, 2023

denghuilu approved these changes Oct 7, 2023

wanghan-iapcm merged commit da100dc into deepmodeling:devel Oct 7, 2023
37 checks passed
