support neighbor stat on GPUs #2897
Signed-off-by: Jinzhe Zeng <[email protected]>
… using a constant is fine Signed-off-by: Jinzhe Zeng <[email protected]>
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #2897      +/-   ##
==========================================
+ Coverage   75.46%   75.87%   +0.41%
==========================================
  Files         244      245       +1
  Lines       24522    24929     +407
  Branches     1580     1615      +35
==========================================
+ Hits        18505    18916     +411
+ Misses       5086     5049      -37
- Partials      931      964      +33
Let's run Test CUDA after #2892 is merged.
I am not an expert on this part of the code. I will listen to Denghui's review.
Looks good to me! If we can implement batched parallelism for this function, we should see a significant improvement in performance.
Fix #2619.
The GPU implementation in this PR is usually faster than the single-threaded CPU implementation (i.e., the CPU path without the feature implemented in #1624). It still needs parallelism in the batch dimension, which is blocked by #2618 (building the neighbor list). GPU utilization is below 10% for the water system and should improve once #2618 makes progress.
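For readers unfamiliar with the quantity being computed, here is a minimal pure-Python sketch of what a neighbor statistic over a batch of frames might look like. It is not the implementation in this PR. It assumes that "neighbor stat" means the smallest interatomic distance and the largest neighbor count within a cutoff, ignores atom types and periodic boundary conditions, and uses hypothetical names (`neighbor_stat_frame`, `neighbor_stat_batch`, `rcut`). The outer loop over frames is the batch dimension that the description above says still runs serially.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def neighbor_stat_frame(coords: Sequence[Coord], rcut: float) -> Tuple[float, int]:
    """Return (min pair distance, max neighbor count within rcut) for one frame.

    Hypothetical helper: no atom types, no periodic boundary conditions.
    """
    n = len(coords)
    min_dist = math.inf
    counts = [0] * n
    # Brute-force O(n^2) loop over unique pairs.
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            min_dist = min(min_dist, d)
            if d <= rcut:
                counts[i] += 1
                counts[j] += 1
    return min_dist, max(counts, default=0)


def neighbor_stat_batch(
    frames: Sequence[Sequence[Coord]], rcut: float
) -> Tuple[float, int]:
    """Reduce per-frame statistics over the batch (frame) dimension.

    Frames are processed one at a time here; a batched version would
    evaluate several frames at once.
    """
    min_dist = math.inf
    max_nnei = 0
    for coords in frames:
        d, c = neighbor_stat_frame(coords, rcut)
        min_dist = min(min_dist, d)
        max_nnei = max(max_nnei, c)
    return min_dist, max_nnei


frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
]
print(neighbor_stat_batch(frames, rcut=1.5))  # (0.5, 2)
```

Because each frame is independent, the per-frame work can in principle be evaluated for many frames together, which is the kind of batching the review comment suggests would raise GPU utilization.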