[Kernel&Prim] Fix IndexPutCudaKernel for thread safe and add index_put_double_grad #69095
PR Category
Operator Mechanism
PR Types
New features
Description
Pcard-75624
Related PR: deepmodeling/deepmd-kit#4157
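1. In `IndexPutCudaKernel`, replace the thread-unsafe `+=` addition with `phi::CudaAtomicAdd`, so that results are no longer incorrect when `indices` contains duplicate coordinates.
2. Add `index_put` to `api.yaml` as a basic operator, then reuse this forward operator in `index_put_double_grad`.

Note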
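Because gradient_checker does not support the `Tuple[Tensor, ..]` input type used for `indices`, the unit tests serve only as coverage tests. Accuracy is checked by comparing against PyTorch, as shown below. (When `accumulate=False`, the forward computation is an assignment and is therefore nondeterministic, so that forward result is not compared.)

Because using `CudaAtomicAdd` may affect performance, it was benchmarked. The results are as follows (unit: milliseconds; each case is run 100 times and the average of the last 80 runs is reported):

As the results show, `CudaAtomicAdd` slightly increases compute time, but it does not make the operator slower than PyTorch's.

The benchmark script is as follows

The accuracy test script is as follows: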
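Independently of the PR's own scripts, the duplicate-index problem that motivates `phi::CudaAtomicAdd` can be illustrated with a small pure-Python model. This is only a sketch, not Paddle or CUDA code: the function names `index_put_accumulate_racy` and `index_put_accumulate_atomic` are hypothetical, and the "racy" version models just one possible interleaving (every thread reads the old value before any thread writes). Real GPU scheduling is nondeterministic, so the actual wrong result may differ.

```python
def index_put_accumulate_racy(x, indices, values):
    """Model a non-atomic `+=` under one possible interleaving.

    All simulated threads load their target element first, then all of
    them store `old + value`. With duplicate indices, later stores
    overwrite earlier ones and contributions are lost.
    """
    out = list(x)
    loaded = [out[i] for i in indices]            # every thread reads first
    for i, old, v in zip(indices, loaded, values):
        out[i] = old + v                          # stores clobber each other
    return out


def index_put_accumulate_atomic(x, indices, values):
    """Model an atomic add: each read-modify-write completes before the
    next one starts, so every contribution is kept."""
    out = list(x)
    for i, v in zip(indices, values):
        out[i] += v
    return out


x = [0.0, 0.0, 0.0]
indices = [0, 0, 2]        # index 0 appears twice
values = [1.0, 2.0, 3.0]

print(index_put_accumulate_racy(x, indices, values))    # [2.0, 0.0, 3.0]
print(index_put_accumulate_atomic(x, indices, values))  # [3.0, 0.0, 3.0]
```

In this model the two contributions to index 0 should sum to 3.0, but the non-atomic version keeps only the last store. Serializing each read-modify-write, which is what an atomic add guarantees, recovers the expected sum at the cost of some contention on the duplicated element.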