Float atomic add/subtract/min/max #367

Open
Devaniti opened this issue Jan 18, 2025 · 0 comments
Labels: enhancement (New feature or request), needs-triage

Is your feature request related to a problem? Please describe.

One use case for this feature is GPU simulation. For example, in the MPM-MLS algorithm you have a grid of values that is updated from every particle in the simulation. Since each grid node can be updated by many particles, the grid has to be updated with atomics. Another example is aggregating data from particles or grid nodes in other ways, for example computing the average velocity or the forces that apply to objects.
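To make the scatter pattern concrete, here is a CPU-side C++ sketch, not GPU code. It assumes a simplified grid that stores only a mass per node, and `atomic_add_float` and `scatter_mass` are hypothetical names. Standard C++ only gained `fetch_add` for `std::atomic<float>` in C++20, so the float add is emulated with a compare-and-swap loop, the same workaround listed under alternatives; a native float atomic intrinsic would replace that loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Emulated float atomic add built from a compare-and-swap (CAS) loop.
// Returns the value stored before the add, like integer fetch_add.
float atomic_add_float(std::atomic<float>& target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `expected` with the current
    // value, so the next attempt recomputes the sum from fresh data.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}

// Hypothetical particle-to-grid scatter: every particle adds its mass to the
// grid node it maps to. Many particles can share a node, so each add must be
// atomic.
std::vector<float> scatter_mass(const std::vector<int>& particle_node,
                                const std::vector<float>& particle_mass,
                                int num_nodes, int num_threads) {
    assert(particle_node.size() == particle_mass.size());
    std::vector<std::atomic<float>> grid(num_nodes);
    for (auto& node : grid) node.store(0.0f);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Thread t handles particles t, t + num_threads, t + 2 * num_threads, ...
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < particle_node.size(); i += num_threads) {
                atomic_add_float(grid[particle_node[i]], particle_mass[i]);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::vector<float> result;
    for (auto& node : grid) result.push_back(node.load());
    return result;
}
```

For example, scattering 1000 particles of mass 0.5 onto 4 nodes (particle i goes to node i % 4) with 8 threads gives 125.0 on every node. The masses were chosen so every partial sum is exact, which hides the ordering effects discussed under Additional context.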

Another use case is calculating bounding boxes of procedurally generated meshes, or of anything else that can't be processed offline for some reason. You can simply do an atomic min/max with each vertex position.
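A minimal C++ sketch of this bounding-box reduction, again CPU-side and under assumptions: `Vec3`, `AtomicAabb`, `compute_bounds`, and the min/max helpers are hypothetical names. C++20's `std::atomic<float>` offers `fetch_add`/`fetch_sub` but no min/max, so each is emulated with a compare-and-swap loop that only writes when the new value improves the bound.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Emulated float atomic min: writes only if `value` is smaller than the stored
// value, retrying when another thread changed the value in between.
void atomic_min_float(std::atomic<float>& target, float value) {
    float current = target.load(std::memory_order_relaxed);
    while (value < current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}

// Emulated float atomic max, mirroring atomic_min_float.
void atomic_max_float(std::atomic<float>& target, float value) {
    float current = target.load(std::memory_order_relaxed);
    while (value > current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}

struct Vec3 { float x, y, z; };

// Hypothetical axis-aligned bounding box with atomically updated bounds.
struct AtomicAabb {
    std::atomic<float> lo[3];
    std::atomic<float> hi[3];
    AtomicAabb() {
        for (int a = 0; a < 3; ++a) {
            lo[a].store(std::numeric_limits<float>::infinity());
            hi[a].store(-std::numeric_limits<float>::infinity());
        }
    }
    void include(const Vec3& p) {
        const float c[3] = {p.x, p.y, p.z};
        for (int a = 0; a < 3; ++a) {
            atomic_min_float(lo[a], c[a]);
            atomic_max_float(hi[a], c[a]);
        }
    }
};

// Each thread folds a strided subset of positions into the shared box.
void compute_bounds(const std::vector<Vec3>& positions, AtomicAabb& box,
                    int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < positions.size(); i += num_threads) {
                box.include(positions[i]);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Initializing the bounds to +/-infinity means the first position always wins. NaN coordinates compare false in both helpers, so they are silently skipped.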

Describe the solution you'd like

Intrinsics that allow float32 atomic add/subtract/min/max, similar to the ones int32 already has.

Describe alternatives you've considered
There are 2 alternatives:

  1. Converting floats to fixed point
  • This works in many circumstances, but has a few caveats:
    • You first need to balance the trade-off between representable range and precision. Native floats are better at preserving relative precision.
    • Fixed-point values can overflow with no way to detect it. Native floats handle overflow by producing infinity, which can be detected.
  2. Using a CAS loop to emulate float atomics
  • This works in some circumstances, but has several drawbacks:
    • If the hardware has native float atomics, performance should be better with those.
    • If too many threads try to perform an atomic operation on a single value, contention can get so high that the shader times out and causes DEVICE_HUNG.
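The fixed-point alternative can be sketched in C++ as follows. The 16 fractional bits, the `int32` accumulator, and the names `to_fixed`, `from_fixed`, and `atomic_add_fixed` are illustrative assumptions; a real shader would pick the scale per use case, which is exactly the range-versus-precision trade-off described in the first alternative.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical fixed-point format: values are scaled by 2^kFracBits and
// rounded to int32, so the existing integer atomic add can be used.
constexpr int kFracBits = 16;                                   // step of 2^-16
constexpr float kScale = static_cast<float>(1 << kFracBits);    // 65536

std::int32_t to_fixed(float v) {
    // Only checks the single input; the running sum can still overflow,
    // and nothing in the integer atomic will report it.
    assert(std::fabs(v) < 32768.0f);
    return static_cast<std::int32_t>(std::lround(v * kScale));
}

float from_fixed(std::int32_t f) { return static_cast<float>(f) / kScale; }

// Integer fetch_add on a signed atomic wraps around on overflow in C++.
void atomic_add_fixed(std::atomic<std::int32_t>& target, float value) {
    target.fetch_add(to_fixed(value), std::memory_order_relaxed);
}
```

Adding 1.25 and 2.5 reads back as exactly 3.75, but adding 20000.0 twice wraps the int32 accumulator and reads back as -25536.0 with no error. A native float accumulator would hold 40000.0 exactly.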

Additional context

Since floating-point addition/subtraction is not associative, the results of atomic add/subtract will not be deterministic. This is fine in most circumstances. Additionally, vendors should be free to perform optimizations that change the order of operations, for example doing a wave-level sum and then issuing one atomic per wave. This also changes the order of operations, but since the order is not consistent anyway, that should be fine.
Atomic min/max doesn't have any catches like this, though.
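A small C++ illustration of why the order of float additions matters while min/max does not. `sum_in_order` and `min_in_order` are hypothetical helpers that apply values sequentially, standing in for atomic operations that arrive at the destination in different orders.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Accumulates left to right in float, mimicking atomic adds that happen to
// land on the destination in this order.
float sum_in_order(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) total += v;
    return total;
}

// Same for a running minimum. For inputs without NaNs or signed zeros, min is
// associative and commutative, so the result does not depend on arrival order.
float min_in_order(const std::vector<float>& values) {
    assert(!values.empty());
    float result = values[0];
    for (float v : values) result = std::min(result, v);
    return result;
}
```

With {1e8, 1, -1e8} the sum is 0.0, while with {1e8, -1e8, 1} it is 1.0: near 1e8 adjacent floats are 8 apart, so 1e8 + 1 rounds back to 1e8. The minimum is -1e8 for both orders.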
