Use indirect dispatch without "check_order" optimization #4

Open
arcman7 opened this issue Aug 25, 2024 · 4 comments

@arcman7

arcman7 commented Aug 25, 2024

Hi there again!

I was wondering, are there any consequences that you're aware of if I were to use indirect dispatching without the "check_order" optimization enabled?

In my scenario I would be running a pre-processing step prior to calling the RadixSortKernel. The keys and values buffers will be updated frequently. If I'm able to determine the dispatch sizes necessary for all pipelines used by the RadixSortKernel in my pre-processing shader, I'd be able to use the dispatchPipelinesIndirect method - is that correct?
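For reference, the general WebGPU mechanism this would rely on looks roughly like the sketch below. This is not the library's dispatchPipelinesIndirect API itself, and the pre-processing shader is a trivial stand-in that just writes a fixed dispatch size into a buffer flagged for indirect use:

```js
// Minimal sketch of GPU-driven dispatch sizing (assumptions: a trivial
// stand-in pre-processing shader; not the library's actual API).
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

// One [x, y, z] dispatch-size triple, written on the GPU and later consumed
// by dispatchWorkgroupsIndirect, so no CPU readback is needed in between.
const indirectBuffer = device.createBuffer({
    size: 3 * 4, // three u32s
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.INDIRECT,
});

const preprocess = device.createComputePipeline({
    layout: 'auto',
    compute: {
        entryPoint: 'main',
        module: device.createShaderModule({ code: /* wgsl */ `
            @group(0) @binding(0) var<storage, read_write> dispatch_args: array<u32, 3>;

            @compute @workgroup_size(1)
            fn main() {
                // A real pre-processing step would derive these from the data.
                dispatch_args[0] = 64u; // x
                dispatch_args[1] = 1u;  // y
                dispatch_args[2] = 1u;  // z
            }
        ` }),
    },
});

const bindGroup = device.createBindGroup({
    layout: preprocess.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: indirectBuffer } }],
});

const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(preprocess);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(1);
// A subsequent pipeline could then be launched with:
// pass.dispatchWorkgroupsIndirect(indirectBuffer, 0);
pass.end();
device.queue.submit([encoder.finish()]);
```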

@kishimisu
Owner

kishimisu commented Aug 26, 2024

Hey!

There are a few things to take into account, but it's not that hard :)
I've created a new branch that includes an additional use_indirect_dispatch boolean parameter that can be used if check_order is disabled. I can push it to main if you find this parameter useful.

However, I would suggest reading the Order Checking section in the readme. During my testing I've observed that using indirect dispatch for the compute passes resulted in slower performance, which is why I didn't include the option.
I would be curious to see if it's faster for you!
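Hypothetical usage on that branch, just to illustrate the combination of options. Apart from check_order and use_indirect_dispatch, the option names below are guesses and may not match the actual constructor:

```js
// Sketch only: enabling indirect dispatch with order checking disabled.
// Option names other than check_order and use_indirect_dispatch are assumptions.
const kernel = new RadixSortKernel({
    device,                      // GPUDevice
    keys: keysBuffer,            // GPUBuffer of keys to sort
    values: valuesBuffer,        // GPUBuffer of associated values
    count: elementCount,         // number of elements
    check_order: false,          // must stay disabled for indirect dispatch
    use_indirect_dispatch: true, // new parameter introduced on the branch
});
```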

@arcman7
Author

arcman7 commented Aug 27, 2024

> During my testing I've observed that using indirect dispatch for the compute passes resulted in slower performance, which is why I didn't include the option.
> I would be curious to see if it's faster for you!

It is odd... I've noticed that as well just by testing out different settings on your demo page. Right now my only guess is that it has something to do with the number of pipelines being created and the corresponding volume of data uploaded to GPU memory. I did notice, though, that once the number of sorted elements is large enough, the check_order optimization does start to pay off in terms of reducing the total sort time.

Taking a look at your branch now

@arcman7
Author

arcman7 commented Aug 29, 2024

No updates as of yet - still integrating your indirect branch with some specific modifications that I need.

I did have a small question though -

What's the difference between WORKGROUP_COUNT and num_workgroups just below on line 19?

@kishimisu
Owner

Sorry for the late reply! They both represent the number of workgroups (or dispatch size) for the current pass, just in different formats:
num_workgroups is a builtin vec3 containing the number of workgroups in each dimension, while WORKGROUP_COUNT is a constant containing the total number of workgroups (x * y * z).
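A minimal illustration of the relationship (the shader below is a made-up example, not the library's kernel code; only num_workgroups and WORKGROUP_COUNT come from it):

```js
// The JS side knows the dispatch size, so it can bake the flat total into the
// shader source as a constant, while the per-dimension vec3 is a runtime builtin.
const dispatch = { x: 8, y: 4, z: 1 };                        // example dispatch size
const WORKGROUP_COUNT = dispatch.x * dispatch.y * dispatch.z; // flat total

const shaderCode = /* wgsl */ `
    const WORKGROUP_COUNT: u32 = ${WORKGROUP_COUNT}u; // total workgroups (x * y * z)

    @compute @workgroup_size(64)
    fn main(@builtin(num_workgroups) num_workgroups: vec3<u32>) {
        // Flat total recovered from the builtin; equals WORKGROUP_COUNT
        // when the constant was generated from the same dispatch size.
        let total = num_workgroups.x * num_workgroups.y * num_workgroups.z;
    }
`;
```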
