What is the correct way to present a high-resolution gpu-computed image with minimal overhead? #8590

Open
r0levrai opened this issue Oct 5, 2024 · 3 comments

r0levrai commented Oct 5, 2024

Taichi noob here :D

What is the correct way to present a high-resolution gpu-computed image with minimal overhead?

Is there a way to directly write to a render target / framebuffer from a taichi kernel with a constrained format (or even writing to a provided data structure)? Some kind of fast_gui=True "I don't need any conversion/copy/clear" option for ggui?

Background

Let's say I want to show a dead simple kernel like this one from the taichi python examples:

@ti.kernel
def render(t: float):
    for i, j in img:
        a = ti.Vector([i / res[0], j / res[1] + 2, i / res[0] + 4])
        img[i, j] = ti.cos(a + t) * 0.5 + 0.5

...at native resolution and framerate (2560x1440@165fps), which should be a walk in the park for the laptop 3070 I'm testing this on. However,

  • Using the old gui in zero-copying frame buffer mode (fast_gui=True), the presentation takes 36ms with 3 different bottlenecks (line profile detail: cuda, vulkan). I'm not sure, but it looks like a CPU-side conversion, copy, and clear.

  • Using ggui, the presentation still seems to take about 12ms (twice my entire frame budget) in canvas.set_image(img) and window.show() (line profile detail: cuda, vulkan). I'm not sure why this takes so much time, since I think we're in GPU-land here.

  • Using an external more game-oriented presentation layer might be another option?
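
For reference, the arithmetic behind "twice my entire frame budget" can be sketched as below. This is a rough back-of-the-envelope calculation, not a measurement: it assumes the image is a 3-channel float32 field (12 bytes per pixel), which may not match the actual layout in the repro, and the helper names are made up for illustration.

```python
# Back-of-the-envelope frame budget for 2560x1440 @ 165 fps.
# Assumption: the image is a 3-channel float32 field (12 bytes per pixel);
# the real layout in the repro may differ.

WIDTH, HEIGHT = 2560, 1440
REFRESH_HZ = 165
BYTES_PER_PIXEL = 3 * 4  # assumed: RGB, float32


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / refresh_hz


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one full frame, in bytes."""
    return width * height * bytes_per_pixel


def copy_bandwidth_gb_s(nbytes: int, refresh_hz: float) -> float:
    """Throughput needed to move one full frame per refresh (GB/s, 1 GB = 1e9 B)."""
    return nbytes * refresh_hz / 1e9


budget = frame_budget_ms(REFRESH_HZ)
size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)

print(f"frame budget: {budget:.2f} ms")
print(f"frame size: {size / 1e6:.1f} MB")
print(f"one full copy per frame: {copy_bandwidth_gb_s(size, REFRESH_HZ):.1f} GB/s")

# Presentation times reported in the bullets above, relative to the budget.
for name, measured_ms in [("gui (fast_gui=True)", 36.0), ("ggui", 12.0)]:
    print(f"{name}: {measured_ms / budget:.1f}x the frame budget")
```

At 165 Hz the budget is about 6.06 ms per frame, so 12 ms is roughly twice the budget and 36 ms roughly six times. Under the assumed layout, every extra full-frame conversion or copy moves about 44 MB, which is why avoiding copies in the presentation path matters at this resolution.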

repro: test_taichi.py

related: #8466 #5438 #4922

Note that this is easy to run into when first learning or evaluating taichi from a gamedev / game-engine-adjacent background: the first chapters of the manual talk a lot about performance, so the first thing that comes to mind is upping the minuscule hello-world resolutions to see how the shaders scale. Running into a presentation API bottleneck at that point can be pretty confusing. I would be glad to propose a PR for the documentation if I get this figured out.

@oliver-batchelor
Contributor

I have also been searching for a way to do this, so I did some digging just now. I think we can use an OpenGL library e.g. moderngl to create a VBO then use something like this to map the VBO to a cuda array (and then wrap the cuda array in pytorch so it can be used with taichi directly).

https://gist.github.com/stgatilov/0bb58bf5296c3dfabd2eecd8dbf42237

@r0levrai
Author

r0levrai commented Nov 3, 2024

Wow!
I didn't dig around this much, but hoped taichi would have an API for mapping to a VBO with less code and more backend coverage.
But thank you very much, that looks like a promising start!

@oliver-batchelor
Contributor

Seems like Taichi ImGui uses Vulkan, so I would guess there's some way to build such a thing in - i.e. draw a cuda image directly.

To be honest I'm more keen on figuring out how to do it generically for OpenGL as I'm using Qt for our application(s) and I could draw on the Qt OpenGL widget.

I'm keen to hear how you get on - if I have a chance, I'll play with this at some point, but it's quite low on the priority list!
