Unoptimized gamma correction shader math in crt-pi #35
Comments
Yeah, probably just done that way for code clarity. It'd be worth looking at the assembly to see how much of a difference it makes. |
I'd be really surprised if any compiler knew to optimize a squaring and
subsequent square root into one operation. The assignments, probably.
How can I compile this to assembly and check the output? Does OpenGL have
an app for that, or do I just do something with GCC? Not used to GL shaders.
Mike
|
That's a good question. I've used fxc.exe for HLSL shaders, but there doesn't seem to be anything as universally easy to use for GLSL, which probably shouldn't surprise me... However, it seems this Radeon GPU Analyzer from AMD may be able to do it: |
It gains about 15-20 fps this way in my test: 668 after versus 650 before, which is a 2-3% difference. |
There are a few unoptimized lines of code in the gamma correction part of crt-pi.glsl, which is linked for reference here:
https://github.com/libretro/glsl-shaders/blob/master/crt/shaders/crt-pi.glsl
Gamma correction has been noted to be a potential source of slowdown in the code, and also in this thread here. However, all of the math here is really unoptimized, which is likely what is causing the slowdown.
Gamma correction is done on lines 190-208. For reference here:
If we assume SCANLINES, GAMMA and FAKE_GAMMA are all defined, the above reduces to the following:
Is there a reason it's being done like this? All of that is equivalent to
This saves one multiplication and three assignments per loop! We avoid the unnecessary squaring and subsequent square rooting of colour, and we also don't need to update scanLineWeight, as it's never used again in this scope.
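A quick numeric check of the identity behind this simplification, sqrt(c * c * w) = c * sqrt(w) for non-negative c and w. This is only a sketch in Python, not the shader itself: the function names and sample values are hypothetical, and the square / weight / square-root order is assumed from the description above.

```python
import math

def fake_gamma_original(colour, weight):
    """Square the colour, apply the scanline weight, then take the square root."""
    linear = colour * colour
    linear *= weight
    return math.sqrt(linear)

def fake_gamma_simplified(colour, weight):
    """Algebraically equivalent form, valid for colour >= 0 and weight >= 0."""
    return colour * math.sqrt(weight)

# Sample colour channels and weights in [0, 1] (hypothetical test values).
samples = [i / 10 for i in range(11)]
max_error = max(
    abs(fake_gamma_original(c, w) - fake_gamma_simplified(c, w))
    for c in samples
    for w in samples
)
print(max_error)
```

The two forms should agree to within floating-point rounding. In the shader the saving would come from taking one square root of a scalar weight rather than squaring and square-rooting the whole colour vector.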
I don't know how much the assignments matter or if they're optimized out anyway, but fighting with the emulator over memory accesses has been noted as one of the major causes of slowdown, so worth bringing up...
There's a similar (but slightly trickier) thing you can do with the true gamma correction, not just FAKE_GAMMA, but I'll start here for now to see if I'm on the right wavelength...
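One plausible reading of the true-gamma variant, offered as an assumption rather than as what was actually intended: if the shader computes pow(pow(c, g_in) * w, 1 / g_out), the exponents can be folded into pow(c, g_in / g_out) * pow(w, 1 / g_out). The names g_in and g_out and the values 2.4 and 2.2 are hypothetical stand-ins for the shader's input and output gamma settings. A minimal Python check:

```python
def true_gamma_original(colour, weight, g_in, g_out):
    """Linearise with g_in, apply the weight, then re-encode with 1 / g_out."""
    linear = colour ** g_in
    linear *= weight
    return linear ** (1.0 / g_out)

def true_gamma_folded(colour, weight, g_in, g_out):
    """Same result with the colour exponents combined; needs colour, weight >= 0."""
    return (colour ** (g_in / g_out)) * (weight ** (1.0 / g_out))

# Hypothetical gamma values and sample points in [0, 1].
G_IN, G_OUT = 2.4, 2.2
points = [i / 10 for i in range(11)]
max_gamma_error = max(
    abs(true_gamma_original(c, w, G_IN, G_OUT) - true_gamma_folded(c, w, G_IN, G_OUT))
    for c in points
    for w in points
)
print(max_gamma_error)
```

The folded form still needs two pow calls, but one of them acts on the scalar weight only, and g_in / g_out is a constant that can be computed once.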