nv2a: Adjust NaN handling to be similar to HW #913
Hey @NZJenkins, I wasn't sure if you were already looking at a fix, but I started to suspect this might be partly to blame for #801, so I threw together a partial fix to unblock my debugging there. This is mostly a proof of concept at the moment, as it doesn't fully match the HW behavior and because some investigation will need to be done on the performance impact (and possibly some optimization).
hw/xbox/nv2a/shaders.c
Outdated
" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" vec4 dest;\n"
This stuff can be vectorized, assuming the logic will stay the same:
" return mix(src, mix(vec4(1.0), vec4(0.0), lessThan(signs, ivec4(0))), nans);\n"
There is a glitch in NGB with this PR.
@ko81e24wy where is the nearest save point so I can reproduce that?
I think you just need to start a new game.
No not looking at a fix. For texcoord and fog outputs, additional processing happens on the VS output values before they are used in the PS, which may interpret NaN, INF, etc. differently to the mapping that happens when they are passed directly from the VS to the PS.
The Min test is using "subnormal" float values; it might be interesting to try it with the minimum "normal" values instead, and see if they behave differently to 0.
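For anyone wanting to try that, here is a rough CPU-side sketch in C of the distinction between the smallest "normal" float and a "subnormal" one. It is only an illustration of the two value classes, not part of the tester; the helper names (`smallest_normal`, `a_subnormal`, `is_subnormal`) are hypothetical.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helpers for picking test inputs on either side of the
 * normal/subnormal boundary. */

/* FLT_MIN is the smallest positive *normal* float (2^-126). */
static float smallest_normal(void)
{
    return FLT_MIN;
}

/* Halving FLT_MIN is exact and lands in the subnormal range (2^-127). */
static float a_subnormal(void)
{
    return FLT_MIN / 2.0f;
}

/* fpclassify() reports which class a value falls in. */
static int is_subnormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

Feeding `smallest_normal()` instead of `a_subnormal()` into the Min test would show whether the HW treats only subnormals as 0 or small normals as well.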
415a071
to
b210268
Compare
There was a stupid mistake in my code that is now fixed, can you test again when you have a chance?
I am just going to work and test with a combined build that has this PR merged.
@ko81e24wy cool, thanks for double checking! I have not tried to debug 783 yet, so if this PR happens to fix the missing lighting that'd be a nice bonus. I'm not exactly sure what that screenshot should look like compared to HW though; are you saying it looks more correct with this PR than before?
That white issue reminds me of that light effect; the last PR build looks just as before, with that effect missing.
@mysteria25 as far as I can see from the compatibility page there is no bug tracking this problem, please file one and it'll get looked into eventually (PRs are definitely not the place to report untracked issues). Grabbed by the Ghoulies has a black texture problem related to texture compression; there are many ways for things to fail so it's entirely possible what you're seeing with that game is something novel that will need a specific fix.
b210268
to
24c5c9b
Compare
This doesn't entirely match HW behavior on my GTX 1070 but it is much closer and fixes #365 despite the remaining departure around infinite values and zeros.
hw/xbox/nv2a/shaders.c
Outdated
" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" return mix(src, mix(vec4(1.0), vec4(0.0), lessThan(signs, ivec4(0))), nans);\n"
- The sign bit vectorization is neat, nice :)
- So "negative NaN" (msb set) gets 0 and "positive NaN" gets 1.0?
- Seems odd because you mentioned 0*NaN should == 0? So under what conditions do you expect it to be 1? (nvm, I see your comment about it) I don't think it makes sense to generalize this based on sign bit alone here. More about this thought below.
- It looks like mix(genBType) was added in GL4.5 so this might fail in a 4.0 context. Does it work on your M1 Mac? You could cast the boolean to a float.
As far as I can see in the general case, I think this NaN handling is not more correct than just using 0.0 in the event we do get a NaN (which would force all platforms to behave the same at least, but might be unnecessary if we don't have issues cataloged for it). Is there another case in vsh output that this solves for, beyond fog?
To fix this fog issue for Otogi (and now at least 1 other game (Triggerman) that is affected in the same way that I've seen), I do think it's a fair compromise to have specialized fog NaN handling (like we did with Inf previously); and if this is only intended to fix the fog issue, I think we should just stick with the specialized handling for that case.
The fishing game is a different, interesting case that appears to be setting light colors to NaN, so for this second specialized case I think it makes sense to interpret NaN as 1.0 before performing lighting calculations, so lighting is calculated correctly. Perhaps it makes sense to generalize NaNs in the uniform spaces to 1.0 for such cases but I'd have to think more about it. Needs more investigation to understand how hardware is handling that case.
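For reference, the per-component mapping being discussed (a NaN with the sign bit set becomes 0.0, any other NaN becomes 1.0, everything else passes through) can be sketched on the CPU in plain C. This is only a model of the logic for reasoning about test cases, not the shader code in this PR; `fix_nan_component` and `fix_nan_vec4` are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side model of the sign-based NaN fixup. */
static float fix_nan_component(float x)
{
    uint32_t bits;

    if (!isnan(x)) {
        return x; /* non-NaN values (including +/- inf) pass through */
    }
    /* Reinterpret the float's bits, like floatBitsToInt() in GLSL. */
    memcpy(&bits, &x, sizeof bits);
    /* Sign bit set ("negative NaN") -> 0.0, otherwise -> 1.0. */
    return (bits & 0x80000000u) ? 0.0f : 1.0f;
}

/* Apply the fixup to all four components, like the vec4 version. */
static void fix_nan_vec4(float v[4])
{
    assert(v != NULL);
    for (int i = 0; i < 4; ++i) {
        v[i] = fix_nan_component(v[i]);
    }
}
```

Note that this model leaves infinities untouched, which matches the remaining departure around infinite values mentioned earlier in the thread.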
Partial response:
> As far as I can see in the general case, I think this NaN handling is not more correct than just using 0.0 in the event we do get a NaN (which would force all platforms to behave the same at least, but might be unnecessary if we don't have issues cataloged for it). Is there another case in vsh output that this solves for, beyond fog?
The tester itself is using the diffuse channel and the xemu output does not match HW, if that's what you're asking. I don't know of any game where this makes a difference offhand, but that doesn't mean there isn't one with graphics glitching attributable to this handling.
An example difference with all three cases in one place:
> To fix this fog issue for Otogi (and now at least 1 other game (Triggerman) that is affected in the same way that I've seen), I do think it's a fair compromise to have specialized fog NaN handling (like we did with Inf previously); and if this is only intended to fix the fog issue, I think we should just stick with the specialized handling for that case.
Otogi's fog is the only case I know of (I assume someone has already looked into Triggerman and determined that just applying the fix to the fog fixes that game? I don't own that one to test). I guess the issue is balancing the performance impact of doing this check against everything in an attempt to be closer to HW behavior versus the time a contributor might end up spending tracing some glitch back to a FIXME in the shader code.
> The fishing game is a different, interesting case that appears to be setting light colors to NaN, so for this second specialized case I think it makes sense to interpret NaN as 1.0 before performing lighting calculations, so lighting is calculated correctly. Perhaps it makes sense to generalize NaNs in the uniform spaces to 1.0 for such cases but I'd have to think more about it. Needs more investigation to understand how hardware is handling that case.
Hopefully @Triticum0 wouldn't mind splitting Lakemasters into a new issue given that it sounds like it's a similar symptom due to a different bug?
Verified that the shader works on macOS 12.4 (M1)
@abaire Lakemaster is fixed with this PR
#1008 is due to a similar cause: INF * 0 should == 0 but is instead NaN (at least on my machine, and presumably on other NVIDIA systems, since the original report was on a 3060).
In this case the incorrect number causes a chain of errors and catching it at the very end where the outputs are assigned would also lead to incorrect behavior; in the test cases I examined for #1008 the output diffuse should be set to values between 0 and 1 so blindly detecting the NaN and setting to either would do the wrong thing.
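The IEEE 754 side of this is easy to reproduce on the CPU. The sketch below contrasts a plain multiply with a hypothetical `hw_like_mul` helper that returns 0 whenever either operand is exactly zero, which is the HW behavior described above. It is a simplified model under that assumption (it ignores the "very close to 0" case) and is not xemu code.

```c
#include <math.h>

/* Plain IEEE 754 multiply: inf * 0 and NaN * 0 both produce NaN. */
static float ieee_mul(float a, float b)
{
    return a * b;
}

/* Hypothetical model of the HW behavior described in this thread:
 * anything multiplied by zero yields zero, even inf or NaN. */
static float hw_like_mul(float a, float b)
{
    if (a == 0.0f || b == 0.0f) {
        return 0.0f;
    }
    return a * b;
}
```

Because the NaN appears in the middle of the computation, a fix has to live at the multiply (as in `hw_like_mul`) rather than at the point where the outputs are assigned.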
24c5c9b
to
dae6f37
Compare
Looks like limiting the NaN fix to fog will fix that regression and align with Matt's suggestion, will update and re-push in a bit.
The highlight problem seems to be fixed with #1045.
dae6f37
to
e272865
Compare
I tested the latest build of this PR, and it crashes when I run xemu.
e272865
to
9853c97
Compare
Bad merge, should be fixed now. Thanks for letting me know!
@abaire could you rebase this PR onto the work you did in "Match inv_w qualifier to attribute qualifier"? Thanks.
9853c97
to
e0de81e
Compare
Can we get another pull request or rebase or however that works so there is a working download available for this branch for testing? The one from July is expired and no longer available.
e0de81e
to
8b62926
Compare
Any update on when this will be merged?
When it's ready.
" bvec4 nans = isnan(src);\n"
" if (!any(nans)) {\n"
" return src;\n"
" }\n"
" ivec4 signs = floatBitsToInt(src);\n"
" vec4 negative = vec4(lessThan(signs, ivec4(0)));"
" return mix(src, mix(vec4(1.0), vec4(0.0), negative), vec4(nans));\n"
Suggested change:
- " bvec4 nans = isnan(src);\n"
- " if (!any(nans)) {\n"
- " return src;\n"
- " }\n"
- " ivec4 signs = floatBitsToInt(src);\n"
- " vec4 negative = vec4(lessThan(signs, ivec4(0)));"
- " return mix(src, mix(vec4(1.0), vec4(0.0), negative), vec4(nans));\n"
+ " if (isnan(src.x)) {\n"
+ " src.x = step(0.0, sign(src.x));\n"
+ " }\n"
+ " if (isnan(src.y)) {\n"
+ " src.y = step(0.0, sign(src.y));\n"
+ " }\n"
+ " if (isnan(src.z)) {\n"
+ " src.z = step(0.0, sign(src.z));\n"
+ " }\n"
+ " if (isnan(src.w)) {\n"
+ " src.w = step(0.0, sign(src.w));\n"
+ " }\n"
+ " return src;\n"
Hi @abaire, may I suggest changing the code to what I wrote? I tried your current solution but it didn't fix anything on my PC; it behaves exactly like master. From my tests, my change behaves almost exactly like real hardware. Almost, because there's a slight difference in the -NaNQ_NaNQ and -NaNS_NaNS test cases in the N/A column, but I'm pretty sure that it's safe to ignore.
I tested a lot of games and couldn't find any regressions in any of them. Games that are fixed by this change:
- Otogi
- Otogi 2
- Pro Cast: Sports Fishing Game
- Trigger Man
HW - PC (AMD RX 6600M)
You're very welcome to take this one over, I've got some other projects going at the moment and don't have much time for xemu. It'd be good to test on an nvidia card as well, I'm not sure if glsl guarantees NaN behavior so it may be manufacturer or device specific.
I have a gtx 3060 and can test nvidia, just need the build and the test program...
That's too bad to hear. I will need to open a new PR then, but I will link it to this one. Once it is merged we can close both. Thanks for your research and I hope that you are back to xemu in the future.
@medievil1 you can download the build once I open the new PR.
Thank you for the PR! Since this work is continuing in #1780 with original author sign off, I'll close this PR now.
Fixes #365 by forcing treatment of NaN values generated in the vertex shader to follow the same pattern as HW.
Unfortunately this does not address the fact that on HW, +/- NaN * 0 (or very close to 0) should be 0. This difference in behavior is also responsible for #1008.
This also does not address the same issue for +/- inf, and the shader version currently used by xemu explicitly notes that operations on inf/NaNs are undefined (which may explain why Otogi textures are not blacked out on M1 Mac).
Tests
HW results