America's Army: Rise of a Soldier: Floor texture Broken #1008

Closed
Triticum0 opened this issue Jun 1, 2022 · 7 comments · Fixed by #1045
Labels
bug Something isn't working

Comments

@Triticum0

Title

https://xemu.app/titles/55530043/#America-s-Army-Rise-of-a-Soldier

Bug Description

When going in-game, the floor textures are black.
[Screenshot: xemu-2022-06-01-19-23-53]

Expected Behavior

It should be green

xemu Version

Version: 0.7.25
Branch: master
Commit: 7d6da22
Date: Tue May 31 08:20:36 PM UTC 2022

System Information

| Field | Value |
| --- | --- |
| OS | Windows 10 |
| CPU | AMD Ryzen 5 2600 Six-Core Processo |
| Graphics Device | NVIDIA GeForce RTX 3060 Ti/PCIe/SSE2 |
| Graphics Driver | 4.0.0 NVIDIA 512.95 |

Additional Context

No response

@Triticum0 Triticum0 added the bug Something isn't working label Jun 1, 2022
@abaire
Contributor

abaire commented Jun 4, 2022

There are four textures involved: three are DXT1 (NV097_SET_TEXTURE_FORMAT_COLOR_L_DXT1_A1R5G5B5) and one is NV097_SET_TEXTURE_FORMAT_COLOR_SZ_A8R8G8B8.

Relevant combiner:

// Stage 0
ab.rgb = clamp(vec3(dot(t1.rgb, c0_0.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 1
ab.rgb = clamp(vec3((t0.rgb * r1.rgb)), -1.0, 1.0);
ab.a = clamp(((t0.a * r1.a)), -1.0, 1.0);
r0.rgb = ab.rgb;
r0.a = ab.a;
// Stage 2
ab.rgb = clamp(vec3(dot(t1.rgb, c0_2.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 3
mux_sum.rgb = clamp(vec3(((t2.rgb * r1.rgb) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * r0.rgb))), -1.0, 1.0);
mux_sum.a = clamp((((t2.a * r1.a) + ((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * r0.a))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = mux_sum.a;
// Stage 4
ab.rgb = clamp(vec3(dot(t1.rgb, c0_4.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 5
mux_sum.rgb = clamp(vec3(((t3.rgb * r1.rgb) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * r0.rgb))), -1.0, 1.0);
mux_sum.a = clamp((((t3.a * r1.a) + ((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * r0.a))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = mux_sum.a;
// Stage 6
ab.rgb = clamp(vec3(((r0.rgb * v0.rgb) * 2.0)), -1.0, 1.0);
ab.a = clamp((((r0.a * v0.a) * 2.0)), -1.0, 1.0);
r0.rgb = ab.rgb;
r0.a = ab.a;
// Final Combiner
fragColor.rgb = max(vec4(0.0).rgb, 0.0) + mix(vec3(max(pFog.rgb, 0.0)), vec3(max(r0.rgb, 0.0)), vec3(max(pFog.aaa, 0.0)));
fragColor.a = max(r0.a, 0.0);

Constants:

c0_0 1.00, 0.00, 0.00, 0.00 
c0_2 0.00, 1.00, 0.00, 0.00
c0_4 0.00, 0.00, 1.00, 0.00

As a quick verification, I set fragColor to each of the textures in turn, and each one produced the output I'd expect, so something is going wrong in the combiner. The shader does use fog, but I verified that r0.rgb is already wrong (and fog is entirely disabled in the frame I captured).

@abaire
Contributor

abaire commented Jun 4, 2022

Looks like stage 6 of the combiner is what blacks it out: ab.rgb = clamp(vec3(((r0.rgb * v0.rgb) * 2.0)), -1.0, 1.0); ends up being black as v0 is all black from the vertex shader.
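To make the stage 6 observation concrete, here is a rough Python sketch of the RGB path of the combiner above. It is not xemu code: the function names and the texture sample values are made up for illustration, and alpha plus the final fog combiner are left out. It only shows that whatever the textures contribute, a black v0 forces the stage 6 product to zero.

```python
# Sketch of the RGB path of the combiner listed above (stages 0-6).
# All helper names and sample values are illustrative assumptions.

def clamp(v, lo=-1.0, hi=1.0):
    return tuple(max(lo, min(hi, x)) for x in v)

def dot3(a, b):
    return sum(x * y for x, y in zip(a, b))

def mul(a, b):
    return tuple(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def combine_rgb(t0, t1, t2, t3, v0):
    c0_0, c0_2, c0_4 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    # Stages 0/2/4: dot(t1, c) selects one channel of t1 and broadcasts it.
    w0 = clamp((dot3(t1, c0_0),) * 3)
    w2 = clamp((dot3(t1, c0_2),) * 3)
    w4 = clamp((dot3(t1, c0_4),) * 3)
    r0 = clamp(mul(t0, w0))            # stage 1
    r0 = clamp(add(mul(t2, w2), r0))   # stage 3
    r0 = clamp(add(mul(t3, w4), r0))   # stage 5
    # Stage 6: modulate by the vertex colour v0 and double.
    return clamp(tuple(x * 2.0 for x in mul(r0, v0)))

# Made-up texture samples; t1 weights t0 fully and t2/t3 not at all.
t0, t1 = (0.2, 0.6, 0.2), (1.0, 0.0, 0.0)
t2, t3 = (0.5, 0.4, 0.3), (0.3, 0.3, 0.3)

black = combine_rgb(t0, t1, t2, t3, (0.0, 0.0, 0.0))
lit = combine_rgb(t0, t1, t2, t3, (0.1647059, 0.1647059, 0.1686275))
```

With these inputs `black` is all zeros, while `lit` (using the v0 value captured later in this thread) keeps the greenish tint of t0, which matches the idea that the textures and combiner are fine and the vertex colour is the culprit.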

Vertex shader is quite complex:

  /* Slot 0: 0x00000000 0x004F221B 0x18373800 0x2F600000 */
  MUL(R6,xyzw, v1, c[121]);

  /* Slot 1: 0x00000000 0x004F061B 0x38371800 0x2F400000 */
  MUL(R4,xyzw, v3, c[120]);

  /* Slot 2: 0x00000000 0x02F0001B 0x64361BFC 0x2850181C */
  DP4(_temp_vec,x, R6, c[128]);
  MOV(oD0,w, v0.w);
  R5.x = _temp_vec.x;

  /* Slot 3: 0x00000000 0x00F0201B 0x64363800 0x24500000 */
  DP4(R5,y, R6, c[129]);

  /* Slot 4: 0x00000000 0x00F0401B 0x64365800 0x22500000 */
  DP4(R5,z, R6, c[130]);

  /* Slot 5: 0x00000000 0x00F0601B 0x64367800 0x21500000 */
  DP4(R5,w, R6, c[131]);

  /* Slot 6: 0x00000000 0x0071001A 0x6400146A 0x3EB00000 */
  ADD(R11,xyz, R6.xyz, -c[136].xyz);

  /* Slot 7: 0x00000000 0x00712058 0x64001562 0x77A00000 */
  ADD(R10,yzw, R6.yyzx, -c[137].yyzx);

  /* Slot 8: 0x00000000 0x0040001A 0xB4356800 0x2E200000 */
  MUL(R2,xyz, R11.xyz, R11.xyz);

  /* Slot 9: 0x00000000 0x00B0E0AA 0x7C344800 0x28200000 */
  DP3(R2,x, c[135].z, R2.xyz);

  /* Slot 10: 0x00000000 0x0040005B 0xA4B74800 0x27200000 */
  MUL(R2,yzw, R10.yyzw, R10.yyzw);

  /* Slot 11: 0x00000000 0x0071401A 0x6400146A 0xBEB00000 */
  ADD(R11,xyz, R6.xyz, -c[138].xyz);

  /* Slot 12: 0x00000000 0x08EC001B 0x64361800 0x90A88800 */
  DP4(oPos,x, R6, c[96]);
  RSQ(R1,x, R2.x);

  /* Slot 13: 0x00000000 0x00B0E0AA 0x7DB44800 0x24200000 */
  DP3(R2,y, c[135].z, R2.wyz);

  /* Slot 14: 0x00000000 0x00400058 0xB4B16800 0x27B00000 */
  MUL(R11,yzw, R11.yyzx, R11.yyzx);

  /* Slot 15: 0x00000000 0x0071601A 0x6400146A 0xFEA00000 */
  ADD(R10,xyz, R6.xyz, -c[139].xyz);

  /* Slot 16: 0x00000000 0x01000000 0x24002800 0x2E900000 */
  DST(R9,xyz, R2.x, R1.x);

  /* Slot 17: 0x00000000 0x08B0E0AA 0x7DB56954 0x98280000 */
  DP3(_temp_vec,x, c[135].z, R11.wyz);
  RSQ(R1,x, R2.y);
  R2.x = _temp_vec.x;

  /* Slot 18: 0x00000000 0x00400058 0xA4B14800 0x27B00000 */
  MUL(R11,yzw, R10.yyzx, R10.yyzx);

  /* Slot 19: 0x00000000 0x01000055 0x24002800 0x2EA00000 */
  DST(R10,xyz, R2.y, R1.x);

  /* Slot 20: 0x00000000 0x08B0E0AA 0x7DB56800 0x94280000 */
  DP3(_temp_vec,y, c[135].z, R11.wyz);
  RSQ(R1,x, R2.x);
  R2.y = _temp_vec.y;

  /* Slot 21: 0x00000000 0x00B2001A 0x94341800 0x28B00000 */
  DP3(R11,x, R9.xyz, c[144].xyz);

  /* Slot 22: 0x00000000 0x01000000 0x24002800 0x2E800000 */
  DST(R8,xyz, R2.x, R1.x);

  /* Slot 23: 0x00000000 0x08B2201A 0xA4343954 0x94B80000 */
  DP3(_temp_vec,y, R10.xyz, c[145].xyz);
  RSQ(R1,x, R2.y);
  R11.y = _temp_vec.y;

  /* Slot 24: 0x00000000 0x05720055 0x95FE1802 0xD1040000 */
  SLT(_temp_vec,w, R9.y, c[144].w);
  RCP(R1,y, R11.x);
  R0.w = _temp_vec.w;

  /* Slot 25: 0x00000000 0x05000055 0x24002956 0xDEB20000 */
  DST(_temp_vec,xyz, R2.y, R1.x);
  RCP(R1,z, R11.y);
  R11.xyz = _temp_vec.xyz;

  /* Slot 26: 0x00000000 0x00518055 0x14359800 0x2E700000 */
  MUL(R7,xyz, R1.y, c[140].xyz);

  /* Slot 27: 0x00000000 0x00B2401A 0x84345800 0x21B00000 */
  DP3(R11,w, R8.xyz, c[146].xyz);

  /* Slot 28: 0x00000000 0x0040001A 0x75FE0800 0x2E700000 */
  MUL(R7,xyz, R7.xyz, R0.w);

  /* Slot 29: 0x00000000 0x0451A0AA 0x14B1BBFE 0xDBA80000 */
  MUL(_temp_vec,xzw, R1.z, c[141].yyzx);
  RCP(R1,x, R11.w);
  R10.xzw = _temp_vec.xzw;

  /* Slot 30: 0x00000000 0x01722055 0xA5FE3800 0x21000000 */
  SLT(R0,w, R10.y, c[145].w);

  /* Slot 31: 0x00000000 0x00B2601A 0xB4347800 0x28B00000 */
  DP3(R11,x, R11.xyz, c[147].xyz);

  /* Slot 32: 0x00000000 0x00600009 0x08001025 0xDB800000 */
  ADD(R8,xzw, v0.xxzy, R7.xxzy);

  /* Slot 33: 0x00000000 0x044000CA 0xA5FE0802 0xDE740000 */
  MUL(_temp_vec,xyz, R10.wxz, R0.w);
  RCP(R1,y, R11.x);
  R7.xyz = _temp_vec.xyz;

  /* Slot 34: 0x00000000 0x0051C000 0x14B1D800 0x2BB00000 */
  MUL(R11,xzw, R1.x, c[142].yyzx);

  /* Slot 35: 0x00000000 0x01724055 0x85FE5800 0x21000000 */
  SLT(R0,w, R8.y, c[146].w);

  /* Slot 36: 0x00000000 0x0060003A 0x84001069 0xDE800000 */
  ADD(R8,xyz, R8.xwz, R7.xyz);

  /* Slot 37: 0x00000000 0x004000CA 0xB5FE0800 0x2E700000 */
  MUL(R7,xyz, R11.wxz, R0.w);

  /* Slot 38: 0x00000000 0x0051E055 0x14B1F800 0x2BB00000 */
  MUL(R11,xzw, R1.y, c[143].yyzx);

  /* Slot 39: 0x00000000 0x01726055 0xB5FE7800 0x21000000 */
  SLT(R0,w, R11.y, c[147].w);

  /* Slot 40: 0x00000000 0x00EC201B 0x64363800 0x20B04800 */
  DP4(oPos,y, R6, c[97]);

  /* Slot 41: 0x00000000 0x00EC401B 0x64365800 0x28002800 */
  DP4(oPos,z, R6, c[98]);
  DP4(R0,x, R6, c[98]);

  /* Slot 42: 0x00000000 0x00EC601B 0x64367800 0x20A01800 */
  DP4(oPos,w, R6, c[99]);

  /* Slot 43: 0x00000000 0x0060001A 0x84001069 0xDE800000 */
  ADD(R8,xyz, R8.xyz, R7.xyz);

  /* Slot 44: 0x00000000 0x064000CA 0xB5FE0BFF 0x1E780000 */
  MUL(_temp_vec,xyz, R11.wxz, R0.w);
  RCC(R1,x, R12.w);
  R7.xyz = _temp_vec.xyz;

  /* Slot 45: 0x00000000 0x02ED001B 0x44371800 0x1820F82C */
  DP4(_temp_vec,x, R4, c[104]);
  MOV(oFog,xyzw, R0.x);
  R2.x = _temp_vec.x;

  /* Slot 46: 0x00000000 0x00ED201B 0x44373800 0x24200000 */
  DP4(R2,y, R4, c[105]);

  /* Slot 47: 0x00000000 0x00ED401B 0x44375800 0x22200000 */
  DP4(R2,z, R4, c[106]);

  /* Slot 48: 0x00000000 0x00EC801B 0x54369800 0x21B00000 */
  DP4(R11,w, R5, c[100]);

  /* Slot 49: 0x00000000 0x02ECA01B 0x5436B86C 0x94B0E854 */
  DP4(_temp_vec,y, R5, c[101]);
  MOV(oT1,xyz, R2);
  R11.y = _temp_vec.y;

  /* Slot 50: 0x00000000 0x00ECC01B 0x5436D800 0x22B00000 */
  DP4(R11,z, R5, c[102]);

  /* Slot 51: 0x00000000 0x00ED801B 0x54379800 0x28200000 */
  DP4(R2,x, R5, c[108]);

  /* Slot 52: 0x00000000 0x02EDA01B 0x5437BB62 0xD420E84C */
  DP4(_temp_vec,y, R5, c[109]);
  MOV(oT0,xyz, R11.wyzx);
  R2.y = _temp_vec.y;

  /* Slot 53: 0x00000000 0x00EDC01B 0x5437D800 0x22200000 */
  DP4(R2,z, R5, c[110]);

  /* Slot 54: 0x00000000 0x00EE001B 0x54361800 0x21B00000 */
  DP4(R11,w, R5, c[112]);

  /* Slot 55: 0x00000000 0x02EE201B 0x5436386C 0x94B0E85C */
  DP4(_temp_vec,y, R5, c[113]);
  MOV(oT2,xyz, R2);
  R11.y = _temp_vec.y;

  /* Slot 56: 0x00000000 0x00EE401B 0x54365800 0x22B00000 */
  DP4(R11,z, R5, c[114]);

  /* Slot 57: 0x00000000 0x0047401A 0xC4355800 0x20A0E800 */
  MUL(oPos,xyz, R12.xyz, c[58].xyz);

  /* Slot 58: 0x00000000 0x0060001A 0x84001069 0xD0A0E818 */
  ADD(oD0,xyz, R8.xyz, R7.xyz);

  /* Slot 59: 0x00000000 0x00ED601B 0x44377800 0x20B01850 */
  DP4(oT1,w, R4, c[107]);

  /* Slot 60: 0x00000000 0x00ECE01B 0x5436F800 0x20B01848 */
  DP4(oT0,w, R5, c[103]);

  /* Slot 61: 0x00000000 0x00EDE01B 0x5437F800 0x20A01858 */
  DP4(oT2,w, R5, c[111]);

  /* Slot 62: 0x00000000 0x00EE601B 0x54367800 0x20A01860 */
  DP4(oT3,w, R5, c[115]);

  /* Slot 63: 0x00000000 0x02000000 0x08001362 0xD0A0E864 */
  MOV(oT3,xyz, R11.wyzx);

  /* Slot 64: 0x00000000 0x0087601A 0xC400286A 0xF0B0E801 */
  MAD(oPos,xyz, R12.xyz, R1.x, c[59].xyz);

oD0.a comes from v0.w
oD0.rgb comes from a chain of operations

@abaire
Contributor

abaire commented Jun 8, 2022

I wrote some tooling to make it easier to process these large shaders.

Here's the (huge) oD0.rgb chain:

; Inputs:
; c[121] = (2.0, 2.0, 2.0, 1.0)
; c[135] = (0.0, 0.5, 1.0, 3.0)
; c[136] = (0.9983897, -0.0101479, -0.0558118, 0.0)
; c[137] = (-0.0567268, -0.178603, -0.9822846, 0.0)
; c[138] = (0.0, 0.9838689, -0.1788911, 0.0)
; c[139] = (31.5401363, -20.6333656, 67.6486282, 1.0)
; c[140] = (0.0, 0.0, 0.0, 0.0)
; c[141] = (0.0, 0.0, 0.0, 0.0)
; c[142] = (0.0, 0.0, 0.0, 0.0)
; c[143] = (0.0, 0.0, 0.0, 0.0)
; c[144] = (0.0, 0.0, 0.0, 0.0)
; c[145] = (1.0, 0.0, 0.0, 0.0)
; c[146] = (0.0, 1.0, 0.0, 0.0)
; c[147] = (0.0, 0.0, 1.0, 0.0)
; v0 = (0.1647059, 0.1647059, 0.1686275, 1.0)
; v1 = (-728.0, -4058.0, 0.0, 0.0)

MUL R6.xyzw, v1, c[121]
ADD R11.xyz, R6.xyz, -c[136].xyz
ADD R10.yzw, R6.yzx, -c[137].yzx
MUL R2.xyz, R11.xyz, R11.xyz
DP3 R2.x, c[135].z, R2.xyz
MUL R2.yzw, R10.yzw, R10.yzw
ADD R11.xyz, R6.xyz, -c[138].xyz
DP4 oPos.x, R6, c[96] + RSQ R1.x, R2.x
DP3 R2.y, c[135].z, R2.wyz
MUL R11.yzw, R11.yzx, R11.yzx
ADD R10.xyz, R6.xyz, -c[139].xyz
DST R9.xyz, R2.x, R1.x
DP3 R2.x, c[135].z, R11.wyz + RSQ R1.x, R2.y
MUL R11.yzw, R10.yzx, R10.yzx
DST R10.xyz, R2.y, R1.x
DP3 R2.y, c[135].z, R11.wyz + RSQ R1.x, R2.x
DP3 R11.x, R9.xyz, c[144].xyz
DST R8.xyz, R2.x, R1.x
DP3 R11.y, R10.xyz, c[145].xyz + RSQ R1.x, R2.y
SLT R0.w, R9.y, c[144].w + RCP R1.y, R11.x
DST R11.xyz, R2.y, R1.x + RCP R1.z, R11.y
MUL R7.xyz, R1.y, c[140].xyz
DP3 R11.w, R8.xyz, c[146].xyz
MUL R7.xyz, R7.xyz, R0.w
MUL R10.xzw, R1.z, c[141].yzx + RCP R1.x, R11.w
SLT R0.w, R10.y, c[145].w
DP3 R11.x, R11.xyz, c[147].xyz
ADD R8.xzw, v0.xzy, R7.xzy
MUL R7.xyz, R10.wxz, R0.w + RCP R1.y, R11.x
MUL R11.xzw, R1.x, c[142].yzx
SLT R0.w, R8.y, c[146].w
ADD R8.xyz, R8.xwz, R7.xyz
MUL R7.xyz, R11.wxz, R0.w
MUL R11.xzw, R1.y, c[143].yzx
SLT R0.w, R11.y, c[147].w
ADD R8.xyz, R8.xyz, R7.xyz
MUL R7.xyz, R11.wxz, R0.w + RCC R1.x, R12.w
ADD oD0.xyz, R8.xyz, R7.xyz

oD0.a is just v0.w
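For anyone following along, the first block of that chain can be traced by hand. The Python below is a sketch under my own assumptions, not output from the tooling: it assumes DST with scalar-replicated inputs yields (1, src0*src1, src0), that `c[135].z` is replicated to (1, 1, 1) for the DP3, and that RCP of 0 gives +INF. The (1, d, d^2) vector dotted with c[144] looks like a distance-attenuation pattern, though that is a guess.

```python
import math

def dot3(a, b):
    return sum(x * y for x, y in zip(a, b))

def rcp(x):
    # Assumption: reciprocal of 0 yields +INF rather than raising an error.
    return math.inf if x == 0.0 else 1.0 / x

# Values copied from the "Inputs" list above.
c121 = (2.0, 2.0, 2.0, 1.0)
c136 = (0.9983897, -0.0101479, -0.0558118)
c140 = (0.0, 0.0, 0.0)
c144 = (0.0, 0.0, 0.0, 0.0)
v1 = (-728.0, -4058.0, 0.0, 0.0)

r6 = tuple(a * b for a, b in zip(v1, c121))          # MUL R6.xyzw, v1, c[121]
r11 = tuple(a - b for a, b in zip(r6[:3], c136))     # ADD R11.xyz, R6.xyz, -c[136].xyz
d2 = dot3((1.0, 1.0, 1.0), tuple(x * x for x in r11))  # MUL R2.xyz + DP3 R2.x
inv_d = 1.0 / math.sqrt(d2)                          # RSQ R1.x, R2.x
r9 = (1.0, d2 * inv_d, d2)                           # DST R9.xyz, R2.x, R1.x
r11x = dot3(r9, c144[:3])                            # DP3 R11.x, R9.xyz, c[144].xyz
r1y = rcp(r11x)                                      # RCP R1.y, R11.x
r0w = 1.0 if r9[1] < c144[3] else 0.0                # SLT R0.w, R9.y, c[144].w
r7 = tuple(r1y * c for c in c140)                    # MUL R7.xyz, R1.y, c[140].xyz
```

Because c[144] is all zeros, `r11x` is exactly 0, `r1y` becomes INF, and the product with the all-zero c[140] is NaN under ordinary IEEE 754 float rules.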

@abaire
Contributor

abaire commented Jun 8, 2022

The issue here is similar to #365. In this case, it's a difference in how INF * 0 is evaluated. On my machine (NVIDIA GTX 1070) it evaluates to NaN. On nv2a, it evaluates to 0.

The key is in

  /* Slot 26: 0x00000000 0x00518055 0x14359800 0x2E700000 */
  MUL(R7,xyz, R1.y, c[140].xyz);

where R1.y is INF (due to a previous reciprocal of 0) and c[140] is vec4(0). On nv2a this results in R7.xyz being set to 0. In xemu it is set to NaN, which taints all further calculations, so oD0 ends up calculated as 0 instead of the ~0.164706 that it is meant to be (see HW results).
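The difference is easy to reproduce off the GPU. This is a minimal Python sketch, assuming a simple model of the nv2a rule (any zero operand forces a zero result); `ieee_mul`, `nv2a_mul` and `od0_red` are hypothetical names, and only the red channel of the first term in the chain is modeled.

```python
import math

def ieee_mul(a, b):
    # Plain IEEE 754 multiply: INF * 0 is NaN.
    return a * b

def nv2a_mul(a, b):
    # Assumed model of the hardware behaviour described here:
    # a zero operand wins, even against INF or NaN.
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def od0_red(mul):
    v0_r = 0.1647059       # v0.x from the captured inputs
    r1y = math.inf         # RCP of 0
    c140_x = 0.0           # c[140].x
    r0w = 0.0              # SLT result for this term
    r7 = mul(r1y, c140_x)  # slot 26: MUL R7.xyz, R1.y, c[140].xyz
    r7 = mul(r7, r0w)      # slot 28: MUL R7.xyz, R7.xyz, R0.w
    return v0_r + r7       # slot 32: ADD R8, v0, R7

ieee_result = od0_red(ieee_mul)
nv2a_result = od0_red(nv2a_mul)
```

`ieee_result` is NaN, while `nv2a_result` stays at 0.1647059. How the NaN ends up displayed as black is outside this sketch.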

abaire added a commit to abaire/xemu that referenced this issue Jun 8, 2022
Fixes xemu-project#1008

The nv2a returns 0 for anything multiplied by zero, including exceptional
values such as Inf and NaN. Desktop GPUs do not enforce this, leading to
conditions where NaNs wipe out calculations and lead to erroneous behavior.

[Test](https://github.com/abaire/nxdk_vsh_tests/blob/main/src/tests/americasarmyshader.cpp)
[HW Results](https://github.com/abaire/nxdk_vsh_tests_golden_results/wiki/Results-AmericasArmyShader)
@abaire
Contributor

abaire commented Jun 24, 2022

Hey @Triticum0, I assume you didn't mean to close this by merging #1045 into your own repo?

@Triticum0 Triticum0 reopened this Jun 24, 2022
@Triticum0
Author

Didn't know that could happen

abaire added a commit to abaire/xemu that referenced this issue Jun 27, 2022
abaire added a commit to abaire/xemu that referenced this issue Jul 24, 2022
@hilariousman

How can I fix my game with this bug?

mborgerson pushed a commit that referenced this issue May 1, 2023
Fixes #1008