America's Army: Rise of a Soldier: Floor texture Broken #1008

Triticum0 · 2022-06-01T22:22:22Z

Title

https://xemu.app/titles/55530043/#America-s-Army-Rise-of-a-Soldier

Bug Description

When going in-game, the floor textures are black.

Expected Behavior

It should be green

xemu Version

Version: 0.7.25
Branch: master
Commit: 7d6da22
Date: Tue May 31 08:20:36 PM UTC 2022

System Information

Field	Value
OS	Windows 10
CPU	AMD Ryzen 5 2600 Six-Core Processo
Graphics Device	NVIDIA GeForce RTX 3060 Ti/PCIe/SSE2
Graphics Driver	4.0.0 NVIDIA 512.95

Additional Context

No response

abaire · 2022-06-04T15:25:38Z

There are four textures involved, three are DXT1 (NV097_SET_TEXTURE_FORMAT_COLOR_L_DXT1_A1R5G5B5) and one is NV097_SET_TEXTURE_FORMAT_COLOR_SZ_A8R8G8B8

Relevant combiner:

// Stage 0
ab.rgb = clamp(vec3(dot(t1.rgb, c0_0.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 1
ab.rgb = clamp(vec3((t0.rgb * r1.rgb)), -1.0, 1.0);
ab.a = clamp(((t0.a * r1.a)), -1.0, 1.0);
r0.rgb = ab.rgb;
r0.a = ab.a;
// Stage 2
ab.rgb = clamp(vec3(dot(t1.rgb, c0_2.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 3
mux_sum.rgb = clamp(vec3(((t2.rgb * r1.rgb) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * r0.rgb))), -1.0, 1.0);
mux_sum.a = clamp((((t2.a * r1.a) + ((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * r0.a))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = mux_sum.a;
// Stage 4
ab.rgb = clamp(vec3(dot(t1.rgb, c0_4.rgb)), -1.0, 1.0);
r1.rgb = ab.rgb;
r1.a = ab.b;
// Stage 5
mux_sum.rgb = clamp(vec3(((t3.rgb * r1.rgb) + ((1.0 - clamp(vec4(0.0).rgb, 0.0, 1.0)) * r0.rgb))), -1.0, 1.0);
mux_sum.a = clamp((((t3.a * r1.a) + ((1.0 - clamp(vec4(0.0).a, 0.0, 1.0)) * r0.a))), -1.0, 1.0);
r0.rgb = mux_sum.rgb;
r0.a = mux_sum.a;
// Stage 6
ab.rgb = clamp(vec3(((r0.rgb * v0.rgb) * 2.0)), -1.0, 1.0);
ab.a = clamp((((r0.a * v0.a) * 2.0)), -1.0, 1.0);
r0.rgb = ab.rgb;
r0.a = ab.a;
// Final Combiner
fragColor.rgb = max(vec4(0.0).rgb, 0.0) + mix(vec3(max(pFog.rgb, 0.0)), vec3(max(r0.rgb, 0.0)), vec3(max(pFog.aaa, 0.0)));
fragColor.a = max(r0.a, 0.0);

Constants:

c0_0 1.00, 0.00, 0.00, 0.00 
c0_2 0.00, 1.00, 0.00, 0.00
c0_4 0.00, 0.00, 1.00, 0.00
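Because each of these constants is a unit vector, the dot products in stages 0, 2 and 4 each pick out one channel of t1, and that channel then weights t0, t2 and t3 respectively. A minimal Python sketch of the RGB path through stages 0-5, assuming plain float math and ignoring alpha; the function names and sample texel values are illustrative and not taken from xemu:

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Scalar clamp matching the combiner's clamp(..., -1.0, 1.0)."""
    return max(lo, min(hi, x))


def dot3(a, b):
    """Three-component dot product."""
    return sum(x * y for x, y in zip(a, b))


# Combiner constants as listed above (RGB part only).
C0_0 = (1.0, 0.0, 0.0)
C0_2 = (0.0, 1.0, 0.0)
C0_4 = (0.0, 0.0, 1.0)


def blend_stages_0_to_5(t0, t1, t2, t3):
    """Return r0.rgb as it stands before stage 6."""
    w0 = clamp(dot3(t1, C0_0))                        # stage 0: selects t1.r
    r0 = [clamp(c * w0) for c in t0]                  # stage 1: t0 * weight
    w2 = clamp(dot3(t1, C0_2))                        # stage 2: selects t1.g
    r0 = [clamp(c * w2 + r) for c, r in zip(t2, r0)]  # stage 3: + t2 * weight
    w4 = clamp(dot3(t1, C0_4))                        # stage 4: selects t1.b
    r0 = [clamp(c * w4 + r) for c, r in zip(t3, r0)]  # stage 5: + t3 * weight
    return r0


# With t1 = pure red, only t0 contributes to the result.
print(blend_stages_0_to_5((0.2, 0.4, 0.6), (1.0, 0.0, 0.0),
                          (0.9, 0.9, 0.9), (0.5, 0.5, 0.5)))
```

Read this way, t1 behaves like a per-channel blend mask over the other three textures, which is an interpretation of the dump rather than something stated by the game or xemu.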

As a quick verification, I set fragColor to each of the textures in turn and each one produced the output I'd expect, so something is going wrong in the combiner. It does use fog, but I verified that r0.rgb is wrong (and fog is entirely disabled in the frame I captured).

abaire · 2022-06-04T15:29:00Z

Looks like stage 6 of the combiner is what blacks it out: ab.rgb = clamp(vec3(((r0.rgb * v0.rgb) * 2.0)), -1.0, 1.0); ends up being black as v0 is all black from the vertex shader.
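To illustrate why a black v0 wipes out the result regardless of what the earlier stages produced, here is a small Python sketch of stage 6 alone. The input colors are made-up sample values, and `stage6` is a hypothetical helper name, not an xemu function:

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Scalar clamp matching the combiner's clamp(..., -1.0, 1.0)."""
    return max(lo, min(hi, x))


def stage6(r0_rgb, v0_rgb):
    """ab.rgb = clamp((r0.rgb * v0.rgb) * 2.0, -1.0, 1.0)"""
    return [clamp(r * v * 2.0) for r, v in zip(r0_rgb, v0_rgb)]


blended = [0.3, 0.6, 0.2]  # sample r0.rgb coming out of stage 5

# A mid-gray vertex color leaves the blended color unchanged (0.5 * 2.0 == 1.0).
print(stage6(blended, [0.5, 0.5, 0.5]))

# An all-black vertex color forces the output to black.
print(stage6(blended, [0.0, 0.0, 0.0]))
```

Since the multiply is per channel, no texture content can compensate for a zero diffuse color, which points the investigation at whatever writes oD0.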

Vertex shader is quite complex:

  /* Slot 0: 0x00000000 0x004F221B 0x18373800 0x2F600000 */
  MUL(R6,xyzw, v1, c[121]);

  /* Slot 1: 0x00000000 0x004F061B 0x38371800 0x2F400000 */
  MUL(R4,xyzw, v3, c[120]);

  /* Slot 2: 0x00000000 0x02F0001B 0x64361BFC 0x2850181C */
  DP4(_temp_vec,x, R6, c[128]);
  MOV(oD0,w, v0.w);
  R5.x = _temp_vec.x;

  /* Slot 3: 0x00000000 0x00F0201B 0x64363800 0x24500000 */
  DP4(R5,y, R6, c[129]);

  /* Slot 4: 0x00000000 0x00F0401B 0x64365800 0x22500000 */
  DP4(R5,z, R6, c[130]);

  /* Slot 5: 0x00000000 0x00F0601B 0x64367800 0x21500000 */
  DP4(R5,w, R6, c[131]);

  /* Slot 6: 0x00000000 0x0071001A 0x6400146A 0x3EB00000 */
  ADD(R11,xyz, R6.xyz, -c[136].xyz);

  /* Slot 7: 0x00000000 0x00712058 0x64001562 0x77A00000 */
  ADD(R10,yzw, R6.yyzx, -c[137].yyzx);

  /* Slot 8: 0x00000000 0x0040001A 0xB4356800 0x2E200000 */
  MUL(R2,xyz, R11.xyz, R11.xyz);

  /* Slot 9: 0x00000000 0x00B0E0AA 0x7C344800 0x28200000 */
  DP3(R2,x, c[135].z, R2.xyz);

  /* Slot 10: 0x00000000 0x0040005B 0xA4B74800 0x27200000 */
  MUL(R2,yzw, R10.yyzw, R10.yyzw);

  /* Slot 11: 0x00000000 0x0071401A 0x6400146A 0xBEB00000 */
  ADD(R11,xyz, R6.xyz, -c[138].xyz);

  /* Slot 12: 0x00000000 0x08EC001B 0x64361800 0x90A88800 */
  DP4(oPos,x, R6, c[96]);
  RSQ(R1,x, R2.x);

  /* Slot 13: 0x00000000 0x00B0E0AA 0x7DB44800 0x24200000 */
  DP3(R2,y, c[135].z, R2.wyz);

  /* Slot 14: 0x00000000 0x00400058 0xB4B16800 0x27B00000 */
  MUL(R11,yzw, R11.yyzx, R11.yyzx);

  /* Slot 15: 0x00000000 0x0071601A 0x6400146A 0xFEA00000 */
  ADD(R10,xyz, R6.xyz, -c[139].xyz);

  /* Slot 16: 0x00000000 0x01000000 0x24002800 0x2E900000 */
  DST(R9,xyz, R2.x, R1.x);

  /* Slot 17: 0x00000000 0x08B0E0AA 0x7DB56954 0x98280000 */
  DP3(_temp_vec,x, c[135].z, R11.wyz);
  RSQ(R1,x, R2.y);
  R2.x = _temp_vec.x;

  /* Slot 18: 0x00000000 0x00400058 0xA4B14800 0x27B00000 */
  MUL(R11,yzw, R10.yyzx, R10.yyzx);

  /* Slot 19: 0x00000000 0x01000055 0x24002800 0x2EA00000 */
  DST(R10,xyz, R2.y, R1.x);

  /* Slot 20: 0x00000000 0x08B0E0AA 0x7DB56800 0x94280000 */
  DP3(_temp_vec,y, c[135].z, R11.wyz);
  RSQ(R1,x, R2.x);
  R2.y = _temp_vec.y;

  /* Slot 21: 0x00000000 0x00B2001A 0x94341800 0x28B00000 */
  DP3(R11,x, R9.xyz, c[144].xyz);

  /* Slot 22: 0x00000000 0x01000000 0x24002800 0x2E800000 */
  DST(R8,xyz, R2.x, R1.x);

  /* Slot 23: 0x00000000 0x08B2201A 0xA4343954 0x94B80000 */
  DP3(_temp_vec,y, R10.xyz, c[145].xyz);
  RSQ(R1,x, R2.y);
  R11.y = _temp_vec.y;

  /* Slot 24: 0x00000000 0x05720055 0x95FE1802 0xD1040000 */
  SLT(_temp_vec,w, R9.y, c[144].w);
  RCP(R1,y, R11.x);
  R0.w = _temp_vec.w;

  /* Slot 25: 0x00000000 0x05000055 0x24002956 0xDEB20000 */
  DST(_temp_vec,xyz, R2.y, R1.x);
  RCP(R1,z, R11.y);
  R11.xyz = _temp_vec.xyz;

  /* Slot 26: 0x00000000 0x00518055 0x14359800 0x2E700000 */
  MUL(R7,xyz, R1.y, c[140].xyz);

  /* Slot 27: 0x00000000 0x00B2401A 0x84345800 0x21B00000 */
  DP3(R11,w, R8.xyz, c[146].xyz);

  /* Slot 28: 0x00000000 0x0040001A 0x75FE0800 0x2E700000 */
  MUL(R7,xyz, R7.xyz, R0.w);

  /* Slot 29: 0x00000000 0x0451A0AA 0x14B1BBFE 0xDBA80000 */
  MUL(_temp_vec,xzw, R1.z, c[141].yyzx);
  RCP(R1,x, R11.w);
  R10.xzw = _temp_vec.xzw;

  /* Slot 30: 0x00000000 0x01722055 0xA5FE3800 0x21000000 */
  SLT(R0,w, R10.y, c[145].w);

  /* Slot 31: 0x00000000 0x00B2601A 0xB4347800 0x28B00000 */
  DP3(R11,x, R11.xyz, c[147].xyz);

  /* Slot 32: 0x00000000 0x00600009 0x08001025 0xDB800000 */
  ADD(R8,xzw, v0.xxzy, R7.xxzy);

  /* Slot 33: 0x00000000 0x044000CA 0xA5FE0802 0xDE740000 */
  MUL(_temp_vec,xyz, R10.wxz, R0.w);
  RCP(R1,y, R11.x);
  R7.xyz = _temp_vec.xyz;

  /* Slot 34: 0x00000000 0x0051C000 0x14B1D800 0x2BB00000 */
  MUL(R11,xzw, R1.x, c[142].yyzx);

  /* Slot 35: 0x00000000 0x01724055 0x85FE5800 0x21000000 */
  SLT(R0,w, R8.y, c[146].w);

  /* Slot 36: 0x00000000 0x0060003A 0x84001069 0xDE800000 */
  ADD(R8,xyz, R8.xwz, R7.xyz);

  /* Slot 37: 0x00000000 0x004000CA 0xB5FE0800 0x2E700000 */
  MUL(R7,xyz, R11.wxz, R0.w);

  /* Slot 38: 0x00000000 0x0051E055 0x14B1F800 0x2BB00000 */
  MUL(R11,xzw, R1.y, c[143].yyzx);

  /* Slot 39: 0x00000000 0x01726055 0xB5FE7800 0x21000000 */
  SLT(R0,w, R11.y, c[147].w);

  /* Slot 40: 0x00000000 0x00EC201B 0x64363800 0x20B04800 */
  DP4(oPos,y, R6, c[97]);

  /* Slot 41: 0x00000000 0x00EC401B 0x64365800 0x28002800 */
  DP4(oPos,z, R6, c[98]);
  DP4(R0,x, R6, c[98]);

  /* Slot 42: 0x00000000 0x00EC601B 0x64367800 0x20A01800 */
  DP4(oPos,w, R6, c[99]);

  /* Slot 43: 0x00000000 0x0060001A 0x84001069 0xDE800000 */
  ADD(R8,xyz, R8.xyz, R7.xyz);

  /* Slot 44: 0x00000000 0x064000CA 0xB5FE0BFF 0x1E780000 */
  MUL(_temp_vec,xyz, R11.wxz, R0.w);
  RCC(R1,x, R12.w);
  R7.xyz = _temp_vec.xyz;

  /* Slot 45: 0x00000000 0x02ED001B 0x44371800 0x1820F82C */
  DP4(_temp_vec,x, R4, c[104]);
  MOV(oFog,xyzw, R0.x);
  R2.x = _temp_vec.x;

  /* Slot 46: 0x00000000 0x00ED201B 0x44373800 0x24200000 */
  DP4(R2,y, R4, c[105]);

  /* Slot 47: 0x00000000 0x00ED401B 0x44375800 0x22200000 */
  DP4(R2,z, R4, c[106]);

  /* Slot 48: 0x00000000 0x00EC801B 0x54369800 0x21B00000 */
  DP4(R11,w, R5, c[100]);

  /* Slot 49: 0x00000000 0x02ECA01B 0x5436B86C 0x94B0E854 */
  DP4(_temp_vec,y, R5, c[101]);
  MOV(oT1,xyz, R2);
  R11.y = _temp_vec.y;

  /* Slot 50: 0x00000000 0x00ECC01B 0x5436D800 0x22B00000 */
  DP4(R11,z, R5, c[102]);

  /* Slot 51: 0x00000000 0x00ED801B 0x54379800 0x28200000 */
  DP4(R2,x, R5, c[108]);

  /* Slot 52: 0x00000000 0x02EDA01B 0x5437BB62 0xD420E84C */
  DP4(_temp_vec,y, R5, c[109]);
  MOV(oT0,xyz, R11.wyzx);
  R2.y = _temp_vec.y;

  /* Slot 53: 0x00000000 0x00EDC01B 0x5437D800 0x22200000 */
  DP4(R2,z, R5, c[110]);

  /* Slot 54: 0x00000000 0x00EE001B 0x54361800 0x21B00000 */
  DP4(R11,w, R5, c[112]);

  /* Slot 55: 0x00000000 0x02EE201B 0x5436386C 0x94B0E85C */
  DP4(_temp_vec,y, R5, c[113]);
  MOV(oT2,xyz, R2);
  R11.y = _temp_vec.y;

  /* Slot 56: 0x00000000 0x00EE401B 0x54365800 0x22B00000 */
  DP4(R11,z, R5, c[114]);

  /* Slot 57: 0x00000000 0x0047401A 0xC4355800 0x20A0E800 */
  MUL(oPos,xyz, R12.xyz, c[58].xyz);

  /* Slot 58: 0x00000000 0x0060001A 0x84001069 0xD0A0E818 */
  ADD(oD0,xyz, R8.xyz, R7.xyz);

  /* Slot 59: 0x00000000 0x00ED601B 0x44377800 0x20B01850 */
  DP4(oT1,w, R4, c[107]);

  /* Slot 60: 0x00000000 0x00ECE01B 0x5436F800 0x20B01848 */
  DP4(oT0,w, R5, c[103]);

  /* Slot 61: 0x00000000 0x00EDE01B 0x5437F800 0x20A01858 */
  DP4(oT2,w, R5, c[111]);

  /* Slot 62: 0x00000000 0x00EE601B 0x54367800 0x20A01860 */
  DP4(oT3,w, R5, c[115]);

  /* Slot 63: 0x00000000 0x02000000 0x08001362 0xD0A0E864 */
  MOV(oT3,xyz, R11.wyzx);

  /* Slot 64: 0x00000000 0x0087601A 0xC400286A 0xF0B0E801 */
  MAD(oPos,xyz, R12.xyz, R1.x, c[59].xyz);

oD0.a comes from v0.w
oD0.rgb comes from a chain of operations

abaire · 2022-06-08T02:12:56Z

I wrote some tooling to make it easier to process these large shaders.

Here's the (huge) oD0.rgb chain:

; Inputs:
; c[121] = (2.0, 2.0, 2.0, 1.0)
; c[135] = (0.0, 0.5, 1.0, 3.0)
; c[136] = (0.9983897, -0.0101479, -0.0558118, 0.0)
; c[137] = (-0.0567268, -0.178603, -0.9822846, 0.0)
; c[138] = (0.0, 0.9838689, -0.1788911, 0.0)
; c[139] = (31.5401363, -20.6333656, 67.6486282, 1.0)
; c[140] = (0.0, 0.0, 0.0, 0.0)
; c[141] = (0.0, 0.0, 0.0, 0.0)
; c[142] = (0.0, 0.0, 0.0, 0.0)
; c[143] = (0.0, 0.0, 0.0, 0.0)
; c[144] = (0.0, 0.0, 0.0, 0.0)
; c[145] = (1.0, 0.0, 0.0, 0.0)
; c[146] = (0.0, 1.0, 0.0, 0.0)
; c[147] = (0.0, 0.0, 1.0, 0.0)
; v0 = (0.1647059, 0.1647059, 0.1686275, 1.0)
; v1 = (-728.0, -4058.0, 0.0, 0.0)

MUL R6.xyzw, v1, c[121]
ADD R11.xyz, R6.xyz, -c[136].xyz
ADD R10.yzw, R6.yzx, -c[137].yzx
MUL R2.xyz, R11.xyz, R11.xyz
DP3 R2.x, c[135].z, R2.xyz
MUL R2.yzw, R10.yzw, R10.yzw
ADD R11.xyz, R6.xyz, -c[138].xyz
DP4 oPos.x, R6, c[96] + RSQ R1.x, R2.x
DP3 R2.y, c[135].z, R2.wyz
MUL R11.yzw, R11.yzx, R11.yzx
ADD R10.xyz, R6.xyz, -c[139].xyz
DST R9.xyz, R2.x, R1.x
DP3 R2.x, c[135].z, R11.wyz + RSQ R1.x, R2.y
MUL R11.yzw, R10.yzx, R10.yzx
DST R10.xyz, R2.y, R1.x
DP3 R2.y, c[135].z, R11.wyz + RSQ R1.x, R2.x
DP3 R11.x, R9.xyz, c[144].xyz
DST R8.xyz, R2.x, R1.x
DP3 R11.y, R10.xyz, c[145].xyz + RSQ R1.x, R2.y
SLT R0.w, R9.y, c[144].w + RCP R1.y, R11.x
DST R11.xyz, R2.y, R1.x + RCP R1.z, R11.y
MUL R7.xyz, R1.y, c[140].xyz
DP3 R11.w, R8.xyz, c[146].xyz
MUL R7.xyz, R7.xyz, R0.w
MUL R10.xzw, R1.z, c[141].yzx + RCP R1.x, R11.w
SLT R0.w, R10.y, c[145].w
DP3 R11.x, R11.xyz, c[147].xyz
ADD R8.xzw, v0.xzy, R7.xzy
MUL R7.xyz, R10.wxz, R0.w + RCP R1.y, R11.x
MUL R11.xzw, R1.x, c[142].yzx
SLT R0.w, R8.y, c[146].w
ADD R8.xyz, R8.xwz, R7.xyz
MUL R7.xyz, R11.wxz, R0.w
MUL R11.xzw, R1.y, c[143].yzx
SLT R0.w, R11.y, c[147].w
ADD R8.xyz, R8.xyz, R7.xyz
MUL R7.xyz, R11.wxz, R0.w + RCC R1.x, R12.w
ADD oD0.xyz, R8.xyz, R7.xyz

oD0.a is just v0.w

abaire · 2022-06-08T19:23:32Z

The issue here is similar to #365. In this case, it's a difference in how INF * 0 is evaluated. On my machine (nvidia GTX 1070) it evaluates to NaN. On nv2a, it evaluates to 0.

The key is in

  /* Slot 26: 0x00000000 0x00518055 0x14359800 0x2E700000 */
  MUL(R7,xyz, R1.y, c[140].xyz);

where R1.y is INF (due to a previous reciprocal of 0) and c[140] is vec4(0). On nv2a this sets R7.xyz to 0. In xemu it is set to NaN, which taints all further calculations, so oD0 ends up as 0 instead of the ~0.164706 it is meant to be (see HW results).
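The difference can be reproduced outside the emulator. The Python sketch below contrasts IEEE 754 multiplication with the zero-wins rule described for nv2a, using a heavily simplified slice of the shader (the reciprocal, the slot 26 multiply, and the later add of v0.x). `nv2a_mul`, `rcp` and `light_term` are hypothetical helpers written for this illustration; they are not how xemu implements the fix, and treating a reciprocal of 0 as +Inf is an assumption based on the description above:

```python
import math


def ieee_mul(a, b):
    """Ordinary IEEE 754 multiply: Inf * 0 is NaN."""
    return a * b


def nv2a_mul(a, b):
    """Zero-wins multiply: anything times 0 is 0, including Inf and NaN."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


def rcp(x):
    """Reciprocal, assumed here to yield +Inf for an input of 0."""
    return math.inf if x == 0.0 else 1.0 / x


def light_term(mul, v0_x=0.1647059):
    """Simplified x-channel trace: RCP, then MUL by c[140].x, then ADD v0.x."""
    r1_y = rcp(0.0)       # R11.x is 0 because c[144].xyz is all zero
    r7_x = mul(r1_y, 0.0) # slot 26: MUL R7.xyz, R1.y, c[140].xyz
    return v0_x + r7_x    # slot 32: ADD R8, v0, R7


print(light_term(ieee_mul))  # NaN propagates into the color
print(light_term(nv2a_mul))  # v0.x survives unchanged
```

Once a NaN enters the sum, every later ADD in the chain keeps it, which matches the tainting behavior reported for the desktop GPU path.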

Fixes xemu-project#1008 The nv2a returns 0 for anything multiplied by zero, including exceptional values such as Inf and NaN. Desktop GPUs do not enforce this, leading to conditions where NaNs wipe out calculations and lead to erroneous behavior. [Test](https://github.com/abaire/nxdk_vsh_tests/blob/main/src/tests/americasarmyshader.cpp) [HW Results](https://github.com/abaire/nxdk_vsh_tests_golden_results/wiki/Results-AmericasArmyShader)

abaire · 2022-06-24T13:19:50Z

Hey @Triticum0, I assume you didn't mean to close this by merging #1045 into your own repo?

Triticum0 · 2022-06-24T13:48:07Z

Didn't know that could happen

hilariousman · 2023-04-10T12:34:42Z

How can I fix my game with this bug?

Triticum0 added the bug label Jun 1, 2022

abaire mentioned this issue Jun 8, 2022

nv2a: Adjust NaN handling to be similar to HW #913

Closed

abaire mentioned this issue Jun 8, 2022

nv2a: Make multiplication by 0 match HW behavior. #1045

Merged

Triticum0 closed this as completed Jun 24, 2022

Triticum0 reopened this Jun 24, 2022

Spidy123222 mentioned this issue Dec 7, 2022

nv2a: Make multiplication by 0 match HW behavior. Spidy123222/xemu#1

Open

faha223 mentioned this issue Dec 22, 2022

nv2a: Make multiplication by 0 match HW behavior. faha223/xemu#4

Merged

mborgerson closed this as completed in #1045 May 1, 2023
