Return To Castle Wolfenstein: Crash in VM due to null pointer exception, possibly triggered by difference in cpuid handling #509

Closed
abaire opened this issue Oct 28, 2021 · 12 comments · Fixed by #514
Labels
bug Something isn't working

Comments

@abaire
Contributor

abaire commented Oct 28, 2021

Title

https://xemu.app/titles/41560010/#Return-to-Castle-Wolfenstein-Tides-of-War

Bug Description

RTCW crashes shortly after boot, while the copyright screen is displayed. The client init function is never called because EBX holds an unexpected value after the cpuid call at 0x00140c39. With init skipped, the game crashes a short while later.

On XEMU before cpuid:

# eax            0x1                 1
# ecx            0x0                 0
# edx            0x7f355fba          2134204346
# ebx            0x34b9000           55283712

After the call:

# eax            0x68a               1674
# ecx            0x80000000          -2147483648
# edx            0x383f9fd           58980861
# ebx            0x800               2048

On XBOX hardware:

# eax            0x1                 1
# ecx            0x0                 0
# edx            0x7f355fba          2134204346
# ebx            0x35ff000           56619008

After:

# eax            0x68a               1674
# ecx            0x0                 0
# edx            0x383f9ff           58980863
# ebx            0x0                 0
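The two post-cpuid dumps differ in only a few bits, which is easier to see by XOR-ing them. The sketch below is illustrative only; the names `XEMU`, `XBOX`, `differing_bits`, and `compare` are made up for this example, and the values are copied from the dumps above.

```python
# Leaf-1 CPUID results copied from the register dumps above.
XEMU = {"eax": 0x68A, "ebx": 0x800, "ecx": 0x80000000, "edx": 0x383F9FD}
XBOX = {"eax": 0x68A, "ebx": 0x0, "ecx": 0x0, "edx": 0x383F9FF}


def differing_bits(a, b):
    """Return the bit positions at which two 32-bit values differ."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


def compare(emulated, hardware):
    """Map each mismatching register to its list of differing bit positions."""
    return {
        reg: differing_bits(emulated[reg], hardware[reg])
        for reg in emulated
        if emulated[reg] != hardware[reg]
    }


if __name__ == "__main__":
    for reg, bits in compare(XEMU, XBOX).items():
        print(f"{reg}: xemu={XEMU[reg]:#010x} xbox={XBOX[reg]:#010x} bits={bits}")
```

This reports EBX bit 11, ECX bit 31, and EDX bit 1. In Intel's leaf-1 layout those appear to fall in the CLFLUSH line size field (EBX bits 15:8), the bit hypervisors use to announce themselves (ECX bit 31), and the VME feature flag (EDX bit 1).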

Expected Behavior

The guest machine should not crash and cpuid should emulate XBOX hardware in all modified registers.

xemu Version

0.6.1-24-gc6e05f51b2

System Information

OS: Ubuntu (same crash happens on macOS and Windows)
GPU: GTX 1070

Additional Context

No response

abaire added the bug label Oct 28, 2021
@mborgerson
Member

Very cool!

It would be great if someone could volunteer to build a test bench app that does a full cpuid dump and comparison. I'm not sure it's been done yet.

I seem to recall @GXTX doing something like this before but not sure where the app/results are.

@abaire
Contributor Author

abaire commented Oct 28, 2021

I'll probably take a stab at that shortly.

Modifying cpu.c to blindly return the correct values for EAX==1 gets the game past the copyright screen, but it still crashes later. cpuid is checked in a loop multiple times, so it's possible that other values are also incorrect for the XBOX variant of the CPU.

@abaire
Contributor Author

abaire commented Oct 28, 2021

https://github.com/abaire/nxdk_check_cpuid

XEMU:

MODE=0x00000000, EAX=0x00000003, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
MODE=0x00000001, EAX=0x0000068A, EBX=0x00000800, ECX=0x80000000, EDX=0x0383F9FD
MODE=0x00000002, EAX=0x00000001, EBX=0x00000000, ECX=0x0000004D, EDX=0x002C307D
MODE=0x00000003, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x00000004, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x00000005, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000000, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000001, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000002, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000003, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000004, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000005, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000006, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000007, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
MODE=0x80000008, EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000

XBOX (1.0):

MODE=0x00000000, EAX=0x00000002, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
MODE=0x00000001, EAX=0x0000068A, EBX=0x00000000, ECX=0x00000000, EDX=0x0383F9FF
MODE=0x00000002, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000003, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000004, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000005, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000006, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000007, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000009, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x0000000A, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x0000000B, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x0000001F, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x0000000D, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x00000014, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x40000000, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x40000001, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000000, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000001, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000002, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000003, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000004, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000005, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000006, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000007, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x80000008, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x8000000A, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x8000001D, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x8000001E, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0xC0000000, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0xC0000001, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0xC0000002, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0xC0000003, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0xC0000004, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
MODE=0x8000001F, EAX=0x03020101, EBX=0x00000000, ECX=0x00000000, EDX=0x0C040841
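Dumps in this format can be compared mechanically instead of by eye. The following is a minimal sketch, assuming the exact `MODE=..., EAX=..., ...` line layout shown above; `parse_dump`, `diff_dumps`, and `vendor_string` are hypothetical helpers written for this example and are not part of nxdk_check_cpuid. Only the first two lines of each dump are reproduced as sample input.

```python
import re
import struct

LINE_RE = re.compile(
    r"MODE=0x([0-9A-Fa-f]{8}), EAX=0x([0-9A-Fa-f]{8}), EBX=0x([0-9A-Fa-f]{8}), "
    r"ECX=0x([0-9A-Fa-f]{8}), EDX=0x([0-9A-Fa-f]{8})"
)


def parse_dump(text):
    """Parse dump lines into {leaf: (eax, ebx, ecx, edx)}."""
    table = {}
    for match in LINE_RE.finditer(text):
        leaf, *regs = (int(group, 16) for group in match.groups())
        table[leaf] = tuple(regs)
    return table


def diff_dumps(emulated, hardware):
    """Return {leaf: (emulated_regs, hardware_regs)} for shared leaves that differ."""
    return {
        leaf: (emulated[leaf], hardware[leaf])
        for leaf in sorted(emulated.keys() & hardware.keys())
        if emulated[leaf] != hardware[leaf]
    }


def vendor_string(leaf0):
    """Decode the 12-byte vendor ID, stored in EBX, EDX, ECX order in leaf 0."""
    _eax, ebx, ecx, edx = leaf0
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


XEMU_SAMPLE = """
MODE=0x00000000, EAX=0x00000003, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
MODE=0x00000001, EAX=0x0000068A, EBX=0x00000800, ECX=0x80000000, EDX=0x0383F9FD
"""
XBOX_SAMPLE = """
MODE=0x00000000, EAX=0x00000002, EBX=0x756E6547, ECX=0x6C65746E, EDX=0x49656E69
MODE=0x00000001, EAX=0x0000068A, EBX=0x00000000, ECX=0x00000000, EDX=0x0383F9FF
"""

if __name__ == "__main__":
    xemu, xbox = parse_dump(XEMU_SAMPLE), parse_dump(XBOX_SAMPLE)
    for leaf, (emu, hw) in diff_dumps(xemu, xbox).items():
        print(f"leaf {leaf:#010x}: xemu={emu} xbox={hw}")
```

On the sample input both leaves differ: leaf 0 in EAX (maximum leaf 3 versus 2) and leaf 1 in EBX, ECX, and EDX.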

@abaire
Contributor Author

abaire commented Oct 28, 2021

Sadly even with blindly returning the values I retrieved from my XBOX it still crashes later on; will continue debugging.

@abaire
Contributor Author

abaire commented Oct 29, 2021

Interestingly, if I copy the game onto E: and run it from there (via the debug dashboard, so e:\DEVKIT\w; not sure if that matters), the game does run. At least it gets to the initial menu screen and is still interactive; I haven't tested further.

Running from an ISO consistently fails, though this time it seems to be failing to alloc in a way that causes a write to 0x0011c49d at EIP=0x001b4665.

abaire@12da5ac has the change I applied that makes it work (off disk), but I need to update and redump: I see a note indicating that leaves 4+ are only available if IA32_CR_MISC_ENABLES.BOOT_NT4 is clear, and they're suspiciously all the same value in the test output.

@abaire
Contributor Author

abaire commented Oct 29, 2021

Reading more of the docs:

  1. I think the IA32_CR_MISC_ENABLES MSR is only available in Pentium 4 and later.
  2. According to AP-485, cpuid with EAX = 0 sets EAX to the maximum supported leaf. It's returning "2", which makes me more confident that the duplication is expected (requests for a leaf above the maximum are supposed to return the values of the maximum supported leaf).

So I think the method can safely be updated to just return the 3 sets of values with a default == mode 2.
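The proposed behavior can be modeled in a few lines. This is a sketch of the idea only, not the actual change to xemu's cpu.c; `XBOX_LEAVES`, `MAX_LEAF`, and `cpuid` are names invented for the example, and the register values are copied from the XBOX (1.0) dump earlier in the thread.

```python
# (eax, ebx, ecx, edx) per leaf, copied from the XBOX (1.0) dump.
XBOX_LEAVES = {
    0: (0x00000002, 0x756E6547, 0x6C65746E, 0x49656E69),
    1: (0x0000068A, 0x00000000, 0x00000000, 0x0383F9FF),
    2: (0x03020101, 0x00000000, 0x00000000, 0x0C040841),
}

# EAX of leaf 0 reports the highest supported leaf (2 on this hardware).
MAX_LEAF = XBOX_LEAVES[0][0]


def cpuid(leaf):
    """Return (eax, ebx, ecx, edx); unsupported leaves fall back to MAX_LEAF."""
    return XBOX_LEAVES.get(leaf, XBOX_LEAVES[MAX_LEAF])


if __name__ == "__main__":
    for leaf in (0x0, 0x1, 0x2, 0x5, 0x80000000):
        regs = ", ".join(f"{value:#010x}" for value in cpuid(leaf))
        print(f"leaf {leaf:#010x}: {regs}")
```

Using a dictionary lookup with a default keeps the three known leaves explicit and makes every other request, including the 0x8000xxxx and 0xC000xxxx ranges, return the leaf-2 values, which matches what the hardware dump shows.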

@abaire
Contributor Author

abaire commented Oct 29, 2021

Following a bit further, this seems to be related to the ATAPI handling in xemu.

It looks like there is a global heap linked list that is getting corrupted when attempting to read from the virtual DVD device. The crash is set up in the depths of NtReadFile, where it requests a read of 0x200 bytes into a buffer that happens to sit just before the heap linked list.

In the HDD case, the read results in 0x200 bytes being retrieved.
In the DVD case, the read results in 0x800 bytes being retrieved, which is suspiciously also ATAPI_SECTOR_SIZE in xemu. I suspect the real hardware caps the write in a way that the virtual hardware is not, but I'm not familiar enough w/ qemu/xemu internals to tell for sure yet.

The following breaks in gdb at the interesting NtReadFile invocation, which should read into the buffer at 0x0133c070 - 0x0133c270:

b *0x001b0fe6
ignore $bpnum 2

After a while it gets into code that I assume is doing the system call to the appropriate device, with paths diverging at 0x80038508 (HDD) versus 0x8003a4aa (DVD).
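To put numbers on the suspected overrun: if 0x800 bytes really were written into the 0x200-byte buffer mentioned above, the write would spill well past its end. The arithmetic below is a sketch under that assumption; the constant names and the `overrun` helper are hypothetical, and the addresses are the ones quoted in this comment.

```python
BUFFER_START = 0x0133C070  # start of the NtReadFile destination buffer
BUFFER_END = 0x0133C270    # one past the last byte of the buffer
REQUESTED = 0x200          # bytes the caller asked for
DELIVERED_DVD = 0x800      # bytes observed in the DVD case


def overrun(start, end, written):
    """Return how many bytes a write of `written` bytes at `start` spills past `end`."""
    return max(0, start + written - end)


if __name__ == "__main__":
    print(f"HDD overrun: {overrun(BUFFER_START, BUFFER_END, REQUESTED):#x} bytes")
    print(f"DVD overrun: {overrun(BUFFER_START, BUFFER_END, DELIVERED_DVD):#x} bytes")
    print(f"DVD write would end at {BUFFER_START + DELIVERED_DVD:#010x}")
```

Under that assumption the DVD case would overwrite 0x600 bytes beyond the buffer, ending at 0x0133c870, which would be consistent with corruption of whatever follows the buffer in memory.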

@mborgerson
Member

mborgerson commented Oct 30, 2021

Good work on the CPUID issue!

As for the disc reading stuff:

Where exactly is the game crashing? I am able to get in game without issue, with the patch for CPUID.

As a sanity check before diving into the CDROM stack, how did you create your game ISO? I recommend using extract-xiso and ensuring the build is 32-bit; 64-bit builds are known to produce bad ISOs.

@abaire
Contributor Author

abaire commented Oct 30, 2021

Interesting, I'll try recreating it. I probably used the version of extract-xiso that happened to be at head in the nxdk at the time.

To the end user, the crash comes in the black screen between the copyright screen and the ESRB notice. Hopefully recreating the ISO fixes the issue for me as well - I can see the call to handle_aiocb_rw_linear that has what I think is an incorrect size, but haven't figured out where it's getting set yet and would be happy to stop looking if it already works :)

@abaire
Contributor Author

abaire commented Oct 30, 2021

Yup, that was 100% the problem, thanks for saving me from more debugging!

I wonder if there's a way to detect that an xiso was produced with the 64-bit build so that xemu could bomb out early and prevent erroneous error reports?

@mborgerson
Member

Glad to hear it!

xemu itself shouldn't care what the disc filesystem looks like. I think the best solution here would be to develop a solid ISO packing tool that can do such checks, then suggest that every user use that tool.

@abaire
Contributor Author

abaire commented Oct 30, 2021

Makes sense. I just verified that an image built with 64-bit extract-xiso with XboxDev/extract-xiso/pull/52 patched in works correctly, so maybe this particular issue won't be a problem in the future (assuming that PR gets merged).
