Sporadic crashes in the vmware_fs driver #36

Open
volo-droid opened this issue Nov 9, 2021 · 8 comments

Comments

@volo-droid
Contributor

The current vmware_fs implementation seems to be quite buggy and crashes under heavy use:

Crash 1:

PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0

stack trace for thread 2553 "cp"
    kernel stack: 0xffffffff81bd5000 to 0xffffffff81bda000
      user stack: 0x00007f4cad8c1000 to 0x00007f4cae8c1000
frame                       caller             <image>:function + offset
…
11 ffffffff81bd95e0 (+ 240) ffffffff800ab0f7   <kernel_x86_64> panic + 0xb7
12 ffffffff81bd96d0 (+ 224) ffffffff801502b8   <kernel_x86_64> x86_unexpected_exception + 0x168
13 ffffffff81bd97b0 (+ 872) ffffffff8014692c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81bd9b18 (end = 0xffffffff81bd9be0)
 rax 0xffffffff81bd3c90    rbx 0xffffffff9a09a520    rcx 0x0
 rdx 0x9                   rsi 0xffffffff9a09a520    rdi 0x64616f6c6e776f64
 rbp 0xffffffff81bd9c00     r8 0x0                    r9 0x6f8
 r10 0x0                   r11 0xffffffff8e19d000    r12 0xffffffff9a09a520
 r13 0xffffffff9a304000    r14 0xffffffff9a35f938    r15 0xffffffff9a304000
 rip 0xffffffff81bc2b0d    rsp 0xffffffff81bd9be0 rflags 0x10202
 vector: 0xd, error code: 0x0
14 ffffffff81bd9b18 (+ 232) ffffffff81bc2b0d   </boot/system/add-ons/kernel/file_systems/vmwfs> VMWNode::~VMWNode() + 0x57
15 ffffffff81bd9c00 (+  32) ffffffff81bc2b27   </boot/system/add-ons/kernel/file_systems/vmwfs> VMWNode::~VMWNode() + 0x11
16 ffffffff81bd9c20 (+  64) ffffffff81bc2e41   </boot/system/add-ons/kernel/file_systems/vmwfs> VMWNode::DeleteChildIfExists(char const*) + 0x55
17 ffffffff81bd9c60 (+  32) ffffffff81bc3e5d   </boot/system/add-ons/kernel/file_systems/vmwfs> vmwfs_remove_vnode(fs_volume*, fs_vnode*, bool) + 0x38
18 ffffffff81bd9c80 (+  64) ffffffff800fad65   <kernel_x86_64> free_vnode(vnode*, bool) + 0x185
19 ffffffff81bd9cc0 (+  80) ffffffff800fb47c   <kernel_x86_64> dec_vnode_ref_count(vnode*, bool, bool) + 0x2ac
20 ffffffff81bd9d10 (+ 144) ffffffff800fd696   <kernel_x86_64> vnode_path_to_vnode(vnode*, char*, bool, int, io_context*, vnode**, long*) + 0x156
21 ffffffff81bd9da0 (+  80) ffffffff800fdd9c   <kernel_x86_64> () + 0x3c
22 ffffffff81bd9df0 (+  48) ffffffff80104d51   <kernel_x86_64> common_path_read_stat(int, char*, bool, stat*, bool) + 0x21
23 ffffffff81bd9e20 (+ 256) ffffffff801094d5   <kernel_x86_64> _user_read_stat + 0x135
24 ffffffff81bd9f20 (+  16) ffffffff80146c38   <kernel_x86_64> x86_64_syscall_entry + 0xfe

Crash 2:

PANIC: vm_page_fault: unhandled page fault in kernel space at 0xffffffff0072616e, ip 0xffffffff81c40e1a

Welcome to Kernel Debugging Land...
Thread 2577 "python3" running on CPU 0
stack trace for thread 2577 "python3"
    kernel stack: 0xffffffff81c53000 to 0xffffffff81c58000
      user stack: 0x00007f86153f9000 to 0x00007f86163f9000
frame                       caller             <image>:function + offset
…
 4 ffffffff81c575d0 (+ 240) ffffffff800ab0f7   <kernel_x86_64> panic + 0xb7
 5 ffffffff81c576c0 (+ 240) ffffffff80132420   <kernel_x86_64> vm_page_fault + 0x260
 6 ffffffff81c577b0 (+  64) ffffffff8014feb0   <kernel_x86_64> x86_page_fault_exception + 0x160
 7 ffffffff81c577f0 (+ 888) ffffffff8014692c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81c57b68 (end = 0xffffffff81c57c30)
 rax 0xffffffff99f262d0    rbx 0xffffffff99f262d0    rcx 0xe
 rdx 0x4                   rsi 0xffffffff99f262d0    rdi 0xffffffff9a103758
 rbp 0xffffffff81c57c60     r8 0x0                    r9 0x38
 r10 0x0                   r11 0xffffffff8e19d000    r12 0xffffffff99f262d0
 r13 0x0                   r14 0xffffffff9a103758    r15 0xffffffff00726166
 rip 0xffffffff81c40e1a    rsp 0xffffffff81c57c30 rflags 0x10282
 vector: 0xe, error code: 0x0
 8 ffffffff81c57b68 (+ 248) ffffffff81c40e1a   </boot/system/add-ons/kernel/file_systems/vmwfs> VMWNode::DeleteChildIfExists(char const*) + 0x2e
 9 ffffffff81c57c60 (+  32) ffffffff81c41e5d   </boot/system/add-ons/kernel/file_systems/vmwfs> vmwfs_remove_vnode(fs_volume*, fs_vnode*, bool) + 0x38
10 ffffffff81c57c80 (+  64) ffffffff800fad65   <kernel_x86_64> free_vnode(vnode*, bool) + 0x185
11 ffffffff81c57cc0 (+  80) ffffffff800fb47c   <kernel_x86_64> dec_vnode_ref_count(vnode*, bool, bool) + 0x2ac
12 ffffffff81c57d10 (+ 144) ffffffff800fd6f6   <kernel_x86_64> vnode_path_to_vnode(vnode*, char*, bool, int, io_context*, vnode**, long*) + 0x1b6
13 ffffffff81c57da0 (+  80) ffffffff800fdd9c   <kernel_x86_64> () + 0x3c
14 ffffffff81c57df0 (+  48) ffffffff80104d51   <kernel_x86_64> common_path_read_stat(int, char*, bool, stat*, bool) + 0x21
15 ffffffff81c57e20 (+ 256) ffffffff801094d5   <kernel_x86_64> _user_read_stat + 0x135
16 ffffffff81c57f20 (+  16) ffffffff80146c38   <kernel_x86_64> x86_64_syscall_entry + 0xfe

Crash 3:

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x8, ip 0xffffffff81fe1ee3, write 0, user 0, thread 0x936
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x8, ip 0xffffffff81fe1ee3


stack trace for thread 2358 "bash"
    kernel stack: 0xffffffff81fc2000 to 0xffffffff81fc7000
      user stack: 0x00007fa1ca357000 to 0x00007fa1cb357000
frame                       caller             <image>:function + offset
…
11 ffffffff81fc6790 (+ 240) ffffffff800ab0f7   <kernel_x86_64> panic + 0xb7
12 ffffffff81fc6880 (+ 240) ffffffff80132420   <kernel_x86_64> vm_page_fault + 0x260
13 ffffffff81fc6970 (+  64) ffffffff8014feb0   <kernel_x86_64> x86_page_fault_exception + 0x160
14 ffffffff81fc69b0 (+ 904) ffffffff8014692c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81fc6d38 (end = 0xffffffff81fc6e00)
 rax 0xffffffff9a01e770    rbx 0xffffffff9a01e770    rcx 0x73
 rdx 0x0                   rsi 0xffffffff99ff401a    rdi 0xffffffff9a01e5f0
 rbp 0xffffffff81fc6e30     r8 0x3                    r9 0x38
 r10 0x0                   r11 0x1f4                 r12 0xffffffff99ff401a
 r13 0x0                   r14 0xffffffff9a0c5e18    r15 0xffffffff9a0c5e18
 rip 0xffffffff81fe1ee3    rsp 0xffffffff81fc6e00 rflags 0x10282
 vector: 0xe, error code: 0x0
15 ffffffff81fc6d38 (+ 248) ffffffff81fe1ee3   </boot/system/add-ons/kernel/file_systems/vmwfs> VMWNode::GetChild(char const*) + 0x73
16 ffffffff81fc6e30 (+  80) ffffffff81fe113a   </boot/system/add-ons/kernel/file_systems/vmwfs> vmwfs_read_dir(fs_volume*, fs_vnode*, void*, dirent*, unsigned long, unsigned int*) + 0x79
17 ffffffff81fc6e80 (+  64) ffffffff800fdbae   <kernel_x86_64> dir_read(io_context*, file_descriptor*, dirent*, unsigned long, unsigned int*) + 0x4e
18 ffffffff81fc6ec0 (+  96) ffffffff800ec0d6   <kernel_x86_64> _user_read_dir + 0xd6
19 ffffffff81fc6f20 (+  16) ffffffff80146c38   <kernel_x86_64> x86_64_syscall_entry + 0xfe

The problem seems to be in the VMWNode child management logic: it looks like some nodes are used after their memory has been freed.
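
One way to check the use-after-free theory against the dumps above is to read suspicious register values as text. The sketch below uses a hypothetical helper, `RegisterAsAscii` (not part of the driver), which prints the eight bytes of a register in little-endian memory order. Applied to `rdi` from Crash 1 (`0x64616f6c6e776f64`) it yields the string "download", which suggests, without proving it, that a pointer field was overwritten with file name data after the node was freed. That value is also a non-canonical x86_64 address, which would fit the General Protection Exception rather than a page fault.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper for reading panic dumps; not part of vmwfs.
// Interprets a 64-bit register value as the eight bytes it would occupy
// in little-endian memory (x86_64) and returns them as text.
// Non-printable bytes are rendered as '.'.
std::string RegisterAsAscii(uint64_t value)
{
	std::string text;
	for (int i = 0; i < 8; i++) {
		unsigned char byte = (value >> (8 * i)) & 0xff;
		bool printable = byte >= 0x20 && byte < 0x7f;
		text += printable ? static_cast<char>(byte) : '.';
	}
	return text;
}
```

The same helper turns `r15` from Crash 2 (`0xffffffff00726166`) into "far" followed by non-printable bytes, and the faulting address in that crash is `r15 + 8`. In Crash 3 the fault is at `0x8`, which would be the same offset from a null pointer.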

@volo-droid
Contributor Author

Looking at the code, I would say it requires a major refactoring in a few places like VMWSharedFolders and VMWNode. It might be worth trying to port open-vm-tools instead (or at least their hgfs implementation).

@trungnt2910

What kind of refactoring is needed?

> It might be worth trying to port open-vm-tools instead (or at least their hgfs implementation).

It's actually too much work as open-vm-tools has a lot of components to port.

@volo-droid
Contributor Author

volo-droid commented Jan 29, 2023

> What kind of refactoring is needed?

First of all, there's a use-after-free bug somewhere in the code, which I wasn't able to find.
To simplify debugging and fixing issues like that, it makes sense to rewrite the filesystem driver as a UserlandFS/FUSE driver (the FUSE implementation from open-vm-tools might provide some help there). Also, the project currently uses VMware filesystem version 1, which is slow and a bit limited, so that would need to be updated as well (again, the headers from open-vm-tools can speed that up).

@diversys
Copy link
Member

There seems to be a working driver from open-vm-tools ported to Haiku here: https://github.com/trungnt2910/open-vm-tools

@trungnt2910

trungnt2910 commented Oct 23, 2023

The FUSE driver from that port is known to work. That said, there are certain quirks (I can't remember which, since it's been quite a while since I used that driver myself).

The port itself has a few stubs here and there, for example in scripts/haiku/network. There may be some other stubs I have forgotten that might or might not affect the hgfs driver.

@waddlesplash
Member

The filesystem driver doesn't appear to use any locking at all. That's probably the source of the crashes.
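
To illustrate the kind of locking meant here, below is a minimal sketch in standard userland C++. It is not the real `VMWNode`: the `Node` class, its members, and `CountChildren` are hypothetical, and an actual kernel add-on would use the kernel's own locking primitives instead of `std::mutex`. The sketch assumes two things that the stack traces only hint at: that lookups and removals on the child list can run concurrently, and that a caller may still hold a node after it has been removed from its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a directory node; not the real VMWNode.
class Node {
public:
	explicit Node(std::string name) : fName(std::move(name)) {}

	const std::string& Name() const { return fName; }

	// Returns the child with the given name, creating it if needed.
	// The lock keeps a concurrent removal from changing the list mid-scan.
	std::shared_ptr<Node> GetChild(const std::string& name)
	{
		std::lock_guard<std::mutex> lock(fLock);
		for (const auto& child : fChildren) {
			if (child->Name() == name)
				return child;
		}
		fChildren.push_back(std::make_shared<Node>(name));
		return fChildren.back();
	}

	// Unlinks the child from the list; returns true if it was found.
	// The node itself is freed only once the last holder drops it.
	bool DeleteChildIfExists(const std::string& name)
	{
		std::lock_guard<std::mutex> lock(fLock);
		for (auto it = fChildren.begin(); it != fChildren.end(); ++it) {
			if ((*it)->Name() == name) {
				fChildren.erase(it);
				return true;
			}
		}
		return false;
	}

	std::size_t CountChildren()
	{
		std::lock_guard<std::mutex> lock(fLock);
		return fChildren.size();
	}

private:
	std::string fName;
	std::mutex fLock;
	std::vector<std::shared_ptr<Node>> fChildren;
};
```

With this shape, a thread that obtained a child through `GetChild` can keep using it safely even if another thread calls `DeleteChildIfExists` in the meantime, because the shared pointer keeps the node alive. Whether reference counting or a coarser volume-wide lock is the better fit for the real driver is a separate design question.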

@volo-droid
Contributor Author

> The FUSE driver from that port is known to work. That said, there are certain quirks (I can't remember which, since it's been quite a while since I used that driver myself).

Well done! Would you be up for upstreaming it? The open-vm-tools team welcomes ports of the project to new operating systems.

@Coldfirex

Would @trungnt2910's port be recommended over these add-ons in general at this point? Is a recipe available?


5 participants