
Why does IOMMU need to be OFF? #18

Open
ArtificialEU opened this issue Sep 26, 2024 · 1 comment
Labels
bug Something isn't working


@ArtificialEU

NVIDIA Open GPU Kernel Modules Version

X

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

X

Kernel Release

X

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA RTX 4090

Describe the bug

Hi,

I'm running my GPU cluster inside a Proxmox VM and passing my devices through to it, so turning the IOMMU off is not really possible. What is the reason it has to be off, and what are the consequences of leaving it on?

Thanks

To Reproduce

X

Bug Incidence

Once

nvidia-bug-report.log.gz

X

More Info

No response

ArtificialEU added the bug (Something isn't working) label Sep 26, 2024
@legraphista

legraphista commented Oct 1, 2024

While troubleshooting, I found that P2P can be enabled with the IOMMU on, but it may not work as intended.
When moving data between GPUs via P2P, I got IO_PAGE_FAULT errors, and the destination tensor was allocated but contained only zeros.

Disabling IOMMU fixed this behavior in my case.

Note: SR-IOV was disabled on my system, so your mileage may vary.
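Before and after changing kernel parameters, it can help to confirm what the kernel actually registered for the IOMMU on the machine where the driver is loaded. The following is a minimal Python sketch assuming the standard Linux sysfs layout (IOMMU units under /sys/class/iommu, groups under /sys/kernel/iommu_groups). The helper names `iommu_status` and `iommu_cmdline_args` are hypothetical and not part of the driver or any existing tool. The sketch only reports IOMMU state; it does not check whether P2P transfers produce correct data.

```python
from pathlib import Path


def iommu_status(sysfs_root: str = "/sys") -> dict:
    """Summarize what the kernel exposes about the IOMMU via sysfs.

    Assumption (hypothetical helper): the standard Linux sysfs layout, where
    each registered IOMMU unit (e.g. dmar0 on Intel, ivhd0 on AMD) appears
    under <sysfs_root>/class/iommu and each IOMMU group appears as a
    numbered directory under <sysfs_root>/kernel/iommu_groups.
    """
    root = Path(sysfs_root)
    units_dir = root / "class" / "iommu"
    groups_dir = root / "kernel" / "iommu_groups"

    # A missing directory is treated as "nothing registered".
    units = sorted(p.name for p in units_dir.iterdir()) if units_dir.is_dir() else []
    groups = list(groups_dir.iterdir()) if groups_dir.is_dir() else []

    return {
        "units": units,
        "group_count": len(groups),
        # Heuristic: any registered unit or group counts as "IOMMU active".
        "active": bool(units or groups),
    }


def iommu_cmdline_args(cmdline: str) -> list:
    """Return kernel command-line tokens that mention the IOMMU,
    such as intel_iommu=on, amd_iommu=off or iommu=pt."""
    return [tok for tok in cmdline.split() if "iommu" in tok]
```

On a live system, call `iommu_status()` and `iommu_cmdline_args(Path("/proc/cmdline").read_text())`. Taking `sysfs_root` as a parameter keeps the check testable against a fake directory tree.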
