How to run on A100? #31
I have tried that Dockerfile, but the torch patch does not seem to be compatible with PyTorch version 7bcf7da3a268b435777fe87c7794c382f444e86d.
Can you provide the patch for a newer PyTorch version? That would be helpful. Thanks!
Hello, thank you for your comment! No, the Dockerfile is not ready yet. We are working on open-sourcing a version of Orion compatible with A100 GPUs. The AE fig7 was run on a V100 GPU. I expect the version for A100 GPUs (supporting CUDA versions >10.2) will be out in the next few weeks.
I also got the same error on GPUs other than the V100. I tried it on a 3070 and an A100, and both gave the same error: (no kernel image available) CUDA Runtime Error at: intercept_temp.cpp:453. @fotstrt, please advise whether the non-Docker path works for the 3070 and A100. TIA.
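For anyone debugging this, a "no kernel image available" error usually means the binary contains no code for the GPU's compute capability (V100 is 7.0, A100 is 8.0, RTX 3070 is 8.6). The sketch below is only an illustration of that compatibility rule, not part of Orion: the function name `kernel_image_available` is made up here, and the check at the bottom inspects the installed PyTorch build only, so it says nothing about how Orion's own interception library was compiled.

```python
import re

def kernel_image_available(device_cc, arch_list):
    """Return True if any entry in arch_list can run on a GPU with
    compute capability device_cc, given as (major, minor).

    Hypothetical helper for illustration. Entries use the names that
    torch.cuda.get_arch_list() reports, e.g. "sm_70" or "compute_70".
    """
    major, minor = device_cc
    for arch in arch_list:
        m = re.fullmatch(r"(sm|compute)_(\d+)(\d)", arch)
        if m is None:
            continue  # ignore entries this sketch does not understand
        kind, arch_major, arch_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm":
            # Compiled binary code: runs only on the same major version,
            # with a device minor version at least as high.
            if arch_major == major and arch_minor <= minor:
                return True
        else:
            # PTX: can be JIT-compiled for any equal or newer capability.
            if (arch_major, arch_minor) <= (major, minor):
                return True
    return False

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cc = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cc)
        print("PyTorch build targets:", archs)
        print("compatible:", kernel_image_available(cc, archs))
```

Under this rule, a build that targets only `sm_70` has nothing to run on an A100 or a 3070, which would match the reports above, though the thread does not confirm that this is the cause.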
This will be addressed in the following 2 weeks. Thank you!
Hi, I wonder whether this has been addressed yet. I would like to try Orion on CUDA 12.1. Would you please point me to the correct branch? Is it fot/latest_cuda_changes? Thanks!
Hello, there has been a delay, sorry about that. The branch fot/latest_cuda_changes contains a Dockerfile: https://github.com/eth-easl/orion/blob/fot/latest_cuda_changes/setup/Dockerfile_Cuda12 where I have tested some basic Orion functionality, but I have not fully tested the system yet (that's why it is not merged). I plan to do more tests and merge soon.
Thanks for your reply. I tried this Dockerfile to build a CUDA 12.1 image, but it reports many errors. I created the container following the guidance in INSTALL.md, but it does not work.
Moreover, the reason for these errors is that the cuDNN library is not linked or installed correctly.
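A quick way to check whether cuDNN is visible inside the container is to ask the dynamic loader for it. This is a minimal sketch using only the Python standard library; the helper names `locate_cudnn` and `can_load` are made up for illustration, and a library that loads here can still fail at link time if the build looks in a different directory.

```python
import ctypes
import ctypes.util

def locate_cudnn():
    """Return the library name or path the system resolves for cuDNN, or None."""
    return ctypes.util.find_library("cudnn")

def can_load(name):
    """Return True if the shared library can actually be loaded."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    found = locate_cudnn()
    if found is None:
        print("cuDNN not found on the library search path")
    else:
        print("cuDNN resolved to:", found, "| loadable:", can_load(found))
```

If nothing is found, the search path (for example `LD_LIBRARY_PATH` on Linux) or the cuDNN installation in the image is the first thing to look at.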
Is the Dockerfile in the latest_cuda_changes branch runnable on an A100? The container built with the Dockerfile in the main branch has problems running your AE fig7 script 'python run_orion.py': it reports an error when I run it on my A100 machine.