
Error "libcudart.so.11.0: cannot open shared object file" when using Docker image #20

Open
tsh11na opened this issue Oct 13, 2022 · 2 comments

Comments

@tsh11na

tsh11na commented Oct 13, 2022

I've been trying to train the LIVECell anchor-based model on my own dataset, but training fails before it starts.

I used the Docker image pytorch/pytorch:1.5-cuda10.1-cudnn7-devel to match the versions you mentioned in the paper.
Then I got an error saying "libcudart.so.11.0: cannot open shared object file: No such file or directory".

The error traceback is as follows:

Traceback (most recent call last):
  File "train_net.py", line 27, in <module>
    from detectron2.data import MetadataCatalog
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/data/__init__.py", line 4, in <module>
    from .build import (
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/data/build.py", line 14, in <module>
    from detectron2.structures import BoxMode
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/structures/__init__.py", line 6, in <module>
    from .keypoints import Keypoints, heatmaps_to_keypoints
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/structures/keypoints.py", line 6, in <module>
    from detectron2.layers import interpolate
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/layers/__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/layers/deform_conv.py", line 10, in <module>
    from detectron2 import _C
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

This is probably because the CUDA toolkit version inside the Docker image (10.1) does not match the one Detectron2-ResNeSt was compiled against (11.x?).
Should I use a specific version of Detectron2-ResNeSt?
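One way to confirm this kind of mismatch is to compare the CUDA version PyTorch reports (`torch.version.cuda`) with the `libcudart` soname that the compiled `detectron2/_C*.so` extension asks for, as printed by `ldd`. The sketch below only parses text you pass in, so it does not depend on torch being importable. The helper names are hypothetical, and the soname mapping in `expected_cudart_soname` is an assumption stated in its docstring (10.x keeps major.minor, all 11.x share `libcudart.so.11.0`).

```python
import re
from typing import Set

# Matches the left-hand side of an `ldd` line such as
#   "\tlibcudart.so.11.0 => not found"
_CUDART_RE = re.compile(r"^\s*libcudart\.so\.([\d.]+)\s+=>", re.MULTILINE)


def cudart_sonames(ldd_output: str) -> Set[str]:
    """Return the libcudart soname versions an extension is linked against."""
    return set(_CUDART_RE.findall(ldd_output))


def expected_cudart_soname(torch_cuda: str) -> str:
    """Map a CUDA version string (e.g. torch.version.cuda) to a soname suffix.

    Assumption: CUDA 10.x runtimes carry major.minor in the soname
    (libcudart.so.10.1), all CUDA 11.x runtimes share libcudart.so.11.0,
    and CUDA 12+ use only the major version (libcudart.so.12).
    """
    major, minor = (torch_cuda.split(".") + ["0"])[:2]
    if int(major) <= 10:
        return f"{major}.{minor}"
    if int(major) == 11:
        return "11.0"
    return major


def cuda_mismatch(torch_cuda: str, ldd_output: str) -> bool:
    """True if the extension needs a libcudart other than the one torch uses."""
    needed = cudart_sonames(ldd_output)
    return bool(needed) and needed != {expected_cudart_soname(torch_cuda)}
```

Inside the container you would run `ldd` on the `_C*.so` file under `detectron2-ResNeSt/detectron2/` and pass its output along with `torch.version.cuda`. With the traceback above, the extension asks for `libcudart.so.11.0` while the image ships CUDA 10.1, so `cuda_mismatch("10.1", ...)` would return `True`. That points to an extension built elsewhere (for example on a CUDA 11 host) and reused inside the image.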

Environment

Hardware

OS: Ubuntu 20.04.5 LTS on WSL 2
CPU: Intel Core i9-10940X
GPU: NVIDIA TITAN RTX (Turing architecture)
DRAM: 100 GB

nvidia-smi


@RickardSjogren
Contributor

Hi @tsh11na,
It might be that the version of Detectron2-ResNeSt is causing the problem; I see that its version is not specified in the repo.

@nabeelkhalid92, can you help out with which version you used?

@nabeelkhalid92
Collaborator

Hi @tsh11na,
You have to install Detectron2 built against the same CUDA version, i.e., 10.1.
You can find the matching Detectron2 versions here: detectron2 installations
Also, the anchor-based model was implemented using the Python programming language v.3.6.10, the deep learning framework PyTorch v.1.5.0, and the object detection library Detectron2 v.2.1.
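To check an environment against the versions quoted above (Python 3.6.10, PyTorch 1.5.0, CUDA 10.1), a small comparison like the following may help. It is a sketch: the function names are hypothetical, and Detectron2 is left out because its version is not checked here. It uses `typing` annotations rather than built-in generics so it also runs on Python 3.6.

```python
from typing import Dict, List

# Versions quoted in the maintainer's reply.
EXPECTED = {"python": "3.6.10", "torch": "1.5.0", "cuda": "10.1"}


def strip_local(version: str) -> str:
    """Drop a PEP 440 local suffix, e.g. '1.5.0+cu101' -> '1.5.0'."""
    return version.split("+", 1)[0]


def version_mismatches(found: Dict[str, str],
                       expected: Dict[str, str] = EXPECTED) -> List[str]:
    """List human-readable mismatches between found and expected versions."""
    problems = []
    for name, want in expected.items():
        got = found.get(name)
        if got is None:
            # e.g. torch.version.cuda is None on a CPU-only PyTorch build
            problems.append(f"{name}: not reported (expected {want})")
        elif strip_local(got) != want:
            problems.append(f"{name}: found {got}, expected {want}")
    return problems
```

In the container, `found` would be built from `platform.python_version()`, `torch.__version__` and `torch.version.cuda`. An empty list means those three match. It says nothing about which CUDA version the Detectron2-ResNeSt extension itself was compiled against.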
