Error on GPU driver when setup with container #149

Open
Greyyy-HJC opened this issue Jan 6, 2024 · 0 comments

Comments

@Greyyy-HJC

When I tried to set up with podman, I had to make two small modifications to get it to work:

  1. Specify RHEL_VERSION at the beginning of the Dockerfiles (a sketch of what I mean follows the snippet below);
  2. Add the lines below before groupadd is used in the Dockerfiles.
FROM rockylinux:${RHEL_VERSION}-minimal

RUN microdnf update -y && \
    microdnf install shadow-utils -y && \
    microdnf clean all
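
For point 1, the declaration at the top of the Dockerfile might look something like this (the default version number here is only an example):

# Declare the build argument before the first FROM so the FROM line can use it
ARG RHEL_VERSION=8

FROM rockylinux:${RHEL_VERSION}-minimal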

Though I successfully built the container and got inside, when I ran the application via "./gaugeFixing" I got the error below:

# [2024-01-06 01:18:29] FATAL: [Rank 0] A GPU error occured: communicationBase_mpi.cpp: Failed to count devices (gpuGetDeviceCount): CUDA driver version is insufficient for CUDA runtime version ( cudaErrorInsufficientDriver )
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] A GPU error occured: communicationBase_mpi.cpp: Failed to count devices (gpuGetDeviceCount): CUDA driver version is insufficient for CUDA runtime version ( cudaErrorInsufficientDriver )

and when I tried nvidia-smi, I got:

[simulateqcd@510d453c6799 applications]$ nvidia-smi
bash: nvidia-smi: command not found
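
For context, I suspect the container is being started without GPU passthrough. With the NVIDIA Container Toolkit installed on the host, a CDI-based invocation would look roughly like this (the image name is just a placeholder for whatever the build produced):

# on the host, generate the CDI spec once
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# expose all GPUs to the container and check that nvidia-smi works inside
podman run --rm -it --device nvidia.com/gpu=all simulateqcd:latest nvidia-smi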