Error on GPU driver when setup with container #149

Open
Greyyy-HJC opened this issue Jan 6, 2024 · 0 comments

Comments

@Greyyy-HJC

When I tried to set up with podman, I had to make two small modifications to get it to work:

  1. Specify RHEL_VERSION at the beginning of the Dockerfiles (a sketch of what I mean follows the snippet below);
  2. Add the lines below before groupadd is used in the Dockerfiles.
FROM rockylinux:${RHEL_VERSION}-minimal

RUN microdnf update -y && \
    microdnf install shadow-utils -y && \
    microdnf clean all
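
For point 1, the declaration at the top of the Dockerfile might look something like this (the default version number here is only an example):

# Declare the build argument before the first FROM so the FROM line can use it
ARG RHEL_VERSION=8

FROM rockylinux:${RHEL_VERSION}-minimal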

Though I successfully built the container and got inside, when I ran the application via "./gaugeFixing" I got the error below:

# [2024-01-06 01:18:29] FATAL: [Rank 0] A GPU error occured: communicationBase_mpi.cpp: Failed to count devices (gpuGetDeviceCount): CUDA driver version is insufficient for CUDA runtime version ( cudaErrorInsufficientDriver )
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] A GPU error occured: communicationBase_mpi.cpp: Failed to count devices (gpuGetDeviceCount): CUDA driver version is insufficient for CUDA runtime version ( cudaErrorInsufficientDriver )

and when I tried nvidia-smi, I got:

[simulateqcd@510d453c6799 applications]$ nvidia-smi
bash: nvidia-smi: command not found
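
For context, I suspect the container is being started without GPU passthrough. With the NVIDIA Container Toolkit installed on the host, a CDI-based invocation would look roughly like this (the image name is just a placeholder for whatever the build produced):

# on the host, generate the CDI spec once
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# expose all GPUs to the container and check that nvidia-smi works inside
podman run --rm -it --device nvidia.com/gpu=all simulateqcd:latest nvidia-smi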