Looking for the right way to run a local CUDA container #472
-
I have a system with the NVIDIA drivers, CUDA, and CDI installed, and a hacked-up way of building a local CUDA container that appears to work. If I run an nvidia-smi test and pass the devices to that local CUDA container, I can see the GPU.
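Roughly, the manual test that works is something like this (the image name is just a placeholder for my local build):

```
# pass the GPU into the container via CDI and check that it is visible
podman run --rm --device nvidia.com/gpu=all localhost/my-cuda-image nvidia-smi
```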
What I can't seem to figure out is how to get the device passed through when using ramalama; it doesn't seem to recognize the GPU or try the CUDA container.
I've tried using the --image flag (if I understood the doc right) but that doesn't see the GPU.
I expect there's a configuration somewhere I missed to connect the CDI to ramalama? How do I get ramalama to use this GPU?
-
@bmahabirbu got this running, dunno if you need the NVIDIA Container Toolkit... Let's keep making NVIDIA boring (I know the lack of publishing sucks though)
-
@nzwulfin Try my fork of ramalama using the nv-simple branch: https://github.com/bmahabirbu/ramalama/tree/nv-simple. Give it a run and see if it works. You'll need the NVIDIA Container Toolkit package for this to work! (Assuming you already have a CUDA driver installed as well.) I pushed a build that works to Docker Hub temporarily for testing purposes; the podman command sketched further down shows the general shape of running it without installing ramalama.
In the meantime, run ramalama with the debug flag, like so:
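(Something like this; the model name is only an example, any model you have pulled works.)

```
ramalama --debug run tinyllama
```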
Look for the exec_cmd and copy everything before the /bin/sh command. You can use that command to run the container with the proper mounts to the models stored in ramalama. You can also change the container name to test different images, which makes debugging easier. This is roughly what mine looks like:
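(Illustrative rather than copied verbatim; the image name, mount path, and store location are placeholders and will differ on your system.)

```
podman run --rm -it \
  --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  -v "$HOME/.local/share/ramalama":"$HOME/.local/share/ramalama" \
  docker.io/<user>/cuda-llama:latest /bin/sh
```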
In general, once inside the container, run the part of the exec_cmd that came after /bin/sh. There is no additional flag on ramalama to get the GPU running, btw!
-
Thanks! Dissecting the podman run and llama chat commands really helped. For some reason, …
-
I also realized from your branch that …
-
I'm making a lot of assumptions because I only have access to one system with an NVIDIA card, and it already has CUDA installed and working, so I don't know how stable or useful this might be for detection. The CUDA driver packages are probably a requirement for any of this to work, and the packages in RPM Fusion include the nvidia-smi binary.
On my system with a 3060, this is what the command output looks like:
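(Reproduced roughly from memory; the point is that the query answers at all, and on an Ampere card like the 3060 it reports 8.6.)

```
$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
8.6
```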
So maybe a test for the binary and then a non-zero response to the specific compute_cap query via an os.cmd? I don't know what that query would give on different cards.
Thoughts?
-
To describe what we are trying to do: auto-select the primary GPU in a system to use (but one may manually pick a GPU also). The simplest metric for this is the GPU with the most VRAM. We would like to do this with as few dependencies as possible, and where we do need dependencies, keep them in the container image if possible (if you look at the AMD implementation, it's all done in Python). Also, if all the GPUs have < 1 GB VRAM, we probably want to just automatically do CPU inferencing.
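A minimal sketch of that rule for the NVIDIA case, assuming nvidia-smi on the host (names are only illustrative, this isn't lifted from ramalama):

```python
import shutil
import subprocess


def select_nvidia_gpu():
    """Return the index of the GPU with the most VRAM, or None to fall back to CPU."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None

    best_index, best_mib = None, 0
    for line in out.strip().splitlines():
        index, mib = (field.strip() for field in line.split(","))
        if int(mib) > best_mib:
            best_index, best_mib = int(index), int(mib)

    # If no GPU has at least 1 GiB of VRAM, just do CPU inferencing.
    return best_index if best_mib >= 1024 else None
```

The chosen index could then be handed to the container via CUDA_VISIBLE_DEVICES, with a None result meaning skip GPU offload entirely.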
-
@bmahabirbu I'm about to head on vacation for a few days, but I ginned up this detection that works locally; wondering if it works for you. Unlike Eric, I won't be able to see questions on my phone :) I'm running this on Fedora 41 with a 3060 12G, nvidia-container-toolkit-1.17.2-1, xorg-x11-drv-nvidia-565.57.01-3.fc41, and xorg-x11-drv-nvidia-cuda-565.57.01-3. It uses nvidia-smi, which may not be an overall good direction; my very elegant test for the failure path is just renaming /usr/bin/nvidia-smi. Mostly just looking for feedback, since I get that the detection work might be headed in different directions. It's been a little while since I've done a lot of Python.
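In rough outline, the check does something like this (names here are illustrative, not straight from the patch):

```python
import shutil
import subprocess


def have_nvidia_gpu() -> bool:
    """True if nvidia-smi is present and answers the compute_cap query."""
    if shutil.which("nvidia-smi") is None:  # renaming the binary makes this fail
        return False
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() != ""
```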