Bug: Shared memory not working, results in Segfault #611
Comments
Same for me on NixOS:
I have everything llamafile needs to compile and run on AMD set via environment variables. Update: I just checked, and the latest working version for me is llamafile-0.8.13; every other version results in a segfault.
@OEvgeny Do you have a low-VRAM, large-RAM setup? My issue is specifically that I can't offload to system memory when using large models, which Ollama seems to be able to do.
I have 16 GB VRAM / 64 GB RAM. I had a similar issue, but I built from the latest main and everything works great from there. It is pretty easy to do.
Just piping in with some more data: same error on versions >=0.8.14 when using the GPU. 0.8.13 is the last version that works on the GPU. Otherwise, I can run on the CPU just fine (slooooow). This was tested on an Arch machine.
Doesn't 0.8.17 work for you?
Also, if something changed in your system, you should remove
I have not tried to compile the source. I can do that and get back to ya.
@JohnnySn0w I checked briefly when v0.8.17 was released, and I think it fixed the issue for me. Here is the link to the release: https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.17 It seems there's no longer any need to build from source to fix the problem.
I'm now seeing a similar issue with every model on 0.9.0 (after the GPU support was rebuilt from the binary):
What happened?
Thank you, Justine and team, for llamafile.
I have 16GB VRAM and 96GB RAM in my system (Fedora 41).
When I run gemma-2-27b-it.Q6_K.llamafile with -ngl 1, I get a segmentation fault. The model works fine when I don't use GPU offloading. I use the same model in Ollama all the time, where VRAM and RAM are shared, which results in better performance. I'm told llama.cpp falls back to system RAM when we try to run a model that's larger than VRAM. Doesn't llamafile do the same?
Version
llamafile v0.8.15
What operating system are you seeing the problem on?
Linux
Relevant log output