Plugins resolve the third-party lib and dependency issues. The others remain, but we simply don't envision having the resources to do anything but "use as-is" in the foreseeable future. We'll go on like this. Closing the discussion.
---
I'm going to be talking about llama.cpp here, but all of this is just as applicable to whisper.cpp (including libcommon).
Problems and drama
C interface
The library provides a C interface, whereas the C++ code is entirely in the .cpp file and not accessible in a conventional manner. This imposes many problems:

- We can't simply use smart pointers like unique_ptr<T>, but have to add custom deleters for library types, which invoke the C functions (which internally call delete). See the sketch after this list.
- Functionality which only lives in the .cpp file is not reachable through the C interface at all.
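To illustrate the first point, here is a minimal sketch of the custom-deleter workaround. llama_free and llama_free_model are real entry points in llama.cpp's C API (the latter renamed llama_model_free in newer versions); the deleter structs and aliases are hypothetical names of our own:

```cpp
// Minimal sketch: RAII wrappers over llama.cpp's opaque C handles.
// We cannot `delete` them ourselves; only the C functions can, since
// the actual C++ types are hidden inside the .cpp file.
#include <memory>
#include "llama.h"

struct model_deleter {
    void operator()(llama_model* m) const noexcept { llama_free_model(m); }
};
struct context_deleter {
    void operator()(llama_context* c) const noexcept { llama_free(c); }
};

// hypothetical aliases for use in our own code
using model_ptr   = std::unique_ptr<llama_model, model_deleter>;
using context_ptr = std::unique_ptr<llama_context, context_deleter>;
```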
The last point has a somewhat complicated example: backends. Ggml itself supports multiple backends, but llama.cpp supports two: CPU and non-CPU, the latter being whichever of the available backends happens to come first. This is not a huge issue for mobile platforms, as there are none (to my knowledge) which support more than one non-CPU backend, but it is on desktop. Say I build ggml and llama.cpp with both Vulkan and CUDA support. Running the code would only allow me access to CUDA (and CPU), because it happens to be the first of these backends checked by the API. To access Vulkan, I would have to build a binary without CUDA support. There are other problems like this: mmap control, file i/o control, global state, vocabulary access. All in the .cpp.
To actually support multiple backends, one would need to build llama.cpp multiple times and load the required instance of the library as a plugin (dlopen/dlsym). That's what ollama does.
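A minimal sketch of what that plugin loading could look like, assuming one llama.cpp build per backend. The shared-library names here are hypothetical; dlopen/dlsym are POSIX, and llama_backend_init is a real symbol in llama.cpp's C API:

```cpp
// Sketch of the ollama-style approach: one llama.cpp build per
// backend, selected and loaded at runtime. The .so names are made up.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // pick the build that matches the backend the user asked for
    void* lib = dlopen("./libllama-vulkan.so", RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // resolve the C entry points by name instead of linking directly
    auto init = reinterpret_cast<void (*)()>(dlsym(lib, "llama_backend_init"));
    if (!init) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }
    init();

    // ... resolve and call the rest of the C API the same way ...

    dlclose(lib);
    return 0;
}
```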
Third party libraries

llama.cpp uses third-party libraries (like stb_image and nlohmann::json). It does so by having copies of them in its codebase. If we statically link with llama.cpp (or its helper library common, which is always static) and happen to use different versions of any of these libraries as well, this will lead to linker errors or, more likely and terrifyingly, to ODR violations manifesting as crashes. And we would ideally link statically on mobile platforms.

Moreover, it does so without relying on CMake or a build system, but by directly adding the sources to its targets. An if(TARGET ...) approach like the one taken with ggml itself is not applicable here.

Possible solutions
- Ditch the llama.cpp repo and instead selectively pull code for our needs, relying only on ggml.
- Include the source files directly (#include "llama.cpp"). This means dealing with static funcs with the same name, guarding includes... stuff like that. Definitely won't be as big of a merge blocker as reimplementing.
- Only reimplement libcommon and use llama.cpp as a shared library.

I am leaning towards "Only reimplement libcommon and use llama.cpp as a shared library", again through creative copy-pasting where it makes sense (for now). In the future, once we're past the MVP, we should go and reimplement llama.cpp as well, and ditch the llama.cpp repo altogether.
This means that changes to libcommon would have to be manually tracked. This also goes for the main inference code, so it doesn't sound like a huge deal.
Including source files is the worst, and I'm against it.
Comments, questions, concerns, suggestions?