
llama.cpp

This is the Alpaca Core wrapper and plugin for llama.cpp, implemented as per the discussion Integration of llama.cpp and whisper.cpp:

  • Use the llama.cpp C interface in llama.h
  • Reimplement llama.cpp's common library

As mentioned in the discussion, the (maybe distant) future plan is to ditch llama.cpp by reimplementing it entirely with vanilla ggml and a C++ interface.

Build

Important

When cloning this repo, don't forget to fetch the submodules.

  • Either: $ git clone https://github.com/alpaca-core/ac-local.git --recurse-submodules
  • Or:
    • $ git clone https://github.com/alpaca-core/ac-local.git
    • $ cd ac-local
    • $ git submodule update --init --recursive

Reimplementation Notes

  • Better error handling, please
  • GGUF metadata access (llama_model_meta_*) is not great. We should provide a better interface (a sketch follows after this list)
  • llama_chat_apply_template does not handle memory allocation optimally; there's a lot of room for improvement (see the allocation sketch after this list)
    • As a whole, chat management is not very efficient: llama_chat_format_single doing a full chat format for a single message is terrible
  • Chat templates can't be used to escape special tokens. If the user actually enters some, this just messes up the resulting formatted text.
  • Give vocab more visibility
  • Token-to-text can be handled much more elegantly with plain ol' string_view instead of copying strings; tokens aren't going to be modified once the model is loaded (sketched after this list)
    • If we don't reimplement, keeping a parallel array mapping every token to its string would be a good idea
  • llama_batch being used for both input and output makes it hard to propagate the constness of the input buffer. This forces code to use non-const buffers even when we know they won't be modified. We should bind the buffer constness to the batch struct itself (see the sketch after this list).
  • The low-level llama context currently takes an rng seed (which is only used for mirostat sampling). A reimplemented context should be deterministic; if an operation requires random numbers, a generator should be provided from the outside (see the sketch after this list).
    • For now we will hide mirostat sampling altogether and ditch the seed
  • As per this discussion, we should decide how we want to deal with asset storage and whether we want to abstract the I/O away.
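
To make the metadata point concrete, here is a minimal sketch of a friendlier interface layered over the existing llama_model_meta_* functions. The readModelMeta name and the map-based design are our illustration, not an existing API:

    #include <llama.h>
    #include <map>
    #include <string>

    // Sketch: read all GGUF metadata into a map once, instead of juggling
    // fixed-size char buffers at every call site. Buffer sizes are kept
    // simple here; a real implementation would retry on truncation.
    std::map<std::string, std::string> readModelMeta(const llama_model* model) {
        std::map<std::string, std::string> meta;
        const int32_t n = llama_model_meta_count(model);
        for (int32_t i = 0; i < n; ++i) {
            char key[256];
            char val[4096];
            // the *_by_index functions return the string length, or -1 on failure
            if (llama_model_meta_key_by_index(model, i, key, sizeof(key)) < 0) continue;
            if (llama_model_meta_val_str_by_index(model, i, val, sizeof(val)) < 0) continue;
            meta[key] = val;
        }
        return meta;
    }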
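
The allocation complaint above stems from the C API reporting the required size through its return value, which forces a guess-and-retry dance on every caller. Here is a sketch of that dance, assuming the llama_chat_apply_template signature that takes the model as its first argument (the signature has changed across llama.cpp revisions):

    #include <llama.h>
    #include <string>
    #include <vector>

    // Sketch: apply the model's built-in chat template (tmpl == nullptr selects it).
    // The return value is the number of bytes required, which may exceed the
    // buffer, so the entire chat has to be formatted a second time.
    std::string applyTemplate(const llama_model* model,
                              const std::vector<llama_chat_message>& chat,
                              bool addAssistantPrompt) {
        std::string out(1024, '\0'); // initial guess
        int32_t n = llama_chat_apply_template(model, nullptr,
                                              chat.data(), chat.size(), addAssistantPrompt,
                                              out.data(), int32_t(out.size()));
        if (n > int32_t(out.size())) {
            out.resize(size_t(n)); // guessed too small: grow and format again
            n = llama_chat_apply_template(model, nullptr,
                                          chat.data(), chat.size(), addAssistantPrompt,
                                          out.data(), int32_t(out.size()));
        }
        out.resize(n > 0 ? size_t(n) : 0);
        return out;
    }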
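
A sketch of the string_view idea: detokenize the whole vocab once at load time into an owning array, then hand out views. TokenTable is our illustrative name, and the trailing arguments of llama_token_to_piece differ between llama.cpp revisions, so treat the call below as approximate:

    #include <llama.h>
    #include <string>
    #include <string_view>
    #include <vector>

    // Sketch: a parallel token-to-string array, built once. This is safe
    // because the vocab is immutable after the model is loaded.
    class TokenTable {
    public:
        explicit TokenTable(const llama_model* model) {
            const int32_t n = llama_n_vocab(model);
            m_pieces.reserve(size_t(n));
            for (llama_token t = 0; t < n; ++t) {
                char buf[128];
                // the trailing (lstrip, special) args vary by llama.cpp revision
                const int32_t len = llama_token_to_piece(model, t, buf, sizeof(buf), 0, true);
                m_pieces.emplace_back(buf, len > 0 ? size_t(len) : 0);
            }
        }
        // no copy on lookup: just a view into the table
        std::string_view operator[](llama_token t) const { return m_pieces[size_t(t)]; }
    private:
        std::vector<std::string> m_pieces; // owns the bytes the views point into
    };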
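
For the batch constness point, here is a minimal sketch of what binding constness to the type could look like: a read-only input view and a separate writable output view, so const prompt buffers never need a cast. The names and the std::span (C++20) usage are our choice for illustration:

    #include <cstdint>
    #include <span>

    using Token = int32_t;

    // Read-only input: constness is part of the type, so a const prompt
    // buffer can be wrapped without const_cast.
    struct InputBatch {
        std::span<const Token> tokens;
        std::span<const int32_t> positions;
    };

    // Writable output lives in a separate struct instead of being mixed
    // into the same llama_batch-style type as the input.
    struct OutputView {
        std::span<float> logits;
    };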
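
Finally, a sketch of the injected-generator idea: the context owns no seed, identical inputs always produce identical outputs, and randomness enters only through an argument the caller controls. All names here are hypothetical:

    #include <cstdint>
    #include <random>

    // Hypothetical sampling entry point: the context side stays fully
    // deterministic; the caller decides where randomness comes from.
    template <typename Rng> // e.g. std::mt19937, or a stub for reproducible tests
    int32_t sampleToken(const float* probs, int32_t nVocab, Rng& rng) {
        std::discrete_distribution<int32_t> dist(probs, probs + nVocab);
        return dist(rng);
    }

    // Usage: the caller owns the generator and its seed.
    //   std::mt19937 rng(42);
    //   int32_t tok = sampleToken(probs.data(), nVocab, rng);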
