This is a wrapper of llama.cpp implemented as per the discussion *Integration of llama.cpp and whisper.cpp*:
- Use the llama.cpp C interface in llama.h
- Reimplement the common library
As mentioned in the discussion, the (maybe distant) future plan is to ditch llama.cpp by reimplementing it entirely with vanilla ggml and a C++ interface.
**Important:** When cloning this repo, don't forget to fetch the submodules.
- Either:
$ git clone https://github.com/alpaca-core/ac-local.git --recurse-submodules
- Or:
$ git clone https://github.com/alpaca-core/ac-local.git
$ cd ac-local
$ git submodule update --init --recursive
- Better error handling, please
- GGUF metadata access (`llama_model_meta_*`) is not great. We should provide a better interface (see the metadata sketch after this list)
- `llama_chat_apply_template` does not handle memory allocation optimally. There's a lot of room for improvement
- `llama_chat_format_single` doing a full chat format for a single message is terrible
- as a whole, chat management is not very efficient (see the chat formatting sketch after this list)
- Chat templates can't be used to escape special tokens. If the user actually enters some, this just messes up the resulting formatted text.
- Give vocab more visibility
    - Token-to-text can be handled much more elegantly by using plain ol' `string_view` instead of copying strings. It's not like tokens are going to be modified once the model is loaded
    - If we don't reimplement, perhaps keeping a parallel array of all tokens to string would be a good idea (see the token table sketch after this list)
- `llama_batch` being used for both input and output makes it hard to propagate the constness of the input buffer. This leads to code having to use non-const buffers, even if we know they're not going to be modified. We should bind the buffer constness to the batch struct itself (see the batch sketch after this list).
- The low-level llama context currently takes an rng seed (which is only used for mirostat sampling). A reimplemented context should be deterministic. If an operation requires random numbers, a generator should be provided from the outside (see the sampling sketch after this list).
    - For now we will hide the mirostat sampling altogether and ditch the seed
- As per this discussion, we should take into account how we want to deal with asset storage and whether we want to abstract the i/o away.
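
To make some of these notes more concrete, here are a few sketches. First, the GGUF metadata point: the kind of friendlier interface we have in mind would read all key/value pairs once into a map instead of juggling raw C buffers. The `llama_model_meta_*` signatures below are recalled from llama.h and may not match the exact version vendored here, so treat this as an illustration rather than drop-in code.

```cpp
#include <llama.h>

#include <cstdint>
#include <map>
#include <string>

// Read every GGUF key/value pair of a loaded model into a std::map.
// Assumes the llama_model_meta_* getters return the string length, or -1 on error.
std::map<std::string, std::string> readModelMeta(const llama_model* model) {
    auto readStr = [&](auto getter, int32_t i) -> std::string {
        std::string buf(128, '\0');
        int32_t n = getter(model, i, buf.data(), buf.size());
        if (n < 0) return {};
        if (size_t(n) >= buf.size()) {
            // the string didn't fit: grow the buffer and fetch it again
            buf.resize(n + 1);
            n = getter(model, i, buf.data(), buf.size());
            if (n < 0) return {};
        }
        buf.resize(n);
        return buf;
    };

    std::map<std::string, std::string> meta;
    const int32_t count = llama_model_meta_count(model);
    for (int32_t i = 0; i < count; ++i) {
        meta[readStr(llama_model_meta_key_by_index, i)] =
            readStr(llama_model_meta_val_str_by_index, i);
    }
    return meta;
}
```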
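
Next, the chat-management direction (not code against llama.h, just the shape of the idea): keep the chat and the length of what has already been formatted, so adding a message yields only the newly formatted delta instead of re-building and copying the whole transcript the way `llama_chat_format_single` effectively does. The `applyTemplate` callback stands in for whatever template engine ends up being used, and the assumption that chat templates are append-only is exactly that: an assumption.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct ChatMsg {
    std::string role;
    std::string text;
};

// Hypothetical incremental chat formatter.
// Assumes templates are append-only: formatting N+1 messages yields the
// formatted N-message transcript as a prefix.
class ChatFormatter {
public:
    using ApplyFn = std::function<std::string(const std::vector<ChatMsg>&)>;

    explicit ChatFormatter(ApplyFn applyTemplate) : m_apply(std::move(applyTemplate)) {}

    // Add a message and return only the text it contributes to the transcript.
    std::string addMessage(ChatMsg msg) {
        m_chat.push_back(std::move(msg));
        const std::string full = m_apply(m_chat); // still a full format...
        std::string delta = full.substr(m_formattedLength); // ...but only the delta is returned
        m_formattedLength = full.size();
        return delta;
    }

private:
    ApplyFn m_apply;
    std::vector<ChatMsg> m_chat;
    size_t m_formattedLength = 0; // how much of the transcript has been handed out already
};
```

A real reimplementation would ideally avoid even the full re-format, but that requires per-message knowledge of the template.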
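
For the vocab notes, the parallel-array idea could look roughly like this: copy each token's text exactly once when the model is loaded, then hand out non-owning `string_view`s. The `pieceOf` callback is a placeholder for the actual token-to-text conversion (e.g. a wrapper over llama.cpp's token-to-piece call); its signature is an assumption made for the example.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token-to-text table, built once after the model is loaded.
// Lookups return views into stable storage: no allocation, no copying.
class TokenTextTable {
public:
    using PieceFn = std::function<std::string(int32_t token)>;

    TokenTextTable(int32_t vocabSize, const PieceFn& pieceOf) {
        m_storage.reserve(size_t(vocabSize));
        for (int32_t t = 0; t < vocabSize; ++t) {
            m_storage.push_back(pieceOf(t)); // the only copy, made at load time
        }
    }

    // Token-to-text lookup; the view is valid for as long as the table is alive.
    std::string_view operator[](int32_t token) const {
        return m_storage[size_t(token)];
    }

private:
    std::vector<std::string> m_storage; // owns the bytes; never modified after construction
};
```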
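
The `llama_batch` constness note, expressed as types: if we reimplement, the input and output sides could be separate structs, so const prompt data can be passed without casts. All names here are placeholders, not a proposed API.

```cpp
#include <cstdint>
#include <span>
#include <vector>

using Token = std::int32_t;

// Read-only view over the tokens fed to the model.
// Callers can pass const data directly; nothing forces a non-const buffer.
struct InputBatch {
    std::span<const Token> tokens;
};

// Mutable buffer that the decode step writes into.
struct OutputBatch {
    std::span<float> logits;
};

// The constness of the input is now visible (and enforced) in the signature.
// This stub just zeroes the logits; a real decode would run the model.
void decode(const InputBatch& in, OutputBatch& out) {
    (void)in;
    for (float& l : out.logits) l = 0.f;
}

int main() {
    const std::vector<Token> prompt = {1, 2, 3}; // const data stays const
    std::vector<float> logits(32000);
    OutputBatch out{logits};
    decode(InputBatch{prompt}, out);
}
```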
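
Finally, the seed/determinism note: whatever randomness is needed would be supplied by the caller, keeping the context itself deterministic. The sampler below only illustrates the shape of such an interface; it is not mirostat or any particular sampler from llama.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <span>
#include <vector>

using Token = std::int32_t;
using RngFn = std::function<float()>; // uniform float in [0, 1), supplied by the caller

// Pick a token from a normalized probability distribution.
// No rng state lives here; determinism is entirely in the caller's hands.
Token sampleToken(std::span<const float> probs, const RngFn& rng) {
    const float r = rng();
    float acc = 0.f;
    for (size_t i = 0; i < probs.size(); ++i) {
        acc += probs[i];
        if (r < acc) return Token(i);
    }
    return Token(probs.size() - 1); // guard against rounding
}

int main() {
    const std::vector<float> probs = {0.1f, 0.7f, 0.2f};

    std::mt19937 gen(42); // the seed lives with the caller, not with the context
    std::uniform_real_distribution<float> dist(0.f, 1.f);

    Token t = sampleToken(probs, [&] { return dist(gen); });
    (void)t;
}
```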