Worse speed and GPU load than pure llama-cpp #1831

Answered by Mushoz
Mushoz asked this question in Q&A
Managed to find the answer myself. For some reason the logits_all parameter defaults to true, and that tanks performance. Setting it to false brings performance back on par with pure llama-cpp, and GPU load is back at 100% as well. Not sure that's a sensible default, but at least the problem is solved.
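For anyone landing here with the same symptom, a minimal sketch of the fix, assuming the model is loaded through the llama-cpp-python `Llama` class (the post doesn't say which entry point was in use, so that part is a guess). The helper names `build_llama_kwargs` and `load_model` and the model path are made up for illustration; `model_path`, `n_gpu_layers` and `logits_all` are real constructor arguments.

```python
from typing import Any, Dict


def build_llama_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Collect constructor arguments with logits_all explicitly pinned to False."""
    return {
        "model_path": model_path,      # path to a local GGUF file (hypothetical here)
        "n_gpu_layers": n_gpu_layers,  # -1 asks llama-cpp-python to offload all layers
        "logits_all": False,           # set explicitly instead of relying on the default
    }


def load_model(model_path: str):
    """Build a Llama instance using the kwargs above."""
    # Imported lazily so the kwargs helper can be inspected without the library installed.
    from llama_cpp import Llama

    return Llama(**build_llama_kwargs(model_path))


# Usage (assumes llama-cpp-python is installed and the file exists):
#   llm = load_model("./models/model.gguf")
#   out = llm("Q: What is 2 + 2? A:", max_tokens=16)
#   print(out["choices"][0]["text"])
```

Passing the flag explicitly means the behaviour no longer depends on whatever default the wrapper or server you are running happens to ship with. If you actually need per-token logits (for example for logprobs over the whole prompt), you would have to turn it back on and accept the slowdown described above.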

Answer selected by Mushoz