This PR contains:
What is the current behavior? (You can also link to an open issue here)
I wanted to test out scoring using logprob outputs while implementing MMLU. Unfortunately I couldn't do this with any of the local model providers I was testing, so I ended up just using the top token rather than the logprobs (see https://github.com/UKGovernmentBEIS/inspect_evals/pull/21/files#diff-5b259e4149ade4c897cf11f70e74fd670ea0e71b6c62449c52808ef440c424f4R14-R18).
I realised there's no support for logprobs from any locally run LLM, because the existing local providers (Ollama, for example) don't return logprobs in their responses.
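For context, here's a minimal sketch (not the actual `inspect_evals` code; the function name and logprob shape are just illustrative) of what I mean by scoring on logprobs with a top-token fallback:

```python
def score_choice(
    top_logprobs: dict[str, float] | None,
    top_token: str,
    choices: tuple[str, ...] = ("A", "B", "C", "D"),
) -> str:
    """Pick the answer letter with the highest logprob, falling back to the top token."""
    if top_logprobs:
        # Keep only tokens that correspond to one of the answer letters.
        candidates = {
            tok.strip(): lp for tok, lp in top_logprobs.items() if tok.strip() in choices
        }
        if candidates:
            return max(candidates, key=candidates.get)
    # Fallback used in the linked MMLU PR: trust the token the model actually generated.
    return top_token.strip()


# Made-up logprob values for illustration:
print(score_choice({"A": -2.3, " B": -0.1, "C": -4.0}, top_token="B"))  # -> "B"
```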
What is the new behavior?
I added support for llama-cpp-python's OpenAI-compatible server. This is a popular library (7.9k stars) that can run models locally and, unlike Ollama, can provide logprobs in its responses.*
This was actually fairly straightforward: because llama-cpp-python follows the OpenAI spec, the provider is basically just another wrapper around the OpenAI one. I added tests and was able to get them running happily locally.*
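For anyone who wants to try it, this is roughly how the server is run and queried (the model path is a placeholder; the "model" name sent by the client is largely ignored since the server serves whatever model it was started with):

```python
# In a shell, start the llama-cpp-python OpenAI-compatible server first:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./models/your-model.Q4_K_M.gguf
#
# Then any OpenAI-compatible client can talk to it:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # llama-cpp-python server default
    api_key="llama-cpp-python",           # the local server doesn't check the key
)

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Answer with a single letter: ..."}],
)
print(response.choices[0].message.content)
```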
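A hedged usage sketch with Inspect (the provider prefix, model name, and base-URL handling below are my assumptions about how this PR wires things up; the exact names are in the provider code):

```python
# Evaluate against the local llama-cpp-python server via the new provider.
from inspect_ai import eval

eval(
    "inspect_evals/mmlu",                       # example task name (placeholder)
    model="llama-cpp-python/default",           # assumed provider prefix from this PR
    model_base_url="http://localhost:8000/v1",  # point Inspect at the local server
)
```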
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
I don't think this introduces a breaking change.
Other information:
*Unfortunately, while llama-cpp-python supports logprobs, it only does so for the completions API, not the chat completions API that Inspect uses. This means logprob support isn't fully working yet.
I have raised a pull request to fix this upstream: abetlen/llama-cpp-python#1788
In the meantime, this PR works fine for using llama-cpp-python, except that the logprobs functionality (the reason I created it 😅) doesn't work. If you check out and run the version of llama-cpp-python from my branch, everything works fully. For the sake of not confusing people, though, it might be worth keeping this in draft until the upstream PR gets merged and released.
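For reference, this is the request shape that should work end-to-end once the upstream change is released (the `logprobs`/`top_logprobs` parameters follow the OpenAI chat completions spec; the model name is a placeholder):

```python
# Request top logprobs via the *chat* completions API, i.e. the shape Inspect
# already uses against OpenAI.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="llama-cpp-python")

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Answer A, B, C or D: ..."}],
    logprobs=True,
    top_logprobs=5,   # per-token alternatives, per the OpenAI chat spec
    max_tokens=1,
)

# With a stock llama-cpp-python release this comes back empty; with the
# upstream fix it should contain the alternatives needed for logprob scoring.
print(response.choices[0].logprobs)
```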