Added reading and writing of a state-cache #16

Open
wants to merge 3 commits into
base: main

srogmann
Contributor

@srogmann srogmann commented Oct 5, 2024

This PR contains an example implementation that stores the computed state of a system prompt on disk.

I wrote this implementation a few weeks ago, so it may not be mergeable directly.

@mukel
Owner

mukel commented Oct 7, 2024

Hey @srogmann, this is awesome!
I rebased and played with it; there are some rough edges.

  • The cached prompt must match the given prompt exactly. To improve usability, it should start from the longest matching prefix.
  • It needs some usability polish: it's not clear (no docs) how to cache a prompt and how to use it.
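As a rough illustration of the longest-prefix idea in the first bullet, here is a minimal sketch that compares the cached token sequence with the new prompt's tokens. The class name `PrefixMatch` and the method `commonPrefixLength` are hypothetical and not taken from this PR; it also assumes the cache stores the token ids it was computed from.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR: finds how many cached positions can be reused.
public final class PrefixMatch {

    /** Returns the number of leading tokens shared by the cached prompt and the new prompt. */
    public static int commonPrefixLength(int[] cachedTokens, int[] promptTokens) {
        // Arrays.mismatch returns the first differing index, the shorter length if one
        // array is a proper prefix of the other, or -1 if both arrays are equal.
        int mismatch = Arrays.mismatch(cachedTokens, promptTokens);
        return mismatch < 0 ? cachedTokens.length : mismatch;
    }

    public static void main(String[] args) {
        int[] cached = {1, 15, 27, 99, 4};      // tokens the cached state was computed from
        int[] prompt = {1, 15, 27, 42, 8, 3};   // tokens of the new prompt
        int reuse = commonPrefixLength(cached, prompt);
        System.out.println("reuse " + reuse + " cached positions, evaluate "
                + (prompt.length - reuse) + " new tokens");
    }
}
```

With such a helper, only the tokens after the shared prefix would need a forward pass. If the whole prompt is already cached, the caller would probably still have to re-evaluate the last token to obtain logits.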

This feature is a must-have, and I'd really like caching to be composable, so it can be turned on easily and transparently, e.g. for a completions API. I've discussed with a colleague how to do caching on disk and we have some ideas. Note that KV caches can also be quantized to save memory and disk space.
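To make the quantization remark concrete, here is a minimal sketch of symmetric 8-bit quantization for one block of cached values. This is only an assumption about what such a scheme could look like, not what this PR or the project does; `KvQuantSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch: stores a float block as one float scale plus one byte per value.
public final class KvQuantSketch {

    /** Picks a scale so that the largest magnitude in the block maps to 127. */
    public static float scaleFor(float[] block) {
        float maxAbs = 0f;
        for (float v : block) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        // An all-zero block gets scale 1 to avoid dividing by zero later.
        return maxAbs == 0f ? 1f : maxAbs / 127f;
    }

    /** Rounds each value to the nearest multiple of the scale, stored as a signed byte. */
    public static byte[] quantize(float[] block, float scale) {
        byte[] q = new byte[block.length];
        for (int i = 0; i < block.length; i++) {
            q[i] = (byte) Math.round(block[i] / scale);
        }
        return q;
    }

    /** Restores approximate float values from the quantized bytes. */
    public static float[] dequantize(byte[] q, float scale) {
        float[] block = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            block[i] = q[i] * scale;
        }
        return block;
    }
}
```

Compared with 32-bit floats, this keeps roughly a quarter of the bytes per value plus one scale per block, at the cost of a rounding error of about half the scale per value.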

I'm busy this week, but this is a good start!

@srogmann
Contributor Author

Hi @mukel,
the first version of the state-cache was deliberately kept somewhat brief. I have now added support for the largest matching prefix and some documentation. I also added the name of the gguf-file to avoid confusion.
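For readers who want a picture of how a model name can guard against mixing up caches, here is a minimal sketch of a cache header that records the gguf file name together with the cached token ids. The layout, the magic number, and the class `StateCacheHeader` are assumptions made for illustration; the actual file format in this PR may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical header layout: magic, gguf file name, token count, token ids.
public final class StateCacheHeader {
    static final int MAGIC = 0x53434831; // arbitrary marker chosen for this sketch

    /** Serializes the header for a cache computed with the given model file. */
    public static byte[] encode(String ggufName, int[] tokens) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(MAGIC);
            out.writeUTF(ggufName);
            out.writeInt(tokens.length);
            for (int token : tokens) {
                out.writeInt(token);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the cached token ids, rejecting headers written for a different model file. */
    public static int[] decode(byte[] header, String expectedGgufName) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            if (in.readInt() != MAGIC) {
                throw new IllegalStateException("not a state-cache header");
            }
            String ggufName = in.readUTF();
            if (!ggufName.equals(expectedGgufName)) {
                throw new IllegalStateException(
                        "cache was written for " + ggufName + ", not " + expectedGgufName);
            }
            int[] tokens = new int[in.readInt()];
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = in.readInt();
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The token ids read back from such a header are what a prefix comparison against the new prompt would operate on; the saved state itself would follow the header in the file.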

@mukel mukel mentioned this pull request Oct 25, 2024