Add embeddings cache #8976
Conversation
High level: are we sure we prefer explicitly wrapping the underlying model vs. having a separate global cache? That's what we have for chat models and LLMs, which I think is slightly nicer UX. But I definitely see how this makes for cleaner code.
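For context, a toy sketch of the two patterns under discussion; all names here are illustrative stand-ins, not the PR's actual classes:

```python
from typing import Dict, List


class Embedder:
    """Toy underlying model: embeds a text as [len(text)]."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]


# Pattern 1 (this PR): explicit wrapping, so the cache is visible at the call site.
class CachingWrapper:
    def __init__(self, underlying: Embedder) -> None:
        self.underlying = underlying
        self.cache: Dict[str, List[float]] = {}

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        missing = [t for t in texts if t not in self.cache]
        for t, vec in zip(missing, self.underlying.embed_documents(missing)):
            self.cache[t] = vec
        return [self.cache[t] for t in texts]


# Pattern 2 (current chat/LLM style): a module-level global that models
# consult implicitly, so call sites never mention the cache.
GLOBAL_EMBEDDINGS_CACHE: Dict[str, List[float]] = {}
```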
    return cast(List[float], json.loads(serialized_value.decode()))

class CacheBackedEmbedder(Embeddings):
Nit: would call these CacheBackedEmbeddings and underlying_embeddings. I don't disagree that embedder might be the clearer name, but I think consistency is the higher priority.
    Returns:
        The embedding for the given text.
    """
    # Query is not cached at the moment.
why not?
see doc-string above ^
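A minimal sketch of the behavior being discussed, with hypothetical names (the real class delegates per its doc-string):

```python
from typing import List


class CacheBackedSketch:
    def __init__(self, underlying, store) -> None:
        self.underlying = underlying
        self.store = store

    def embed_query(self, text: str) -> List[float]:
        # Query is not cached at the moment: queries tend to be unique per
        # request, so caching them would mostly add writes without cache hits.
        return self.underlying.embed_query(text)
```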
def _value_serializer(value: Sequence[float]) -> bytes:
    """Serialize a value."""
    return json.dumps(value).encode()
Is this the kind of thing where you should specify the codec, or not?
Could you paraphrase?
The store that goes into the embedder accepts an encoder for the key, and also a serializer (both an encoder and a decoder) for the value.
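Roughly, the pluggable pieces described above might look like this; `_key_encoder` and `_value_deserializer` are illustrative, while the serializer mirrors the excerpt in this diff:

```python
import hashlib
import json
from typing import List, Sequence


def _key_encoder(text: str) -> str:
    """Hash the raw text so keys are fixed-length and filesystem-safe."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _value_serializer(value: Sequence[float]) -> bytes:
    # In Python 3, .encode() already defaults to UTF-8, so naming the codec
    # here is only for explicitness.
    return json.dumps(list(value)).encode("utf-8")


def _value_deserializer(serialized: bytes) -> List[float]:
    return json.loads(serialized.decode("utf-8"))
```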
This is great! I see we only have in-memory and file-system store implementations at the moment, so maybe it's too early to provide more guidance here, but I'd imagine users will want some direction on the recommended approach for avoiding redundantly storing docs in a vector store.
Could you paraphrase a bit? Are you thinking about actual docs in the vector store, or caches for embeddings?
For the former, I'm planning on PRing something next week to help with that!
This PR is for storing the hashes and the resulting embeddings in the key-value store.
This data won't consume much space. The larger issue is that it can create a lot of keys, and the key-value store may have trouble listing them (e.g., it's not a good idea to have 1 million files in a single directory). For the file system store, we should re-write it to store the keys in a tree structure when this becomes an issue.
I'm guessing that once we reach the ballpark of ~1 million docs that need to be embedded, some of the design decisions will have to be re-evaluated.
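For what it's worth, one common way to keep a file-based store from putting a million files in one directory is to shard keys by hash prefix. A sketch, not the shipped implementation:

```python
from pathlib import Path


def sharded_path(root: Path, key: str) -> Path:
    """Map a hex key like 'ab12cd...' to root/ab/12/ab12cd..., so each
    directory holds at most 256 subdirectories."""
    return root / key[:2] / key[2:4] / key


def write_value(root: Path, key: str, value: bytes) -> None:
    path = sharded_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(value)
```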
lgtm!
@baskaryan will review the current LLM cache setup.
@baskaryan I think we can add the global cache specification in the future (I think it's complementary to these changes) -- although I am not sure that I like the idea of doing it -- I think it may encourage large structural issues with user code only to save a few lines of code.
This PR adds the ability to temporarily cache or persistently store embeddings.
A notebook has been included showing how to set up the cache and how to use it
with a vectorstore.
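A sketch of the usage the notebook demonstrates; names follow the discussion above (CacheBackedEmbeddings vs. CacheBackedEmbedder) and may not match the merged API exactly:

```python
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain.storage import LocalFileStore
from langchain.vectorstores import FAISS

underlying_embeddings = OpenAIEmbeddings()
store = LocalFileStore("./embeddings_cache/")  # persists across runs

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)

# The first call computes and stores each text's embedding under a hash key;
# re-indexing the same texts later is served from the cache.
db = FAISS.from_texts(["hello world", "goodbye world"], cached_embedder)
```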