
Support for GPU usage (w/ batching) when loading a KB using ElasticsearchQA #373

Open
murali1996 opened this issue Oct 5, 2021 · 0 comments

@murali1996 (Contributor)

Loading a large KB with the Elasticsearch Question Answerer and a query_type of `embedder`, `embedder_text`, or `embedder_keyword` can be time-consuming if the process of obtaining embeddings is not batched and is not configured to use a GPU when one is available.

What can be modified in the codebase:
The method `_doc_generator(data_file, embedder_model=None, embedding_fields=None)` in `question_answerer.py` could first obtain the embeddings of all docs in batches, dump the embeddings cache, and then use the transform method on each doc while creating the docs for Elasticsearch index creation, as sketched below.
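A minimal sketch of that flow, assuming a JSON-lines KB file, a batched encoder method (`encode_batch`) and a cache-dump method (`dump_cache`) on the embedder; both method names are hypothetical stand-ins for the actual MindMeld embedder API:

```python
import json

BATCH_SIZE = 128  # tune to the GPU's memory


def _doc_generator(data_file, embedder_model=None, embedding_fields=None):
    with open(data_file) as fp:
        docs = [json.loads(line) for line in fp]

    if embedder_model is not None and embedding_fields:
        for field in embedding_fields:
            texts = [doc.get(field, "") for doc in docs]
            vectors = []
            # Batched encoding keeps a GPU-backed model saturated instead
            # of embedding one document at a time.
            for start in range(0, len(texts), BATCH_SIZE):
                vectors.extend(embedder_model.encode_batch(texts[start:start + BATCH_SIZE]))
            for doc, vector in zip(docs, vectors):
                doc[field + "_embedding"] = vector
        # Persist the embeddings cache once, after all fields are encoded.
        embedder_model.dump_cache()

    for doc in docs:
        yield doc
```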

Optional comments on memory optimization:
Neither the solution suggested above nor the current implementation is optimized for memory: all embeddings are held in RAM so that Elasticsearch can be given the embedding of each KB doc. This may be worth revisiting if we want to load large KBs (say, on the order of >50K documents) smoothly; one possible direction is sketched below.
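One rough, hypothetical way to bound memory would be to stream the KB file and encode one batch at a time, so only `BATCH_SIZE` embeddings live in RAM at once (`encode_batch` is again an assumed batched-encoder method, not the actual API; this trades the single up-front cache dump for streaming):

```python
import itertools
import json

BATCH_SIZE = 128


def _doc_generator_streaming(data_file, embedder_model, embedding_field):
    with open(data_file) as fp:
        while True:
            # Read the next BATCH_SIZE lines of the JSON-lines KB file.
            lines = list(itertools.islice(fp, BATCH_SIZE))
            if not lines:
                break
            batch = [json.loads(line) for line in lines]
            vectors = embedder_model.encode_batch(
                [doc.get(embedding_field, "") for doc in batch]
            )
            # Yield docs as soon as their batch is embedded, so the full
            # embedding matrix never has to be held in memory.
            for doc, vector in zip(batch, vectors):
                doc[embedding_field + "_embedding"] = vector
                yield doc
```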
