Using LLMs to improve user experience with packages/documentation #75
adamkucharski started this conversation in Show and tell
One of the things that came up during the Summit is how to help users interact with our documentation and packages. Specifically, if someone wants to do a particular task, what's the fastest, easiest way of getting them the advice they need?
Andree and I - in discussion with Yale Fox of Applied Science - have been exploring the use of large language models to interpret user queries and provide answers based on Epiverse-TRACE documentation, using the OpenAI API. Out of the box, ChatGPT knows nothing about our packages, given its training cutoff, and can hallucinate or miss domain-specific nuances that are outlined in our vignettes.
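For reference, a minimal sketch of what such a call looks like from R, using the `httr` and `jsonlite` packages (the `ask_gpt` helper and prompt wording are illustrative assumptions, not the actual code in the repo):

```r
# Minimal sketch: send a user query, plus retrieved documentation as context,
# to the OpenAI chat completions endpoint. Requires OPENAI_API_KEY to be set.
library(httr)
library(jsonlite)

ask_gpt <- function(prompt, context, api_key = Sys.getenv("OPENAI_API_KEY")) {
  response <- POST(
    "https://api.openai.com/v1/chat/completions",
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    body = toJSON(list(
      model = "gpt-3.5-turbo",
      messages = list(
        list(role = "system",
             content = paste("Answer using only this Epiverse-TRACE documentation:",
                             context)),
        list(role = "user", content = prompt)
      )
    ), auto_unbox = TRUE)
  )
  content(response)$choices[[1]]$message$content
}
```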
One challenge is the token limit on API request length (4096 tokens for the standard GPT-3.5 model, which at roughly 4 characters per token works out to about 16,000 characters) - so we can't simply put all our documentation in the prompt along with the user's question.
Instead, we've been developing a Shiny implementation that makes use of vector embeddings as well as generative models, e.g. Document Q&A with LangChain. The basic pipeline is as follows:

1. Pre-process the documentation: split the vignettes into chunks and compute a vector embedding for each chunk.
2. Embed the user's query with the same embedding model.
3. Retrieve the documentation chunks most similar to the query.
4. Pass the retrieved chunks, together with the query, to the GPT model to generate and display an answer.
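A rough sketch of the retrieval steps (1-3), with hypothetical helper names rather than the repo's actual functions:

```r
# Illustrative sketch: embed text via the OpenAI embeddings endpoint, then
# rank pre-computed documentation chunk embeddings against the query by
# cosine similarity and keep the closest matches.
embed_text <- function(text, api_key = Sys.getenv("OPENAI_API_KEY")) {
  response <- httr::POST(
    "https://api.openai.com/v1/embeddings",
    httr::add_headers(Authorization = paste("Bearer", api_key)),
    httr::content_type_json(),
    body = jsonlite::toJSON(list(
      model = "text-embedding-ada-002",
      input = text
    ), auto_unbox = TRUE)
  )
  unlist(httr::content(response)$data[[1]]$embedding)
}

cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# doc_chunks: text chunks from the vignettes; doc_embeddings: a list of
# their pre-computed embeddings (generated once, offline)
retrieve_chunks <- function(query, doc_chunks, doc_embeddings, n = 3) {
  query_embedding <- embed_text(query)
  sims <- vapply(doc_embeddings, cosine_sim, numeric(1), b = query_embedding)
  doc_chunks[order(sims, decreasing = TRUE)[seq_len(n)]]
}
```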
Some examples that work nicely to illustrate functionality:
"Calculate case fatality risk over time"
"How do I calculate the parameters of a Gamma distribution from the mean and variance?"
"How can I calculate force of infection over time from serological data?"
"How can I clean my data?"
Full repo here: https://github.com/adamkucharski/epiverse-llm - happy to move over to the main repo when/if it's further developed. The pre-processing embedding code is in `R_not_run/generate_doc_embeddings.R`, using the functions in `helper_functions.R`. Andree has also put together some useful training questions that we need to incorporate as well.

The advantage of the above pipeline is that it gives a lot of control over what is displayed (i.e. minimises the risk of hallucinations or broken generated code). There's also potential to introduce additional chain steps (e.g. summarise key aspects of chunks before passing them to the GPT model; a sketch follows below), and to expand to other packages in the ecosystem.
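As an illustration of such a chain step, the chunk summarisation could be a second model call before the final one. A sketch, reusing the hypothetical `ask_gpt()` and `retrieve_chunks()` helpers above:

```r
# Illustrative chain: summarise each retrieved chunk first, so that more
# documentation fits within the token limit of the final prompt.
answer_query <- function(query, doc_chunks, doc_embeddings) {
  chunks <- retrieve_chunks(query, doc_chunks, doc_embeddings)
  summaries <- vapply(chunks, function(chunk) {
    ask_gpt("Summarise the key points of this documentation excerpt.", chunk)
  }, character(1))
  ask_gpt(query, paste(summaries, collapse = "\n\n"))
}
```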
Obviously it would eventually be more powerful for users if we could generate whole scripts that could then be run (e.g. like https://rtutor.ai/) - but this would have much larger potential for errors to be introduced. Keen to hear thoughts/ideas!
Comment:
Repository moved here so we can add issues and iterate: https://github.com/epiverse-trace/llm-guidance/tree/main