Replies: 1 comment
The stateless executor does not track state between prompts. However, it does still do out-of-context handling, so that if a single prompt plus its response is larger than the context size it can handle it.
Long term, we want to replace the executors with something more flexible that allows things such as context extension to be plugged in more easily. Short term, it looks like …
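
For reference, the out-of-context handling mentioned above follows the "infinite text generation via context shifting" idea from the llama.cpp main example that the comments in `HandleRunOutOfContext` point at: protect the first n_keep tokens (e.g. the system prompt) and discard half of everything after them, so generation can continue with the most recent history intact. A minimal sketch of that bookkeeping, with made-up names and a plain token list rather than LLamaSharp's actual internals:

```csharp
// Illustrative sketch only (hypothetical names, plain token list) - this shows the shape of
// the llama.cpp "context shift" trick, not the actual LLamaSharp implementation.
using System.Collections.Generic;

static class ContextShift
{
    // Keep the first nKeep tokens (e.g. the system prompt), drop half of what follows,
    // and continue generating with the most recent half of the history still in place.
    public static List<int> Shift(List<int> evaluatedTokens, int nKeep)
    {
        int nLeft = evaluatedTokens.Count - nKeep;
        int nDiscard = nLeft / 2;

        var kept = new List<int>(evaluatedTokens.Count - nDiscard);
        kept.AddRange(evaluatedTokens.GetRange(0, nKeep));                            // protected prefix
        kept.AddRange(evaluatedTokens.GetRange(nKeep + nDiscard, nLeft - nDiscard));  // most recent tokens
        return kept;
    }
}
```

In the llama.cpp example the same effect is applied to the KV cache itself (removing the discarded cells and shifting the rest) so the kept tokens do not need to be re-evaluated; that KV-cache side of it is the part the interactive/instruct executors would need.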
When using the Interactive/Instruct executors with chat history, once the context limit is reached the error "llama_decode failed: 'NoKvSlot'" is thrown. This is discussed in issue #660.
Digging through the code, I noticed that the method "HandleRunOutOfContext" in "LlamaExecutorBase" has some comments taken from an example in llama.cpp, but does not implement the context switching or self-extension. The stateless executor does implement some of that logic, even though it is not supposed to keep track of state. Is that correct?
Are there any plans to add this functionality (infinite context/self-extension) to the interactive/instruct executors?
Is there a way to extend the KV cache with these executors?
How are we supposed to handle context overflow with these executors?
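
As a stop-gap, the overflow can at least be avoided from the application side: keep the transcript yourself and, once it no longer fits, drop the oldest turns and rebuild the context and executor, replaying the trimmed history as a fresh prompt. A rough sketch along those lines; the model path, prompt format and head-room margin are placeholders, and rebuilding this way re-evaluates the kept history from scratch:

```csharp
// Workaround sketch, not an official LLamaSharp feature: trim the chat transcript and
// rebuild the executor before the prompt outgrows the context.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

var parameters = new ModelParams("models/your-model.gguf") { ContextSize = 2048 };
using var weights = LLamaWeights.LoadFromFile(parameters);

var turns = new List<string>();                    // full chat transcript, oldest turn first
var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);
var inferenceParams = new InferenceParams
{
    MaxTokens = 256,
    AntiPrompts = new List<string> { "User:" }
};

Console.WriteLine(await ChatAsync("Hello, what can you do?"));

async Task<string> ChatAsync(string userInput)
{
    turns.Add($"User: {userInput}\nAssistant:");
    var prompt = turns[^1];                        // normally only the new turn is fed in

    // If the whole transcript (plus room for the reply) no longer fits, trim and restart.
    int reserve = inferenceParams.MaxTokens + 64;
    if (context.Tokenize(string.Join("\n", turns)).Length + reserve > context.ContextSize)
    {
        while (turns.Count > 1 &&
               context.Tokenize(string.Join("\n", turns)).Length + reserve > context.ContextSize)
            turns.RemoveAt(0);                     // drop the oldest turn

        context.Dispose();                         // discard the old KV cache entirely
        context = weights.CreateContext(parameters);
        executor = new InteractiveExecutor(context);
        prompt = string.Join("\n", turns);         // re-evaluate the trimmed history
    }

    var reply = new StringBuilder();
    await foreach (var token in executor.InferAsync(prompt, inferenceParams))
        reply.Append(token);

    turns[^1] += reply.ToString();                 // keep the answer in the transcript
    return reply.ToString();
}
```

This throws away whatever state the interactive executor held, so it is only a workaround until proper context shifting or self-extension is available in the executors themselves.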