Is the issue I mentioned a bug? How can it be resolved? #3656
Replies: 3 comments 1 reply
-
In the conversation orchestration page, you need to adjust the max token of the chosen Ollama model. This value is meant to be the length of the output tokens only; it is different from the total input tokens. Regarding the first question, please read this thread: ollama/ollama#2851
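For anyone unsure which knob does what, here is a minimal sketch that calls Ollama directly and shows the two separate limits. It assumes a local Ollama server on the default port 11434 and a `llama2` model tag (both placeholders; use whatever you have pulled). `num_predict` caps only the output tokens, which is roughly what a "max tokens" setting maps onto, while `num_ctx` is the whole context window shared by input and output.

```python
import requests

# Direct call to a local Ollama server (default port 11434; adjust if yours differs).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",           # placeholder model tag
        "prompt": "Why is the sky blue?",
        "stream": False,
        "options": {
            "num_ctx": 4096,         # whole context window (input + output tokens)
            "num_predict": 512,      # upper bound on OUTPUT tokens only
        },
    },
    timeout=300,
)
data = resp.json()
print(data["response"])
print("prompt tokens:", data.get("prompt_eval_count"))
print("output tokens:", data.get("eval_count"))
```

Comparing `prompt_eval_count` and `eval_count` in the response makes it easier to see which of the two limits you are actually hitting.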
-
Have you solved this problem? I'm running into the same situation, and when I adjust the model's max token as described in the documentation, it doesn't work. Please help!
-
1. Why is the response from calling large models in Ollama very quick through the command line, but ten to twenty times slower through Dify? Calling Ollama through open-webui is also more than ten times faster than through Dify. What is the reason for this discrepancy?
2. When uploading an image and calling Ollama's LLaVA-13B model via Dify, the following message appears: "Query or prefix prompt is too long, you can reduce the prefix prompt, or shrink the max token, or switch to an LLM with a larger token limit size." However, when calling Ollama through the local command line or open-webui, this issue does not occur. Do any parameters need to be adjusted?
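Not an answer to the speed question, but for point 2 one way to narrow things down is to reproduce the call directly against Ollama. Below is a minimal sketch, assuming a local Ollama server on the default port and the `llava:13b` tag; `photo.jpg` is a placeholder path. It sends the image with an enlarged `num_ctx`, since image tokens count toward the context window.

```python
import base64
import requests

# Reproduce the Dify request directly against Ollama to see whether the length
# limit comes from the model/server or from Dify's own token budgeting.
with open("photo.jpg", "rb") as f:           # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava:13b",
        "prompt": "Describe this image.",
        "images": [image_b64],               # base64-encoded images for multimodal models
        "stream": False,
        "options": {
            "num_ctx": 8192,                 # enlarged context window
            "num_predict": 512,              # modest output budget
        },
    },
    timeout=600,
)
print(resp.json().get("response"))
```

If this direct call succeeds while Dify still reports "Query or prefix prompt is too long", the limit is most likely the max token / context size values configured for the model inside Dify rather than Ollama itself, so revisiting those settings as suggested above would be the next step.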