Inconsistent performance from Groq's new tool-calling functionality #19990
-
It seems you're facing two primary challenges with Groq's new tool-calling functionality: parsing errors and inaccuracies in tool-calling. Here are some targeted strategies to address these issues:
Further strategies include:
These suggestions aim to improve the accuracy and reliability of tool-calling with Groq models. If you encounter further issues or need more specific advice, please provide additional details about the challenges you're facing.
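As an illustration of the general approach (my own sketch, not one of the strategies listed above): since the failures are usually the model wrapping its JSON in extra prose, a simple retry loop around the parse step can recover many calls. The model name, prompt, and helper name here are all assumptions.

```python
# Illustrative sketch only: retry a Groq chain when the model wraps its JSON
# in extra prose and parsing fails. Nothing here is an official fix.
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

llm = ChatGroq(model="mixtral-8x7b-32768", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Respond with a single JSON object only. No prose before or after it."),
    ("human", "{input}"),
])
chain = prompt | llm | JsonOutputParser()

def invoke_with_retries(text: str, max_retries: int = 3):
    """Hypothetical helper: re-ask the model when its output isn't clean JSON."""
    for _ in range(max_retries):
        try:
            return chain.invoke({"input": text})
        except OutputParserException:
            continue  # Groq is fast enough that a retry is cheap
    raise RuntimeError("Model never returned parseable JSON.")
```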
-
facing same issue
-
Me too: sometimes function calling with llm.with_structured_output(...) works, sometimes it doesn't. Very weird.
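For reference, this is roughly the pattern in question; the schema and model name are made up for the sketch, and behavior varies between runs as described above.

```python
# Minimal sketch of the intermittent with_structured_output(...) pattern.
# The Person schema and model name are assumptions, not from this thread.
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_groq import ChatGroq

class Person(BaseModel):
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age")

llm = ChatGroq(model="llama3-70b-8192", temperature=0)
structured_llm = llm.with_structured_output(Person)

# Sometimes this returns a Person instance; other runs fail because the
# model emits prose around (or instead of) the underlying tool call.
result = structured_llm.invoke("Alice is 30 years old.")
print(result)
```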
-
I have the same issue. Sometimes I get a good response, sometimes the response is embedded in the message. For llama3-70b-8192 the tool use is okay, but llama3-8b-8192 shows the problem more.
Good example:
Bad example:
Request:
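To make the two cases concrete, here is a minimal sketch (the weather tool is a placeholder, and it assumes a langchain-core recent enough to expose AIMessage.tool_calls): in the good case the response carries structured tool calls, in the bad case the call only shows up as text in the message content.

```python
# Sketch of the failure mode: "good" responses populate response.tool_calls,
# "bad" responses bury the call as plain text in response.content.
from langchain_core.tools import tool
from langchain_groq import ChatGroq

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city (placeholder tool)."""
    return f"Sunny in {city}"

llm = ChatGroq(model="llama3-8b-8192", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])

response = llm_with_tools.invoke("What's the weather in Paris?")

if response.tool_calls:
    print("good: structured tool call ->", response.tool_calls)
else:
    print("bad: call embedded in text ->", response.content)
```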
-
Checked other resources
Commit to Help
Example Code
Description
First of all, kudos to the LangChain team for shipping integration with Groq's updated tool-calling support on day one (https://console.groq.com/docs/tool-use).
My use case for fast inference is tool-calling agents, like the email agent above. So I swapped the LLM from GPT-4-turbo to Groq to test its tool calling.
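Roughly, the swap looks like the sketch below. The email tool here is a placeholder (not the actual agent's tool), and create_tool_calling_agent may need a newer langchain release than the versions pinned under System Info.

```python
# Rough sketch of the LLM swap: same tool-calling agent, ChatGroq dropped in
# where ChatOpenAI(model="gpt-4-turbo") used to be. The tool is a placeholder.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_groq import ChatGroq

@tool
def search_inbox(query: str) -> str:
    """Search the inbox for emails matching the query (placeholder)."""
    return "No matching emails found."

# llm = ChatOpenAI(model="gpt-4-turbo")        # before
llm = ChatGroq(model="mixtral-8x7b-32768")     # after

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful email assistant."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_tool_calling_agent(llm, [search_inbox], prompt)
executor = AgentExecutor(agent=agent, tools=[search_inbox])
executor.invoke({"input": "Find emails about the Q2 budget."})
```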
The results are pretty underwhelming: with Mixtral 8x7B, I got frequent parsing errors (usually because the LLM outputs text like "sure, here's your JSON object" before the actual JSON). Other times the agent outputs the correct format but fails to understand the instructions, or even lies about whether it used a tool. Overall, I just don't think the OSS models offered by Groq are very well tuned for OpenAI-style function calling at the moment.
I tested in both single-turn and multi-turn settings. Single-turn accuracy is okay if the tool is simple, like an extractor, but more complex (API-calling) tools and multi-turn conversations do really badly.
The speed is lightning fast as promised (end to end, maybe 5x-10x faster than GPT-4-turbo for me), but it's really a shame that this new functionality fails for most non-toy use cases. I would love some suggestions (prompting techniques, etc.) on how to improve tool-calling accuracy with the Groq models.
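One prompting tweak worth trying (an assumption on my part, not a verified fix) is to spell out in the system message that the model must either call a tool through the tool-calling interface or answer plainly, and must never wrap JSON in commentary or claim tool use it didn't perform:

```python
# Illustrative prompting tweak, not a verified fix: an explicit system message
# forbidding prose around tool calls and fabricated tool-use claims.
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

SYSTEM = (
    "You are a tool-calling assistant. When a tool is appropriate, call it "
    "through the tool-calling interface only. Never describe the call in text, "
    "never add commentary before or after JSON, and never claim to have used "
    "a tool you did not actually call."
)

prompt = ChatPromptTemplate.from_messages([("system", SYSTEM), ("human", "{input}")])
llm = ChatGroq(model="mixtral-8x7b-32768", temperature=0)
chain = prompt | llm  # bind_tools(...) would be layered on top in a real agent
```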
System Info
langchain-groq==0.1.0
langchain==0.1.14