[Bug]: Pydantic error breaks MultimodalConversableAgent #383
I followed your steps in a fresh virtual environment using the following commands:

python3.10 -m venv .venv
source .venv/bin/activate
pip install "ag2[lmm]==0.6.1"

After setting up the environment, I ran your code snippet, but I was unable to reproduce the error. Everything executed without any issues on my end.

To help us identify the issue, could you share the output of the following command:

pip list

This will give us a clear view of the versions of pydantic and other dependencies installed, which might be contributing to the behavior you're observing.
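If the full `pip list` output is too noisy, a small Python sketch like the one below can report just the versions of the packages most likely involved. The package names queried here (`ag2`, `pydantic`, `openai`) are assumptions about which dependencies matter for this bug.

```python
# Sketch: report installed versions of a few packages of interest,
# as a focused alternative to pasting the full `pip list` output.
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package: version string}, or 'not installed' if absent."""
    out = {}
    for name in packages:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = "not installed"
    return out


if __name__ == "__main__":
    # These package names are illustrative guesses, not a confirmed list.
    for pkg, ver in report_versions(["ag2", "pydantic", "openai"]).items():
        print(f"{pkg}=={ver}")
```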
Somehow I can't reproduce it with ag2==0.6.1, but I am able to reproduce it with ag2==0.7.0:

from autogen import UserProxyAgent, config_list_from_json
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent
config_list_4v = config_list_from_json(
"../OAI_CONFIG_LIST",
filter_dict={
"model": ["gpt-4o"],
},
)
image2table_convertor = MultimodalConversableAgent(
name="image2table_convertor",
system_message="""
You are an image to table convertor. You will receive an image of tables. The original table could be in csv, pdf or other format.
You need to do the following steps in sequence:
1. Extract the table content and structure.
2. Make sure the structure is complete.
3. Correct typos in the text fields.
4. In the end, output the table in Markdown.
""",
llm_config={"config_list": config_list_4v, "max_tokens": 300},
human_input_mode="NEVER",
max_consecutive_auto_reply=1,
)
user_proxy = UserProxyAgent(
name="User_proxy",
system_message="A human admin.",
human_input_mode="NEVER", # Try between ALWAYS or NEVER
code_execution_config=False
)
# Ask the question with an image
chat_result = user_proxy.initiate_chat(
image2table_convertor,
message="""Please extract table from the following image and convert it to Markdown.
<img /workspaces/ag2/notebook/agentchat_pdf_rag/parsed_pdf_info/table-52-17.jpg>.""",
)
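The message above embeds the image as an `<img path>` tag inside the prompt text. As an illustration only (this is not ag2's actual parser), the sketch below shows how such tags could be located in a message string:

```python
# Illustrative sketch: locate `<img ...>` tags in a message string.
# The regex here is an assumption for demonstration, not ag2's real
# internal parsing logic for MultimodalConversableAgent.
import re

IMG_TAG = re.compile(r"<img\s+([^>]+)>")


def extract_image_refs(message):
    """Return the path/URL payload of each <img ...> tag in a message."""
    return [m.strip() for m in IMG_TAG.findall(message)]


message = """Please extract table from the following image and convert it to Markdown.
<img /workspaces/ag2/notebook/agentchat_pdf_rag/parsed_pdf_info/table-52-17.jpg>."""
print(extract_image_refs(message))
```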
Hi @AgentGenie, the issue has been resolved in a PR.
Thanks @rjambrecic
Describe the bug
I got a pydantic error with 0.6.1 but not with 0.6.0, which breaks MultimodalConversableAgent.
Steps to reproduce
Model Used
No response
Expected Behavior
No response
Screenshots and logs
Additional Information
No response