docs: rewrite image reasoning example without multimodal LLM #17148

Merged
merged 2 commits on Dec 5, 2024

@masci (Member) commented Dec 4, 2024

Description

Rewrite the "image reasoning" multimodal example so that it builds its input with the new content blocks on ChatMessage and calls the "plain" OpenAI LLM class instead of its multimodal counterpart, OpenAIMultiModal.

Part of #15950

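Before

```python
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.multi_modal_llms.openai.utils import (
    generate_openai_multi_modal_chat_message,
)

# image_urls is the list of image URLs defined earlier in the notebook
openai_mm_llm = OpenAIMultiModal(model="gpt-4o", max_new_tokens=300)

# Wrap each URL in an image document, then fold prompt and images into one message
image_documents = load_image_urls(image_urls)
msg = generate_openai_multi_modal_chat_message(
    prompt="Describe the images as an alternative text",
    role="user",
    image_documents=image_documents,
)

response = openai_mm_llm.chat(messages=[msg])
```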

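After

```python
from llama_index.core.base.llms.types import (
    ChatMessage,
    ImageBlock,
    MessageRole,
    TextBlock,
)
from llama_index.llms.openai import OpenAI

# The plain OpenAI LLM class takes max_tokens (max_new_tokens is OpenAIMultiModal's name)
openai_llm = OpenAI(model="gpt-4o", max_tokens=300)

# One user message: a text block followed by one image block per URL
msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the images as an alternative text"),
        ImageBlock(url=image_urls[0]),
        ImageBlock(url=image_urls[1]),
    ],
)
response = openai_llm.chat(messages=[msg])
```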

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Dec 4, 2024
@masci masci changed the title docs: rewrite image reasoning example without multimodalLLM docs: rewrite image reasoning example without multimodal LLM Dec 4, 2024
@logan-markewich (Collaborator) left a comment

LGTM! I wonder if we want to note in the example that, so far, only OpenAI multi-modal supports this syntax?

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 5, 2024
@masci (Member, Author) commented Dec 5, 2024

> LGTM! I wonder if we want to note in the example that, so far, only OpenAI multi-modal supports this syntax?

My only concern would be maintenance, i.e. remembering to update/remove the note as we roll out blocks usage to other providers...

@masci masci merged commit 3c666ea into main Dec 5, 2024
11 checks passed
@masci masci deleted the massi/image-reasoning branch December 5, 2024 08:31