[Feat] Ollama Image API Support #11

Open
reyna-abhyankar opened this issue Dec 6, 2024 · 0 comments

We currently support the OpenAI Vision API, in which messages look like this:

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What’s in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }
]
```

However, Ollama only supports local image paths or Base64-encoded images, and it seems to break unless queried like this:

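```python
path = "image.jpg"  # hypothetical local image path; a Base64-encoded image string also works
messages = [
    {
        "role": "user",
        "content": "What's in this image?",
        "images": [path],
    }
]
```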
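Since Ollama also accepts Base64-encoded images, one option is to encode the local file before building the message. The following is a minimal standard-library sketch; `encode_image_base64` and `build_ollama_message` are hypothetical helper names for illustration, not part of Ollama's client.

```python
import base64
from pathlib import Path


def encode_image_base64(path: str) -> str:
    """Read a local image file and return its bytes as a Base64 string."""
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("ascii")


def build_ollama_message(prompt: str, image_path: str) -> dict:
    """Build an Ollama-style chat message: plain-string content plus an images list."""
    return {
        "role": "user",
        "content": prompt,  # Ollama expects a string here, not an array of parts
        "images": [encode_image_base64(image_path)],
    }
```

Passing the raw path instead of the encoded string keeps the message smaller in client code, but encoding up front produces a payload that does not depend on the file still being readable when the request is sent.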

Ollama has merged PR 5208, which should resolve the problem of `content` being an array instead of a string. However, LiteLLM has an open PR (6680) that fixes the exact unmarshalling error referenced in #10, so the problem may lie in how LiteLLM queries Ollama. If that PR gets merged, we might not need to do anything. Otherwise, we could implement essentially the same fix on our end: flatten the `content` array into a single string, move the images into an `images` key, and possibly raise a more explicit error for web (URL) images.
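A minimal sketch of that fallback, assuming incoming messages use the OpenAI Vision format shown above. The function name `to_ollama_message`, joining text parts with a newline, stripping the prefix from `data:` URLs, and treating any other non-web URL as a local path are all our own illustrative choices, not behavior taken from Ollama or LiteLLM.

```python
from typing import Any


def to_ollama_message(message: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI Vision-style message into the shape Ollama accepts.

    Text parts are flattened into one `content` string and images are moved
    into an `images` list. Web (http/https) image URLs raise an explicit error,
    since Ollama only accepts local paths or Base64-encoded images.
    """
    content = message.get("content")
    # Plain-string content is already in the shape Ollama expects.
    if isinstance(content, str):
        return dict(message)

    texts: list[str] = []
    images: list[str] = []
    for part in content or []:
        kind = part.get("type")
        if kind == "text":
            texts.append(part["text"])
        elif kind == "image_url":
            url = part["image_url"]["url"]
            if url.startswith(("http://", "https://")):
                raise ValueError(
                    f"Ollama does not accept web image URLs ({url}); "
                    "download the image and pass a local path or Base64 data instead."
                )
            if url.startswith("data:"):
                # Keep only the Base64 payload after "data:<mime>;base64,".
                url = url.split(",", 1)[1]
            # Anything else is assumed to be a local file path.
            images.append(url)
        else:
            raise ValueError(f"Unsupported content part type: {kind!r}")

    # Copy every other key (e.g. "role") unchanged.
    converted = {k: v for k, v in message.items() if k != "content"}
    converted["content"] = "\n".join(texts)
    if images:
        converted["images"] = images
    return converted
```

Under this sketch, converting the first example message in this issue would raise `ValueError`, because its image points at a Wikimedia `https://` URL; that is the "more explicit error for web images" case rather than an opaque unmarshalling failure.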
