[FEATURE] Implement Image Loading Function for Image Search and CLIP Support #3152

mingshl · 2024-10-23T22:42:42Z

Is your feature request related to a problem?
To support the CLIP model and image search, we need to implement a function at the connector level that can load images from URLs or file paths, similar to using PIL (Python Imaging Library).

This function should support image search and be compatible with CLIP (Contrastive Language-Image Pre-training) for advanced image-text understanding.

What solution would you like?
Similar to:

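```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
response = requests.get(url, stream=True, timeout=10)  # fail fast on slow or unresponsive hosts
response.raise_for_status()  # raise on 4xx/5xx instead of trying to decode an error page
image = Image.open(response.raw)
```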

Once the image is loaded, it can be passed to the CLIP model as input to run a prediction:

```python
import torch

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the text labels gives label probabilities
```
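In CLIP, each entry of `logits_per_image` is the cosine similarity between the normalized image embedding and one text embedding, multiplied by a learned scale; the softmax then turns those scores into per-label probabilities. The stdlib-only sketch below reproduces that arithmetic on made-up 3-dimensional embeddings. The embedding values and the scale of 100 are assumptions for illustration, not outputs of `openai/clip-vit-base-patch32`.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(image_emb: List[float], text_embs: List[List[float]],
                        logit_scale: float = 100.0) -> List[float]:
    """Mimic one row of logits_per_image, then softmax over the text labels.

    logit_scale=100.0 is an assumed value; the real model learns its own scale.
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    return softmax(logits)


# Toy embeddings (made up): the image vector points closer to the "cat" text.
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0],  # "a photo of a cat"
             [0.0, 1.0, 0.0]]  # "a photo of a dog"
probs = label_probabilities(image_emb, text_embs)
print(probs)
```

Calling `label_probabilities(image_emb, text_embs, logit_scale=1.0)` shows how much the scale sharpens the resulting distribution.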

Objectives:

Create a function that takes a URL as input and returns a PIL Image object.
Ensure the function can handle various image formats (JPEG, PNG, etc.).
Implement error handling for invalid URLs or unsupported image types.
Optimize the function for performance, considering potential high-volume usage in image search scenarios.
Ensure compatibility with CLIP for further processing and analysis.
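The objectives above could be met by a loader along the following lines. This is a stdlib-only sketch under stated assumptions: the names `load_image_bytes`, `detect_format` and `ImageLoadError`, the 10 MiB cap, and the magic-byte table covering only JPEG, PNG, GIF and WebP are hypothetical, not an existing ML-Commons API. It identifies the format from the leading bytes rather than trusting the file extension or a `Content-Type` header, and it reads at most one byte past the cap, so an oversized input is rejected without being buffered in full.

```python
import os
import urllib.parse
import urllib.request

# Leading "magic" bytes for a few common formats (an assumed, non-exhaustive subset).
MAGIC_BYTES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}
DEFAULT_MAX_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


class ImageLoadError(Exception):
    """Raised when a source cannot be loaded as a supported image."""


def detect_format(data: bytes) -> str:
    """Identify the image type from its leading bytes, not from the file name."""
    for magic, fmt in MAGIC_BYTES.items():
        if data.startswith(magic):
            return fmt
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # WebP: "RIFF" <size> "WEBP"
        return "WEBP"
    raise ImageLoadError("unsupported or unrecognized image type")


def _read_capped(stream, max_bytes: int) -> bytes:
    """Read at most max_bytes + 1 bytes so oversized input is detected early."""
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ImageLoadError(f"image exceeds {max_bytes} bytes")
    return data


def load_image_bytes(source: str, max_bytes: int = DEFAULT_MAX_BYTES,
                     timeout: float = 10.0) -> tuple:
    """Return (format, raw_bytes) for an http(s) URL or a local file path."""
    scheme = urllib.parse.urlparse(source).scheme.lower()
    try:
        if scheme in ("http", "https"):
            with urllib.request.urlopen(source, timeout=timeout) as resp:
                data = _read_capped(resp, max_bytes)
        elif scheme == "" and os.path.isfile(source):
            with open(source, "rb") as f:
                data = _read_capped(f, max_bytes)
        else:
            raise ImageLoadError(f"unsupported or missing source: {source!r}")
    except OSError as exc:  # urllib.error.URLError and timeouts subclass OSError
        raise ImageLoadError(f"could not read {source!r}: {exc}") from exc
    return detect_format(data), data
```

To get the PIL Image object the objectives ask for, the returned bytes can be wrapped with Pillow's `Image.open(io.BytesIO(data))`. The sketch does not restrict which hosts it contacts.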

Acceptance Criteria:

The function successfully loads images from valid URLs.
It properly handles errors for invalid URLs or unsupported image types.
The loaded images are compatible with our image search pipeline.
The function's output can be directly used with CLIP models.
Performance tests show the function can handle high-volume requests efficiently.
Code is well-documented and follows our coding standards.
Unit tests are implemented to cover various scenarios (successful loads, error cases, etc.).

Related issue
#3054


mingshl · 2024-10-23T22:52:03Z

There is an existing toString() method at the connector level that converts lists, maps, and other data types to strings. This feature could add a loadImage() function in the same way. See PR #2871 for reference.
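ML-Commons itself is written in Java, so the Python below is only a hypothetical analogue of how a `loadImage()`-style helper might sit beside a `toString()`-style one. Both function names are illustrative: lists, tuples and maps are serialized as JSON, raw image bytes become a base64 string, and anything else falls back to `str()`.

```python
import base64
import json
from typing import Any


def to_payload_string(value: Any) -> str:
    """Hypothetical analogue of a connector-level toString() helper."""
    if isinstance(value, (bytes, bytearray)):
        # Binary image data cannot go into JSON directly, so encode it as base64 text.
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value)
    return str(value)


def load_image(path: str) -> str:
    """Hypothetical loadImage(): read a local image file and return base64 text."""
    with open(path, "rb") as f:
        return to_payload_string(f.read())
```

Because the result is plain ASCII text, it can be embedded in a JSON request body, for example `json.dumps({"image": load_image("cat.jpg")})` (the file name is illustrative).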

brianf-aws · 2024-10-23T22:56:08Z

Hi, this looks interesting. Could I be assigned to this, please?

dhrubo-os · 2024-10-28T17:34:41Z

Just a heads up, we might need to talk with Security about this with the implementation plan.

brianf-aws · 2024-10-28T17:37:48Z

> Just a heads up, we might need to talk with Security about this with the implementation plan.

Yes, I was talking to @ylwu-amzn, who mentioned that having users download from an external site is a security issue. We may need a design review to find ways to implement this feature defensively.

brianf-aws · 2024-11-09T21:54:43Z

I created a ticket with Security to get their advice. We also talked to the Flow Framework team about this, and they understood that something like downloading a URL within ML-Commons is unlikely to be approved by Security.

dblock · 2024-11-11T17:11:41Z

[Catch All Triage - 1, 2, 3, 4]

brianf-aws · 2024-11-19T19:39:56Z

Hey everyone, I talked with Security, and they said this would likely not pass review. It would be better for the client to convert the image to base64 and for us to provide validation. What we can do now is proceed in these phases:

(Accept base64 string validation) The client sends a base64-encoded image string, and we validate it.
(Accept URLs on safe endpoints) We would only consider this if we really need this functionality, because Security mentioned that it is possible to spoof an endpoint so that it passes a regex check but points to a malicious URL.
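Security's concern with the second phase is that a regex over the raw URL string can be fooled. One common mitigation, sketched here under assumptions (the allowlist contents and the function name `is_allowed_image_url` are hypothetical), is to parse the URL and compare the exact hostname against an allowlist, while also requiring HTTPS and rejecting embedded credentials and unexpected ports.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of exact hostnames; not an actual ML-Commons setting.
ALLOWED_HOSTS = frozenset({"images.example.com"})


def is_allowed_image_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose parsed hostname exactly matches the allowlist.

    Comparing the parsed hostname, instead of running a regex over the raw
    string, rejects tricks such as "https://images.example.com@evil.test/"
    (userinfo) or "https://images.example.com.evil.test/" (suffix host).
    """
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:  # "user@host" forms are a spoofing vector
        return False
    if port not in (None, 443):
        return False
    host = (parts.hostname or "").rstrip(".")  # hostname is already lowercased
    return host in allowed_hosts
```

This only decides whether a URL may be fetched; a real fetcher would also need to handle redirects, which could lead off the allowlist, and enforce a download size limit.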

They also mentioned that, in addition to malicious scripts, a client could send a very large file and stall ML-Commons from doing anything else.
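For the first phase, validation could check the payload size before decoding, decode strictly, and confirm that the bytes start with a known image signature. A minimal sketch, assuming a hypothetical 5 MiB limit, JPEG and PNG as the only accepted types, and the illustrative names `validate_base64_image` and `InvalidImageError`. Checking the encoded length first means an oversized payload is rejected before its decoded form is allocated, which addresses the stalling concern raised above.

```python
import base64

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # hypothetical limit, not an ML-Commons default

# Accepted signatures (an assumed subset): JPEG and PNG only.
_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")


class InvalidImageError(ValueError):
    """Raised when a client-supplied image string fails validation."""


def validate_base64_image(encoded: str, max_bytes: int = MAX_IMAGE_BYTES) -> bytes:
    """Validate a base64 image string and return the decoded bytes."""
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    if encoded.startswith("data:"):
        _, _, encoded = encoded.partition(",")
    # Size check before decoding: every 4 base64 characters decode to at most 3 bytes.
    if len(encoded) // 4 * 3 > max_bytes:
        raise InvalidImageError("image too large")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(encoded, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise InvalidImageError("not valid base64") from exc
    if len(data) > max_bytes:
        raise InvalidImageError("image too large")
    if not data.startswith(_SIGNATURES):
        raise InvalidImageError("unsupported image type")
    return data
```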

mingshl added enhancement New feature or request untriaged labels Oct 23, 2024

mingshl assigned mingshl and unassigned mingshl Oct 23, 2024

mingshl assigned brianf-aws Oct 23, 2024

brianf-aws mentioned this issue Oct 23, 2024

Improve image search UX opensearch-project/dashboards-flow-framework#431

Open

dhrubo-os added this to ml-commons projects Oct 28, 2024

dhrubo-os moved this to Untriaged in ml-commons projects Oct 28, 2024

brianf-aws mentioned this issue Nov 5, 2024

[META] ML Inference Processor Enhancements III #3054

Open

5 tasks

mingshl moved this from Untriaged to In Progress in ml-commons projects Nov 5, 2024

brianf-aws mentioned this issue Nov 5, 2024

[RFC] Implement register custom sparse tokenizer from local files #3170

Open

dblock removed the untriaged label Nov 11, 2024

