-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Florence2SAM2 #24
Conversation
7cc624c
to
d07105c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving in @dillonalaird 's name
|
||
|
||
class BaseTool: | ||
pass | ||
|
||
|
||
DType = TypeVar("DType", bound=np.generic) | ||
|
||
VideoNumpy = Annotated[npt.NDArray[DType], Literal["N", "N", "N", 3]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillonalaird I think this value is already defined inside the file vision_agent_tools/types.py
We should probably merge together the shared_types.py
file that you are moving out of the tools
folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged them. I moved it to this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I thought I deleted the other file but I did not. I will delete the file, but I migrated all the imports to use this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed the changes
self.image_predictor.set_image(np.array(image, dtype=np.uint8)) | ||
annotation_id = 0 | ||
for prompt in prompts: | ||
with torch.autocast(device_type="cuda", dtype=torch.float16): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillonalaird I think this would be better to define a self.device
value that stores the device type instead of hard coding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! done
image, PromptTask.CAPTION_TO_PHRASE_GROUNDING, prompt | ||
)[PromptTask.CAPTION_TO_PHRASE_GROUNDING]["bboxes"] | ||
if return_mask: | ||
with torch.autocast(device_type="cuda", dtype=torch.bfloat16): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
objs = self.get_bbox_and_mask( | ||
Image.fromarray(video[0]).convert("RGB"), prompts, return_mask=False | ||
) | ||
with torch.autocast(device_type="cuda", dtype=torch.bfloat16): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, change to self.device
@validate_call(config={"arbitrary_types_allowed": True}) | ||
@torch.inference_mode() | ||
def __call__( | ||
self, media: Image.Image | VideoNumpy, prompts: list[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillonalaird @camiloaz should we handle the input value as either image or video as separate values as we do here? I think we should be consistent and change one of these two to have the same format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, agree. will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/tools/test_florence2_sam2.py
Outdated
|
||
def test_successful_florence2_sam2_image(): | ||
""" | ||
This test verifies that CLIPMediaSim returns a valid iresponse when passed a target_text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should change CLIPMediaSim
to Florence2SAM2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. thanks.
tests/tools/test_florence2_sam2.py
Outdated
|
||
def test_successful_florence2_sam2_video(): | ||
""" | ||
This test verifies that CLIPMediaSim returns a valid iresponse when passed a target_text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please change CLIPMediaSim
tests/tools/test_florence2_sam2.py
Outdated
|
||
def test_florence2_sam2_invalid_media(): | ||
""" | ||
This test verifies that CLIPMediaSim raises a ValueError if the media is not a valid type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Note: had to name the file something other than
sam2.py
or else I can naming conflicts