Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Florence2SAM2 #24

Merged
merged 14 commits into from
Aug 13, 2024
Merged

Add Florence2SAM2 #24

merged 14 commits into from
Aug 13, 2024

Conversation

dillonalaird
Copy link
Member

Note: had to name the file something other than sam2.py or else I can naming conflicts

@dillonalaird dillonalaird changed the title Feat/add sam2 Add SAM2 Aug 3, 2024
@camiloaz camiloaz changed the title Add SAM2 Add Florence2SAM2 Aug 13, 2024
camiloaz
camiloaz previously approved these changes Aug 13, 2024
Copy link
Member

@camiloaz camiloaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in @dillonalaird 's name



class BaseTool:
pass


DType = TypeVar("DType", bound=np.generic)

VideoNumpy = Annotated[npt.NDArray[DType], Literal["N", "N", "N", 3]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillonalaird I think this value is already defined inside the file vision_agent_tools/types.py We should probably merge together the shared_types.py file that you are moving out of the tools folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged them. I moved it to this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I thought I deleted the other file but I did not. I will delete the file, but I migrated all the imports to use this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed the changes

self.image_predictor.set_image(np.array(image, dtype=np.uint8))
annotation_id = 0
for prompt in prompts:
with torch.autocast(device_type="cuda", dtype=torch.float16):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillonalaird I think this would be better to define a self.device value that stores the device type instead of hard coding it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! done

image, PromptTask.CAPTION_TO_PHRASE_GROUNDING, prompt
)[PromptTask.CAPTION_TO_PHRASE_GROUNDING]["bboxes"]
if return_mask:
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

objs = self.get_bbox_and_mask(
Image.fromarray(video[0]).convert("RGB"), prompts, return_mask=False
)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, change to self.device

@validate_call(config={"arbitrary_types_allowed": True})
@torch.inference_mode()
def __call__(
self, media: Image.Image | VideoNumpy, prompts: list[str]
Copy link
Member

@CamiloInx CamiloInx Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillonalaird @camiloaz should we handle the input value as either image or video as separate values as we do here? I think we should be consistent and change one of these two to have the same format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, agree. will do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


def test_successful_florence2_sam2_image():
"""
This test verifies that CLIPMediaSim returns a valid iresponse when passed a target_text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should change CLIPMediaSim to Florence2SAM2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks.


def test_successful_florence2_sam2_video():
"""
This test verifies that CLIPMediaSim returns a valid iresponse when passed a target_text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, please change CLIPMediaSim


def test_florence2_sam2_invalid_media():
"""
This test verifies that CLIPMediaSim raises a ValueError if the media is not a valid type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member

@CamiloInx CamiloInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@camiloaz camiloaz merged commit b8aab59 into main Aug 13, 2024
1 check passed
@camiloaz camiloaz deleted the feat/add-sam2 branch August 13, 2024 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants