Overview Video
Introducing MyVLM
The Vision Language Models
We apply MyVLM to various VLM architectures for personalized captioning, visual question-answering, and referring expression comprehension.
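At a high level, MyVLM equips a frozen VLM with a lightweight concept head that recognizes a user-specific concept in the input image, and a learned concept embedding that is injected alongside the visual features to steer the model's response. Below is a minimal PyTorch sketch of that idea; the feature dimensions, the linear probe, the detection threshold, and all function and variable names are illustrative assumptions rather than the exact implementation.

```python
# Minimal sketch of a MyVLM-style concept head: a linear probe over frozen
# image features that decides whether a personalized concept (S*) appears
# in an image. All names, dimensions, and the threshold are illustrative.
import torch
import torch.nn as nn


class ConceptHead(nn.Module):
    """Binary classifier over frozen image embeddings for one concept (S*)."""

    def __init__(self, feature_dim: int = 768):
        super().__init__()
        self.probe = nn.Linear(feature_dim, 1)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, feature_dim) from a frozen vision encoder.
        return torch.sigmoid(self.probe(image_features)).squeeze(-1)


def personalize_query(concept_score: torch.Tensor,
                      concept_embedding: torch.Tensor,
                      visual_tokens: torch.Tensor,
                      threshold: float = 0.5) -> torch.Tensor:
    """If the concept is detected, append its learned embedding to the visual
    tokens that condition the language model; otherwise leave them untouched."""
    if concept_score.item() > threshold:
        return torch.cat([visual_tokens, concept_embedding.unsqueeze(0)], dim=0)
    return visual_tokens


# Toy usage with random tensors standing in for real encoder outputs.
head = ConceptHead(feature_dim=768)
features = torch.randn(1, 768)                               # frozen image embedding
visual_tokens = torch.randn(32, 4096)                        # visual tokens fed to the LLM
concept_embedding = torch.randn(4096, requires_grad=True)    # learned per-concept vector

score = head(features)
query = personalize_query(score[0], concept_embedding, visual_tokens)
print(query.shape)
```

In the full method, the concept embedding is optimized so that the frozen VLM produces the personalized response whenever the concept head fires, while the VLM's own weights remain unchanged.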
Step 3: Communicating the Concept
Results
- For each row, we show sample images of the target concept to the left
- The remaining images represent the input images passed to MyVLM
- Personalized responses generated by MyVLM can be seen by hovering over each image
- S* represents our concept's name
Personalized Captioning
Hover over the images to see the personalized captions!
Personalized Visual Question-Answering
Personalized Referring Expression Comprehension
Hover over the images to see the personalized captions!
Acknowledgements
This research was performed while Yuval Alaluf was at Snap.
We would like to thank Assaf Ben-Kish, Or Patashnik, Moran Yanuka, Morris Alper, Yonatan Biton, and Yuwei Fang for their fruitful discussions and valuable input which helped improve this work.
BibTeX
@article{alaluf2024myvlm,
  title={MyVLM: Personalizing VLMs for User-Specific Queries},
  author={Alaluf, Yuval and Richardson, Elad and Tulyakov, Sergey and Aberman, Kfir and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2403.14599},
  year={2024}
}