Implemented an end-to-end transformer-based model (DETR) for panoptic segmentation.
Input sequence:
Current output of the model:
TODO:
- Extend this model to match instances across sequences of images, i.e. video panoptic segmentation, perhaps borrowing ideas from ViP-DeepLab?
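
One simple baseline for the instance matching mentioned in the TODO is to carry instance IDs from one frame to the next by mask overlap. The sketch below is only an illustration of that idea, not what this repo or ViP-DeepLab does: it assumes each predicted segment is available as a set of `(row, col)` pixels, and the names `mask_iou`, `match_instances`, `rect`, and the 0.5 threshold are all hypothetical choices.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_instances(prev_masks, curr_masks, iou_threshold=0.5):
    """Greedily give each current-frame segment the ID of the previous-frame
    segment it overlaps most; unmatched segments get fresh IDs.

    prev_masks: dict {instance_id: set of pixels} for frame t-1
    curr_masks: list of pixel sets for frame t
    Returns a dict {index in curr_masks: instance_id}.
    """
    # Score every (previous, current) pair, highest overlap first.
    pairs = sorted(
        ((mask_iou(pm, cm), pid, ci)
         for pid, pm in prev_masks.items()
         for ci, cm in enumerate(curr_masks)),
        key=lambda t: t[0],
        reverse=True,
    )
    assigned, used_prev = {}, set()
    for iou, pid, ci in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap too little to count as a match
        if ci in assigned or pid in used_prev:
            continue  # each segment and each ID is used at most once
        assigned[ci] = pid
        used_prev.add(pid)
    # Segments without a match are treated as new instances.
    next_id = max(prev_masks, default=-1) + 1
    for ci in range(len(curr_masks)):
        if ci not in assigned:
            assigned[ci] = next_id
            next_id += 1
    return assigned


def rect(r0, r1, c0, c1):
    """Toy helper: rectangular mask covering rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Two instances in frame t-1; in frame t both shift by one pixel and a new one appears.
prev = {0: rect(0, 4, 0, 4), 1: rect(0, 4, 6, 10)}
curr = [rect(0, 4, 7, 11), rect(0, 4, 1, 5), rect(6, 8, 0, 2)]
ids = match_instances(prev, curr)
```

Greedy matching keeps the example dependency-free; an optimal assignment (e.g. Hungarian matching) or a learned association would be natural replacements if the greedy version proves too brittle.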