This repository introduces the Referring Image Segmentation task and collects related works.
Referring Image Segmentation (RIS) is a challenging problem at the intersection of computer vision and natural language processing. Given an image and a natural language expression, the goal is to produce a segmentation mask for the objects in the image that the expression refers to.
- RefCOCO: It contains 19,994 images with 142,210 referring expressions for 50,000 objects, collected from MSCOCO via a two-player game. The dataset is split into 120,624 train, 10,834 validation, 5,657 test A, and 5,095 test B samples.
- RefCOCO+: It contains 141,564 language expressions for 49,856 objects in 19,992 images. The dataset is split into train, validation, test A, and test B with 120,191, 10,758, 5,726, and 4,889 samples, respectively (these four counts sum to the 141,564 expressions). Compared with RefCOCO, absolute-location words are excluded from the RefCOCO+ expressions.
- G-Ref: It includes 104,560 referring expressions for 54,822 objects in 26,711 images.
- Expressions in RefCOCO and RefCOCO+ are very succinct (3.5 words on average). In contrast, expressions in G-Ref are more complex (8.4 words on average). Conversely, RefCOCO and RefCOCO+ tend to have more objects of the same category per image (3.9 on average) than G-Ref (1.6 on average).
- overall IoU: It is the total intersection area divided by the total union area, where both intersection area and union area are accumulated over all test samples (each test sample is an image and a referential expression).
- mean IoU: It is the IoU between the prediction and ground truth averaged across all test samples.
- Precision@X: It measures the percentage of test samples with an IoU score higher than the threshold X ∈ {0.5, 0.6, 0.7, 0.8, 0.9}.
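
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration rather than any paper's official evaluation script: the function names `intersection_and_union` and `evaluate` are hypothetical, masks are plain nested lists of 0/1 to avoid dependencies (real pipelines typically use NumPy or PyTorch tensors), and treating an empty union as IoU 0 is an assumption.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels of two same-sized binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def evaluate(
    pairs: Iterable[Tuple[Mask, Mask]],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute overall IoU, mean IoU, and Precision@X over (pred, gt) pairs."""
    total_inter = total_union = 0
    ious: List[float] = []
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        # Overall IoU accumulates areas over all samples before dividing.
        total_inter += inter
        total_union += union
        # Assumption: a sample whose masks are both empty gets IoU 0.
        ious.append(inter / union if union else 0.0)

    n = len(ious)
    results = {
        "overall_iou": total_inter / total_union if total_union else 0.0,
        "mean_iou": sum(ious) / n if n else 0.0,
    }
    for t in thresholds:
        # Reported as a fraction in [0, 1]; multiply by 100 for a percentage.
        results[f"precision@{t}"] = sum(iou > t for iou in ious) / n if n else 0.0
    return results


# Two toy samples: IoU 0.5 (1 of 2 pixels) and IoU 1.0 (4 of 4 pixels).
samples = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),
]
scores = evaluate(samples)
print(scores["overall_iou"], scores["mean_iou"], scores["precision@0.5"])
```

On the toy samples, overall IoU is 5/6 while mean IoU is 0.75. The two differ because overall IoU pools pixel counts, so samples with large objects weigh more, whereas mean IoU gives every sample equal weight.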
- MagNet: Mask Grounding for Referring Image Segmentation. in Arxiv 2023.
- MRES: Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation. in Arxiv 2023. code
- Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence. in Arxiv 2023.
- BTMAE: Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation. in Arxiv 2023.
- MARIS: MARIS: Referring Image Segmentation via Mutual-Aware Attention Features. in Arxiv 2023.
- Omni-RES: Towards Omni-supervised Referring Expression Segmentation. in Arxiv 2023. code
- JMCELN: Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network. in EMNLP 2023. code
- TAS: Text Augmented Spatial-aware Zero-shot Referring Image Segmentation. in EMNLP 2023 Findings.
- CVMN: Unsupervised Domain Adaptation for Referring Semantic Segmentation. in ACM MM 2023. code
- CARIS: CARIS: Context-Aware Referring Image Segmentation. in ACM MM 2023. code
- Shatter and Gather: Shatter and Gather: Learning Referring Image Segmentation with Text Supervision. in ICCV 2023.
- Group-RES: Advancing Referring Expression Segmentation Beyond Single Image. in ICCV 2023. code
- ETRIS: Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation. in ICCV 2023. code
- TRIS: Referring Image Segmentation Using Text Supervision. in ICCV 2023. code
- RIS-DMMI: Beyond One-to-One: Rethinking the Referring Image Segmentation. in ICCV 2023. code
- BKINet: Bilateral Knowledge Interaction Network for Referring Image Segmentation. in TMM 2023. code
- SLViT: SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation. in IJCAI 2023. code
- WiCo: WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation. in IJCAI 2023.
- CM-MaskSD: CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation. in Arxiv 2023.
- CGFormer: Contrastive Grouping with Transformer for Referring Image Segmentation. in CVPR 2023. code
- Partial-RES: Learning to Segment Every Referring Object Point by Point. in CVPR 2023. code
- Zero-shot RIS: Zero-shot Referring Image Segmentation with Global-Local Context Features. in CVPR 2023. code
- MCRES: Meta Compositional Referring Expression Segmentation. in CVPR 2023.
- PolyFormer: PolyFormer: Referring Image Segmentation as Sequential Polygon Generation. in CVPR 2023. project
- GRES: Generalized Referring Expression Segmentation. in CVPR 2023. project
- SADLR: Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation. in AAAI 2023.
- PCAN: Position-Aware Contrastive Alignment for Referring Image Segmentation. in Arxiv 2022.
- CoupAlign: CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation. in NeurIPS 2022. code
- CRSCNet: Cross-Modal Recurrent Semantic Comprehension for Referring Image Segmentation. in TCSVT 2022.
- LGCT: Local-global coordination with transformers for referring image segmentation. in Neurocomputing 2022.
- RES&REG: A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation. in Arxiv 2022.
- VLT: VLT: Vision-Language Transformer and Query Generation for Referring Segmentation. in TPAMI 2022. code
- Learning From Box Annotations for Referring Image Segmentation. in TNNLS 2022. code
- Instance-Specific Feature Propagation for Referring Segmentation. in TMM 2022.
- SeqTR: SeqTR: A Simple Yet Universal Network for Visual Grounding. in ECCV 2022. code
- LAVT: LAVT: Language-Aware Vision Transformer for Referring Image Segmentation. in CVPR 2022. code
- CRIS: CRIS: CLIP-Driven Referring Image Segmentation. in CVPR 2022. code
- ReSTR: ReSTR: Convolution-free Referring Image Segmentation Using Transformers. in CVPR 2022. project
- Bidirectional relationship inferring network for referring image localization and segmentation. in TNNLS 2021.
- RefTR: Referring Transformer: A One-step Approach to Multi-task Visual Grounding. in NeurIPS 2021.
- TV-Net: Two-stage Visual Cues Enhancement Network for Referring Image Segmentation. in ACM MM 2021. code
- VLT: Vision-Language Transformer and Query Generation for Referring Segmentation. in ICCV 2021. code
- MDETR: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding. in ICCV 2021. code
- EFNet: Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation. in CVPR 2021. code
- BUSNet: Bottom-Up Shift and Reasoning for Referring Image Segmentation. in CVPR 2021. code
- LTS: Locate then Segment: A Strong Pipeline for Referring Image Segmentation. in CVPR 2021.
- CGAN: Cascade Grouped Attention Network for Referring Expression Segmentation. in ACM MM 2020.
- LSCM: Linguistic Structure Guided Context Modeling for Referring Image Segmentation. in ECCV 2020.
- CMPC-Refseg: Referring Image Segmentation via Cross-Modal Progressive Comprehension. in CVPR 2020. code
- BRINet: Bi-directional Relationship Inferring Network for Referring Image Segmentation. in CVPR 2020. code
- PhraseCut: PhraseCut: Language-based Image Segmentation in the Wild. in CVPR 2020. code
- MCN: Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation. in CVPR 2020. code
- Dual Convolutional LSTM Network for Referring Image Segmentation. in TMM 2020.
- lang2seg: Referring Expression Object Segmentation with Caption-Aware Consistency. in BMVC 2019. code
- STEP: See-Through-Text Grouping for Referring Image Segmentation. in ICCV 2019.
- CMSA-Net: Cross-Modal Self-Attention Network for Referring Image Segmentation. in CVPR 2019. code
- KWA: Key-Word-Aware Network for Referring Expression Image Segmentation. in ECCV 2018. code
- DMN: Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries. in ECCV 2018. code
- RRN: Referring Image Segmentation via Recurrent Refinement Networks. in CVPR 2018. code
- MAttNet: MAttNet: Modular Attention Network for Referring Expression Comprehension. in CVPR 2018. code
- RMI: Recurrent Multimodal Interaction for Referring Image Segmentation. in ICCV 2017. code
- LSTM-CNN: Segmentation from natural language expressions. in ECCV 2016. code