RIS-Learning-List

Introduction

This repository introduces the Referring Image Segmentation (RIS) task and collects related works.

Content

Definition
Datasets
Evaluation Metric
Related Works
Performance
Reference

Definition

Referring Image Segmentation (RIS) is a challenging problem at the intersection of computer vision and natural language processing. Given an image and a natural language expression, the goal is to produce a segmentation mask for the object or objects in the image that the expression refers to.

Datasets

RefCOCO: It contains 19,994 images with 142,210 referring expressions for 50,000 objects, collected from MSCOCO via a two-player game. The dataset is split into 120,624 train, 10,834 validation, 5,657 test A, and 5,095 test B samples.
RefCOCO+: It contains 141,564 language expressions for 49,856 objects in 19,992 images. The dataset is split into train, validation, test A, and test B with 120,191, 10,758, 5,726, and 4,889 samples, respectively (the four splits sum to the 141,564 expressions). Compared with RefCOCO, absolute-location words are excluded from RefCOCO+ expressions.
G-Ref: It includes 104,560 referring expressions for 54,822 objects in 26,711 images.
Expressions in RefCOCO and RefCOCO+ are very succinct (3.5 words on average). In contrast, expressions in G-Ref are more complex (8.4 words on average). Conversely, RefCOCO and RefCOCO+ tend to have more objects of the same category per image (3.9 on average) than G-Ref (1.6 on average).

Evaluation Metric

Overall IoU: The total intersection area divided by the total union area, where both areas are accumulated over all test samples (each test sample is an image paired with a referring expression).
Mean IoU: The IoU between the prediction and the ground truth, averaged across all test samples.
Precision@X: The percentage of test samples with an IoU score higher than the threshold X ∈ {0.5, 0.6, 0.7, 0.8, 0.9}.
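
The three metrics above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the evaluation code of any listed paper: the function name `ris_metrics`, the dictionary keys, and the choice to score a sample with an empty union as IoU 0 are all assumptions made here. Masks are assumed to be binary arrays of matching shape, one prediction and one ground truth per test sample.

```python
import numpy as np

def ris_metrics(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compute overall IoU, mean IoU, and Precision@X over paired binary masks.

    Hypothetical helper for illustration; `preds` and `gts` are sequences of
    binary masks, one pair per test sample (image + referring expression).
    """
    total_inter = 0
    total_union = 0
    ious = []
    for pred, gt in zip(preds, gts):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        # Overall IoU accumulates areas over the whole test set.
        total_inter += inter
        total_union += union
        # Assumption: a sample whose union is empty is scored as IoU 0.
        ious.append(inter / union if union > 0 else 0.0)
    if not ious:
        raise ValueError("at least one test sample is required")
    ious = np.array(ious, dtype=float)
    return {
        "overall_iou": float(total_inter / total_union) if total_union > 0 else 0.0,
        "mean_iou": float(ious.mean()),
        # "Higher than the threshold" is read as a strict inequality.
        "precision": {x: float((ious > x).mean()) for x in thresholds},
    }
```

Because overall IoU pools pixel counts across the test set, samples with large objects weigh more heavily in it than in mean IoU, where every sample counts equally. That is one reason the two numbers can differ noticeably on the same predictions.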

Related Works

MagNet: Mask Grounding for Referring Image Segmentation. in Arxiv 2023.
MRES: Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation. in Arxiv 2023. code
Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence. in Arxiv 2023.
BTMAE: Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation. in Arxiv 2023.
MARIS: MARIS: Referring Image Segmentation via Mutual-Aware Attention Features. in Arxiv 2023.
Omni-RES: Towards Omni-supervised Referring Expression Segmentation. in Arxiv 2023. code
JMCELN: Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network. in EMNLP 2023. code
TAS: Text Augmented Spatial-aware Zero-shot Referring Image Segmentation. in EMNLP 2023 Findings.
CVMN: Unsupervised Domain Adaptation for Referring Semantic Segmentation. in ACM MM 2023. code
CARIS: CARIS: Context-Aware Referring Image Segmentation. in ACM MM 2023. code
Shatter and Gather: Shatter and Gather: Learning Referring Image Segmentation with Text Supervision. in ICCV 2023.
Group-RES: Advancing Referring Expression Segmentation Beyond Single Image. in ICCV 2023. code
ETRIS: Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation. in ICCV 2023. code
TRIS: Referring Image Segmentation Using Text Supervision. in ICCV 2023. code
RIS-DMMI: Beyond One-to-One: Rethinking the Referring Image Segmentation. in ICCV 2023. code
BKINet: Bilateral Knowledge Interaction Network for Referring Image Segmentation. in TMM 2023. code
SLViT: SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation. in IJCAI 2023. code
WiCo: WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation. in IJCAI 2023.
CM-MaskSD: CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation. in Arxiv 2023.
CGFormer: Contrastive Grouping with Transformer for Referring Image Segmentation. in CVPR 2023. code
Partial-RES: Learning to Segment Every Referring Object Point by Point. in CVPR 2023. code
Zero-shot RIS: Zero-shot Referring Image Segmentation with Global-Local Context Features. in CVPR 2023. code
MCRES: Meta Compositional Referring Expression Segmentation. in CVPR 2023.
PolyFormer: PolyFormer: Referring Image Segmentation as Sequential Polygon Generation. in CVPR 2023. project
GRES: Generalized Referring Expression Segmentation. in CVPR 2023. project
SADLR: Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation. in AAAI 2023.
PCAN: Position-Aware Contrastive Alignment for Referring Image Segmentation. in Arxiv 2022.
CoupAlign: CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation. in NeurIPS 2022. code
CRSCNet: Cross-Modal Recurrent Semantic Comprehension for Referring Image Segmentation. in TCSVT 2022.
LGCT: Local-global coordination with transformers for referring image segmentation. in Neurocomputing 2022.
RES&REG: A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation. in Arxiv 2022.
VLT: VLT: Vision-Language Transformer and Query Generation for Referring Segmentation. in TPAMI 2022. code
Learning From Box Annotations for Referring Image Segmentation. in TNNLS 2022. code
Instance-Specific Feature Propagation for Referring Segmentation. in TMM 2022.
SeqTR: SeqTR: A Simple Yet Universal Network for Visual Grounding. in ECCV 2022. code
LAVT: LAVT: Language-Aware Vision Transformer for Referring Image Segmentation. in CVPR 2022. code
CRIS: CRIS: CLIP-Driven Referring Image Segmentation. in CVPR 2022. code
ReSTR: ReSTR: Convolution-free Referring Image Segmentation Using Transformers. in CVPR 2022. project
Bidirectional relationship inferring network for referring image localization and segmentation. in TNNLS 2021.
RefTR: Referring Transformer: A One-step Approach to Multi-task Visual Grounding. in NeurIPS 2021.
TV-Net: Two-stage Visual Cues Enhancement Network for Referring Image Segmentation. in ACM MM 2021. code
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation. in ICCV 2021. code
MDETR: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding. in ICCV 2021. code
EFNet: Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation. in CVPR 2021. code
BUSNet: Bottom-Up Shift and Reasoning for Referring Image Segmentation. in CVPR 2021. code
LTS: Locate then Segment: A Strong Pipeline for Referring Image Segmentation. in CVPR 2021.
CGAN: Cascade Grouped Attention Network for Referring Expression Segmentation. in ACM MM 2020.
LSCM: Linguistic Structure Guided Context Modeling for Referring Image Segmentation. in ECCV 2020.
CMPC-Refseg: Referring Image Segmentation via Cross-Modal Progressive Comprehension. in CVPR 2020. code
BRINet: Bi-directional Relationship Inferring Network for Referring Image Segmentation. in CVPR 2020. code
PhraseCut: PhraseCut: Language-based Image Segmentation in the Wild. in CVPR 2020. code
MCN: Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation. in CVPR 2020. code
Dual Convolutional LSTM Network for Referring Image Segmentation. in TMM 2020.
lang2seg: Referring Expression Object Segmentation with Caption-Aware Consistency. in BMVC 2019. code
STEP: See-Through-Text Grouping for Referring Image Segmentation. in ICCV 2019.
CMSA-Net: Cross-Modal Self-Attention Network for Referring Image Segmentation. in CVPR 2019. code
KWA: Key-Word-Aware Network for Referring Expression Image Segmentation. in ECCV 2018. code
DMN: Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries. in ECCV 2018. code
RRN: Referring Image Segmentation via Recurrent Refinement Networks. in CVPR 2018. code
MAttNet: MAttNet: Modular Attention Network for Referring Expression Comprehension. in CVPR 2018. code
RMI: Recurrent Multimodal Interaction for Referring Image Segmentation. in ICCV 2017. code
LSTM-CNN: Segmentation from natural language expressions. in ECCV 2016. code

Performance

Reference

MarkMoHR / Awesome-Referring-Image-Segmentation
