RIS-Learning-List

Introduction

This repository introduces the Referring Image Segmentation (RIS) task and collects related works.

Content

Definition

Referring Image Segmentation (RIS) is a challenging problem at the intersection of computer vision and natural language processing. Given an image and a natural language expression, the goal is to produce a segmentation mask over the image pixels corresponding to the object(s) referred to by the expression.
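To make the task's input and output concrete, below is a minimal, hypothetical interface sketch in Python/NumPy; the function name, shapes, and placeholder body are illustrative assumptions, not any specific method from this list.

```python
import numpy as np

def segment_referred_object(image: np.ndarray, expression: str) -> np.ndarray:
    """Hypothetical RIS interface: image + referring expression -> binary mask.

    image:      RGB image of shape (H, W, 3)
    expression: free-form text, e.g. "the woman in the red coat on the left"
    returns:    mask of shape (H, W), 1 on pixels of the referred object(s)
    """
    # Placeholder only: a real model would encode the image and the expression,
    # fuse the visual and linguistic features, and predict a pixel-wise
    # foreground probability that is then thresholded into a binary mask.
    return np.zeros(image.shape[:2], dtype=np.uint8)
```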

Datasets

  • RefCOCO: It contains 19,994 images with 142,210 referring expressions for 50,000 objects, collected from MSCOCO via a two-player game. The dataset is split into 120,624 train, 10,834 validation, 5,657 test A, and 5,095 test B samples.
  • RefCOCO+: It contains 141,564 language expressions for 49,856 objects in 19,992 images. The dataset is split into train, validation, test A, and test B with 120,624, 10,758, 5,726, and 4,889 samples, respectively. Compared with RefCOCO, expressions in RefCOCO+ are not allowed to contain absolute-location words, so they focus more on appearance.
  • G-Ref (RefCOCOg): It includes 104,560 referring expressions for 54,822 objects in 26,711 images.
  • Expressions in RefCOCO and RefCOCO+ are very succinct (3.5 words on average), whereas expressions in G-Ref are more complex (8.4 words on average). On the other hand, RefCOCO and RefCOCO+ tend to have more objects of the same category per image (3.9 on average) than G-Ref (1.6 on average). A sketch of how one sample can be represented follows this list.
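As a rough illustration only (the field names below are hypothetical and do not mirror the official annotation files), each benchmark can be viewed as a collection of (image, expression, mask) triplets, where the same object may appear in several samples because it has several expressions:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RISSample:
    """One (image, expression, mask) triplet; hypothetical representation."""
    image_id: int        # MSCOCO image id the expression refers into
    expression: str      # e.g. "guy in the striped shirt"
    mask: np.ndarray     # binary mask of shape (H, W) for the referred object
    split: str           # e.g. "train", "val", "testA", or "testB"
```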

Evaluation Metric

  • overall IoU: It is the total intersection area divided by the total union area, where both areas are accumulated over all test samples (each test sample is an image paired with one referring expression).
  • mean IoU: It is the IoU between prediction and ground truth, averaged over all test samples.
  • Precision@X: It measures the percentage of test samples whose IoU is higher than the threshold X ∈ {0.5, 0.6, 0.7, 0.8, 0.9}. A sketch computing all three metrics follows this list.
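For concreteness, here is a minimal NumPy sketch of how these three metrics could be computed from per-sample binary masks; the function name and interface are illustrative assumptions, not code from any listed work.

```python
import numpy as np

def ris_metrics(pred_masks, gt_masks, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Overall IoU, mean IoU, and Precision@X over (prediction, ground-truth)
    binary mask pairs; each pair is one test sample (image + expression)."""
    total_inter, total_union, per_sample_iou = 0, 0, []
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        total_inter += inter
        total_union += union
        # Guard against an empty union (both masks empty).
        per_sample_iou.append(inter / union if union > 0 else 1.0)

    per_sample_iou = np.array(per_sample_iou)
    overall_iou = total_inter / total_union        # cumulative I over cumulative U
    mean_iou = per_sample_iou.mean()               # average of per-sample IoUs
    precision_at = {f"Prec@{t}": float((per_sample_iou > t).mean())
                    for t in thresholds}
    return overall_iou, mean_iou, precision_at
```

Given two lists of (H, W) boolean arrays, `ris_metrics(preds, gts)` returns the scalar overall IoU, the mean IoU, and a dictionary of Precision@X values.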

Related Works

Performance

Reference

MarkMoHR/Awesome-Referring-Image-Segmentation
