
A few images have incomplete annotations. #21

Open
ivan-ea opened this issue Oct 13, 2022 · 7 comments

Comments

@ivan-ea

ivan-ea commented Oct 13, 2022

Hi, thank you for making such an interesting dataset publicly available!

If I'm not mistaken, there are a few images with incomplete annotations.
In other words, the JSON entry only contains the segmentation coordinates
for 1 or 2 cells, while the image clearly shows many more cells. An example
is shown in the image below (id = 150535); the affected files are listed in the table.

[image: 150535_1_annotations]

| id | file_name | set | segmented cells |
|---|---|---|---|
| 205798 | A172_Phase_D7_1_01d20h00m_1.png | train | 1 |
| 10517 | BT474_Phase_B3_1_03d00h00m_3.png | train | 1 |
| 150535 | A172_Phase_A7_1_01d04h00m_3.png | train | 1 |
| 1494964 | SHSY5Y_Phase_D10_1_01d16h00m_4.png | train | 1 |
| 718286 | BV2_Phase_D4_1_00d12h00m_2.png | train | 9 |
| 628256 | BV2_Phase_C4_1_01d16h00m_3.png | train | 4 |
| 1248961 | SkBr3_Phase_H3_2_00d00h00m_1.png | val | 1 |
| 976048 | BV2_Phase_A4_2_00d00h00m_1.png | test | 2 |
| 1007442 | BV2_Phase_A4_2_02d04h00m_3.png | test | 2 |

The list may not be complete, since the threshold I used was 10 segments per image,
but I'm fairly confident that the images with around 20+ segmented cells are all okay.
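For anyone who wants to reproduce this check, counting the annotations per image can be done directly on the raw COCO JSON. A minimal sketch, using a tiny hypothetical two-image dict in place of the real annotation file (in practice you would `json.load()` the LIVECell JSON):

```python
from collections import Counter

# Hypothetical stand-in for the real COCO annotation file;
# only the fields needed for counting are included.
coco = {
    "images": [
        {"id": 150535, "file_name": "A172_Phase_A7_1_01d04h00m_3.png"},
        {"id": 42, "file_name": "example_ok.png"},
    ],
    "annotations": [
        {"id": 1, "image_id": 150535, "category_id": 1},
        {"id": 2, "image_id": 42, "category_id": 1},
        {"id": 3, "image_id": 42, "category_id": 1},
    ],
}

# Count segmented cells per image and flag images below a threshold.
counts = Counter(a["image_id"] for a in coco["annotations"])
THRESHOLD = 10
suspect = {img["file_name"]: counts.get(img["id"], 0)
           for img in coco["images"]
           if counts.get(img["id"], 0) < THRESHOLD}
print(suspect)
```

With the real file, `suspect` would contain candidates like the ones in the table above, which then still need a visual check.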

The number of images affected seems to be very small, so excluding these images
should solve any issue. Nevertheless, I think this information could be relevant
for people that are interested in using the dataset for their DL models.

@StefanBaar

It seems that most images (I only checked SHSY5Y) have missing annotations:

For example:

  • SHSY5Y_Phase_D10_2_02d16h00m_2.tif: missing annotations are indicated with green ellipses.

[image: SHSY5Y_Phase_D10_2_02d16h00m_2.tif]

Further, many annotations appear fragmented and inaccurate. I wonder if that is due to rasterization (vector → pixel) during the annotation process?

Example:
Annotations 403, 257, 117 and 162 of SHSY5Y_Phase_D10_2_02d16h00m_2

[image: fragmented annotation renders]

Are the vector masks available somewhere?

@RickardSjogren
Contributor

Hi @ivan-ea,
Thank you for looking into LIVECell! It certainly looks like some images are missing annotations and have simply slipped through QA when they shouldn't have. Excluding these images from training is probably the best option. It is also great for us to be aware of these images so that we can fix them in a potential follow-up release.

@RickardSjogren
Contributor

Hi @StefanBaar,
The missing cells that you are pointing to are described in the paper and are due to the design choice of not trying to segment cells in locations where cell boundaries are not clearly visible. We made this choice to limit the risk of introducing bias on how to split ambiguous locations like the ones you marked. To help us make the call, we worked with an experienced cell biologist to minimize the risk of doing it incorrectly. It is not always possible to delineate single cells in these types of images, and this choice is in line with other published datasets, such as EVICAN.

I believe that the broken visualizations are due to the software you use to make the masks. LIVECell is annotated using polygon annotations and stored in COCO-annotation format, so these fragmented masks are not coming from the raw annotations. When converting polygons to masks, the main challenge will be thin structures like the ones you show and your rendering looks like it needs some tweaking. Cell 117 seems to be missing a protrusion on the bottom left though.

@StefanBaar

StefanBaar commented Nov 2, 2022

Hi @RickardSjogren,
Thank you very much for the detailed response and explanation.
In general, I agree with your first paragraph. However, in regards to the second paragraph

I believe that the broken visualizations are due to the software you use to make the masks

I apologize, but I don't think this is correct. It appears that in your dataset the annotations are stored as RLE, not as polygons. This means each annotation is stored as a pixel mask, not as a list of coordinates. The renders I produced above are as true to the data (provided in the dataset) as possible.

This is how I did it in python:

import scipy.ndimage as ndi
from tqdm import tqdm
import torchvision.datasets as dset

im_path = "LIVECell_single_cells/shsy5y/images"  # adjust to your local image directory
an_path = "LIVECell_single_cells/shsy5y/train.json"

coco    = dset.CocoDetection(root=im_path, annFile=an_path)

### get list of image ids
im_ids  = list(sorted(coco.coco.imgs.keys()))
### choose the image at index 100
imid0   = im_ids[100]

### load the annotation data for each annotation
annos   = coco.coco.loadAnns(coco.coco.getAnnIds(imid0))

### collect fractured annotations
fracids = []
for i in tqdm(range(len(annos))):
    ### converts the segmentation to a binary mask (2D)
    ma = coco.coco.annToMask(annos[i])
    ### a fractured mask splits into more than one connected component
    if ndi.label(ma)[1] > 1:
        fracids.append(i)

When looking at the data of the first annotation (annos[0]), we get the following output:

{'segmentation': [[696.92,
   0.5,
   697.48,
   3.89,
   697.48,
   10.68,
   696.35,
   19.16,
   694.09,
   22.55,
   694.09,
   25.94,
   698.61,
   33.29,
   703.14,
   41.77,
   704.0,
   40.08,
   704.0,
   0.0]],
 'area': 274.53349999999045,
 'iscrowd': 0,
 'image_id': 1457250,
 'bbox': [694.09, 0.0, 9.909999999999968, 41.77],
 'category_id': 1,
 'id': 1457251}

which looks like RLE and not polygon. I also confirmed the content of the raw json file, in which I could not find any polygon data.

Am I doing something wrong here? If the polygon data is contained somewhere within the json file and if you have some time to spare, could you please elaborate on how to retrieve the polygon data?

Could you provide the polygon data?

Thank you very much for your time.

@audreyeternal

audreyeternal commented Dec 9, 2022

@StefanBaar I think the annotation you provided is a polygon. You can refer to the COCO data format explanation: when iscrowd=0, the annotations are in polygon format. Also, decimals don't seem to appear in the RLE format.
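The distinction can be checked programmatically. A minimal sketch following the COCO convention (iscrowd=0 → list of flat `[x0, y0, x1, y1, ...]` polygons; iscrowd=1 → RLE dict with `counts` and `size`); the helper name is my own:

```python
def segmentation_kind(ann):
    """Classify a COCO annotation's segmentation field.

    Per the COCO format: iscrowd=0 -> list of polygons, each a flat
    [x0, y0, x1, y1, ...] list; iscrowd=1 -> RLE dict with
    'counts' and 'size' keys.
    """
    seg = ann["segmentation"]
    if ann.get("iscrowd", 0) == 0 and isinstance(seg, list):
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg:
        return "rle"
    return "unknown"

# The annotation printed above, abbreviated:
ann = {"segmentation": [[696.92, 0.5, 697.48, 3.89, 704.0, 0.0]],
       "iscrowd": 0, "image_id": 1457250}
print(segmentation_kind(ann))  # -> polygon
```

Applied to annos[0] above, this classifies it as a polygon, not RLE.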

@StefanBaar

StefanBaar commented Dec 15, 2022

OK, I got it.
The problem is not the format (RLE or polygon), but what the COCO-internal function coco.coco.annToMask does with the polygon data. This can be seen in the following image: on the left are the RLE and polygon coordinates provided by the COCO dataset, and on the right is the pixel mask produced by coco.coco.annToMask from the polygon data.

[image: polygon coordinates (left) vs. pixel mask produced by annToMask (right)]

In the case of this dataset, coco.coco.annToMask is not an optimal solution for producing pixel masks, because the polygon points are often not sufficiently densely spaced.
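One partial workaround would be to resample the polygon outline more densely before rasterizing it, so that thin protrusions are traced by many short segments. A minimal sketch with a hypothetical `densify` helper (not part of pycocotools or the dataset):

```python
import numpy as np

def densify(poly, max_step=0.5):
    """Resample a flat COCO polygon [x0, y0, x1, y1, ...] so that
    consecutive vertices are at most `max_step` pixels apart.
    Hypothetical helper: a denser outline can reduce rasterization
    artifacts on thin structures."""
    pts = np.asarray(poly, dtype=float).reshape(-1, 2)
    pts = np.vstack([pts, pts[:1]])  # close the ring
    out = []
    for p, q in zip(pts[:-1], pts[1:]):
        n = max(1, int(np.ceil(np.linalg.norm(q - p) / max_step)))
        # linearly interpolate between p and q, excluding q itself
        out.append(p + (q - p) * (np.arange(n) / n)[:, None])
    return np.vstack(out).ravel().tolist()

# a thin 10 x 1 rectangle, similar in spirit to a cell protrusion
poly = [0.0, 0.0, 10.0, 0.0, 10.0, 1.0, 0.0, 1.0]
dense = densify(poly)
```

The densified polygon can then be rasterized with the usual tooling; whether this fully fixes the fragmented masks would need testing on the SHSY5Y annotations.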

I am curious ... what is the intended method to convert the polygon data into pixel masks?

Further, I don't really understand why one would use polygons (basically a set of straight lines) to annotate images of round objects.
I think for cell images, it would be better to produce pixel annotations (which is usually faster and more precise).

@RickardSjogren
Copy link
Contributor

Thanks for looking into this @StefanBaar . It sure seems that annToMask is not optimal for cells with protrusions like the SH-SY5Y cells in LIVECell. We observed that those cells are particularly difficult to segment, and a better decoding method may be a partial solution.

For the models trained in the paper we used Detectron2, which has its own parsers for COCO datasets. Under the hood, they use pycocotools.mask.decode to convert the encoding to NumPy masks, which is the same as annToMask.
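For reference, the core of what that decode step does for an uncompressed RLE can be sketched in a few lines. This is a simplified illustration, not pycocotools' actual implementation (the real library also handles compressed string counts and is written in C):

```python
import numpy as np

def rle_decode(counts, size):
    """Decode an uncompressed COCO RLE into a binary mask.

    `counts` alternates run lengths of 0s and 1s, starting with 0s,
    laid out in column-major (Fortran) order; `size` is (height, width).
    Simplified sketch of pycocotools.mask.decode for uncompressed RLEs.
    """
    h, w = size
    flat = np.zeros(h * w, dtype=np.uint8)
    pos, val = 0, 0
    for run in counts:
        flat[pos:pos + run] = val
        pos += run
        val = 1 - val
    # undo the column-major flattening: (w, h) rows are columns of the mask
    return flat.reshape((w, h)).T

mask = rle_decode([2, 3, 4], (3, 3))
```

Polygon segmentations go through an extra rasterization step (pycocotools' `frPyObjects`) before this decoding, which is where the thin-structure issues discussed above come in.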

Regarding the choice of polygons: this is the standard way of annotating instances in most fields. There are certainly some drawbacks depending on the vertex density you use and so on. Even though pixel masks are more precise, they are far more time-consuming to annotate. This is something we have experimented with quite a bit, and in most cases polygons provide sufficient precision while being much more budget-friendly.
