COCO sample previews show multiple sample masks #26

Mahi-Mai · 2019-10-16T15:49:58Z

I'm not sure what the deal is, but I set up my dataset as described in your guide. The JSON file generates fine, but when I generate several random previews, some of my samples have some technicolor nonsense going on.

By that I mean 11 masks (they happen to be consecutive masks in the index) are drawn/assigned over the one sample. I've checked the json file, and it looks like all these annotations have been assigned to the one sample for some reason. Each sample is named according to the suggested format, so I'm not sure what's up.

For example:

image filename: DSC_5409_1.jpg
annotation filename: DSC_5409_1_mask_1.png

image: MVI_0155_1107_140.jpg
annotation: MVI_0155_1107_140_mask_1.png

There's the same number of unique filenames under each directory, and I've stripped the extensions and "_mask_1" from the annotation filenames to ensure they match.

Anyone else have this problem?

===

To explain further, I have a source image, let's say MVI_0155_1107.jpg. This is sampled using a sliding window technique to produce x number of samples saved as MVI_0155_1107_#.jpg, with x going as high as 350 in some cases.

...ah, that's the problem. When it reaches image xx_3.jpg and starts looking for annotations, it picks up ANY annotation with 'xx_3' in the filename. So, if I have annotations named xx_3, xx_30, xx_31, xx_32, xx_33, xx_34, xx_35, xx_36, xx_37, xx_38, and xx_39, that leaves me with 11 annotations assigned to one sample.
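
The collision is easy to reproduce in isolation. Here is a minimal sketch, assuming the lookup behaves like a prefix match on the image base name (a substring match would collide the same way). The filenames and the helper names `prefix_matches` and `exact_matches` are hypothetical and are not taken from the library:

```python
import re

# Hypothetical annotation base names for sliding-window samples 3, 4 and 30..39.
annotations = ['xx_3_mask_1', 'xx_4_mask_1'] + [f'xx_{n}_mask_1' for n in range(30, 40)]

def prefix_matches(image_base, names):
    # Assumed behavior: any annotation whose name starts with the image base name matches.
    pattern = re.escape(image_base) + '.*'
    return [n for n in names if re.match(pattern, n)]

def exact_matches(image_base, names):
    # Compare against the name with its trailing '_<category>_<id>' removed.
    return [n for n in names if n.rsplit('_', 2)[0] == image_base]

print(len(prefix_matches('xx_3', annotations)))  # 11
print(exact_matches('xx_3', annotations))        # ['xx_3_mask_1']
```

The prefix version returns xx_3 plus xx_30 through xx_39, which is the 11-mask pileup seen in the previews; the exact comparison returns only the one intended annotation.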

So it's a matter of improving how the script parses annotation filenames for comparison. I'll try something like this...

# Index out the part of the filename needed for comparison.
annotation_filename = 'DSC_5409_1_mask_1.png'  # example filename from above

# The last two underscore-separated words are the ones to strip;
# in our case we'll get ['mask', '1.png'].
words_to_strip = annotation_filename.split('_')[-2:]

char_sum = 0
for word in words_to_strip:
    char_sum += len(word)

# Add 2 to account for the underscores before and after 'mask'.
char_sum += 2

# Use the sum to slice off the suffix, leaving the part we want to compare.
annotation_filename_match = annotation_filename[:-char_sum]  # 'DSC_5409_1'

Now the image filename can be compared exactly against the stripped annotation name, as long as every annotation filename ends in the two-part '_category_id' suffix (here '_mask_1').

There are probably better ways to do this, so please feel free to suggest! I'm not closing this yet because I haven't tested my method. :)
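
One possible alternative, offered as a sketch rather than a tested fix: `os.path.splitext` can drop the extension and `str.rsplit` can drop the last two underscore-separated words, which avoids counting characters. The function name `annotation_base` is hypothetical, and the sketch assumes every annotation name ends in `_<category>_<id>`.

```python
import os

def annotation_base(path):
    """Return the annotation filename without directory, extension, or '_<category>_<id>' suffix."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split at most twice from the right so underscores inside the image name survive.
    return stem.rsplit('_', 2)[0]

print(annotation_base('DSC_5409_1_mask_1.png'))         # DSC_5409_1
print(annotation_base('MVI_0155_1107_140_mask_1.png'))  # MVI_0155_1107_140
```

Note that a category name containing an underscore would break this assumption, since only the last two words are removed.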

Mahi-Mai · 2019-10-18T16:03:18Z

So to implement my solution, I used a function for listing files in a directory, and I wrote a small function for stripping the annotation filename:

import os

def strip_annotation_filename(file):
    # Capture filename without directory or extension
    annot_filename = os.path.splitext(os.path.basename(file))[0]

    # Capture words to strip
    words_to_strip = annot_filename.split('_')[-2:]  # in our case we'll get ['mask', '1']

    # Get number of characters, starting at 2 to account for the
    # underscores before and after 'mask'
    char_sum = 2
    for word in words_to_strip:
        char_sum += len(word)

    # Use the sum to index out the part of the filename we want to compare
    annotation_filename_match = annot_filename[:-char_sum]

    return annotation_filename_match

This results in my new base code looking like this:

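import datetime
import json
import os

import numpy as np
from PIL import Image
from pycococreatortools import pycococreatortools

# list_files is the directory-listing helper mentioned above (not shown here).

INFO = {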
    "description": "Corn cobs on conveyor belt.",
    "url": "https://github.com/waspinator/pycococreator",
    "version": "0.1.0",
    "year": 2019,
    "contributor": "Leanne Canessa",
    "date_created": datetime.datetime.utcnow().isoformat(' ')
}

LICENSES = [
    {
        "id": 1,
        "name": "PROPRIETARY",
        "url": "PROPRIETARY"
    }
]

CATEGORIES = [
    {
        'id': 1,
        'name': 'mask',
        'supercategory': 'vegetable',
    }
]

coco_output = {
    "info": INFO,
    "licenses": LICENSES,
    "categories": CATEGORIES,
    "images": [],
    "annotations": []
}

image_id = 1
segmentation_id = 1

image_files = list_files('...validation/cob_detector2019')
annotation_files = list_files('...validation/annotations')

# go through each image
for image_filename in image_files:
    image = Image.open(image_filename)
    image_info = pycococreatortools.create_image_info(image_id, os.path.basename(image_filename), image.size)

    coco_output["images"].append(image_info)

    # gather each associated annotation
    # grab image filename without directory or extension (works for any extension length)
    image_basename = os.path.splitext(os.path.basename(image_filename))[0]
    for annotation_filename in annotation_files:
        # grab annotation filename without category, ID descriptors, and extension
        annotation_basename = strip_annotation_filename(annotation_filename)
        
        # if they match, gather info
        if annotation_basename == image_basename:
            class_id = [x['id'] for x in CATEGORIES if x['name'] in annotation_filename][0]
            category_info = {'id': class_id, 'is_crowd': 'crowd' in image_filename}
            binary_mask = np.asarray(Image.open(annotation_filename)
                .convert('1')).astype(np.uint8)

            annotation_info = pycococreatortools.create_annotation_info(
                segmentation_id, image_id, category_info, binary_mask,
                image.size, tolerance=2)

            if annotation_info is not None:
                coco_output["annotations"].append(annotation_info)

            segmentation_id = segmentation_id + 1

    image_id = image_id + 1

with open('.../validation/instances_validation2019.json', 'w') as output_json_file:
    json.dump(coco_output, output_json_file)
    
print('Done!')

However, now I have a new problem that may be related. When I look at my resulting json files, I have fewer annotations than I do images, despite confirming that each image has a matching annotation file (see attachment).
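
One way to narrow this down is to list the images that ended up with no annotation entry in the output. The sketch below works on any COCO-style dict with `images` and `annotations` lists; the function name `images_without_annotations` and the sample data are hypothetical.

```python
def images_without_annotations(coco):
    """Return file names of images that no annotation refers to."""
    annotated_ids = {ann['image_id'] for ann in coco['annotations']}
    return [img['file_name'] for img in coco['images'] if img['id'] not in annotated_ids]

# Tiny made-up example: image 2 has no annotation.
sample = {
    'images': [{'id': 1, 'file_name': 'a.jpg'}, {'id': 2, 'file_name': 'b.jpg'}],
    'annotations': [{'id': 1, 'image_id': 1}],
}
print(images_without_annotations(sample))  # ['b.jpg']
```

Calling it on `coco_output` after the loop would show which samples were dropped. If those samples do have matching mask files, the entries were presumably skipped at the `if annotation_info is not None` check, so the masks for those files would be worth inspecting (for example, whether they come out empty after `.convert('1')`).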

@waspinator Do you think this is a naming/file matching thing again? Or is this an expected behavior I'm not quite getting?

Thanks!

anmspro · 2020-07-01T11:45:01Z

@Mahi-Mai, Hey, can you please take a look at this issue - #34
I need a little help.
