Annotation fixes and enhancement #4

bpadovese · 2022-04-28T19:49:59Z

There is an issue when importing the annotation and creating the segments.

The annotations themselves are imported correctly; however, the created segments are not. There are two issues here.

1. Annotation groups are concatenated into a single segment. If I have 3 annotations, I expect 3 segments to be created. Instead, the 3 annotations are grouped into one segment that starts at the start time of the first annotation and ends at the end time of the second annotation. This could be because all 3 annotations are close to each other (but not overlapping).
2. Segments created from the annotations are padded with one second before the start and one second after the end of the annotation.

See images below for the issues:

We could consider adding the segments separately from the annotations.

bpadovese · 2022-05-02T13:21:30Z

I would like to expand on the first point. After using the app more, it seems very inconsistent when annotations are grouped into segments, and I could not identify a pattern. Sometimes it creates one segment for each annotation, sometimes it doesn't.

fsfrazao · 2022-05-02T13:38:38Z

@bpadovese, is this problem happening only when importing annotations to create the segments?

bpadovese · 2022-05-02T13:40:19Z

Yes, only when importing annotations to create the segments.

yue-su · 2022-05-03T19:58:07Z

The algorithm for importing annotations is designed like this:

1. Neighboring annotations are aggregated into a group and mapped to a segment (to be created). To keep processing time manageable, each segment is no more than 60 seconds long.
2. There will be situations where a group has only one annotation because its neighbors are all far away from it (they cannot be grouped because the segment would be too long to process).
3. A 1-second padding is added at the beginning and end of each segment, so that the boundaries of an annotation and the boundaries of its segment do not coincide when the annotations are displayed in the spectrogram.
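The grouping described above can be sketched roughly as follows. This is a hypothetical reconstruction, not HALLO's actual code: the names `Annotation`, `Segment` and `group_annotations`, the greedy left-to-right grouping, and the choice to apply the 1 s padding outside the 60 s limit are all assumptions. It does show why three nearby annotations end up in a single segment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float  # seconds from the start of the file
    end: float


@dataclass(frozen=True)
class Segment:
    start: float
    end: float
    annotations: tuple[Annotation, ...]


def _close_group(group, file_duration, padding):
    # Pad the group on both sides, clamped to the file boundaries.
    start = max(0.0, group[0].start - padding)
    end = min(file_duration, max(a.end for a in group) + padding)
    return Segment(start, end, tuple(group))


def group_annotations(annotations, file_duration, max_length=60.0, padding=1.0):
    """Greedily group neighbouring annotations into segments spanning at most max_length seconds."""
    segments = []
    group = []
    for ann in sorted(annotations, key=lambda a: a.start):
        # Adding this annotation would make the group too long: close the current group first.
        if group and ann.end - group[0].start > max_length:
            segments.append(_close_group(group, file_duration, padding))
            group = []
        group.append(ann)
    if group:
        segments.append(_close_group(group, file_duration, padding))
    return segments
```

With three 3 s annotations starting at 10 s, 20 s and 30 s, this sketch returns a single segment from 9 s to 34 s; annotations 90 s apart end up in separate segments.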

@fsfrazao @bpadovese The algorithm can be adjusted, but you guys need to let me know exactly what final result you expect.

fsfrazao · 2022-05-03T20:19:06Z

I think a better way would be to separate the "import annotations" action from the creation of segments (I believe this is what Bruno was suggesting above).

The workflow could be something like this:

1. Import annotations. At this point, no segments are created.
2. Create segments. This could be done through any of the available routes (manual creation, auto-generate or import segments).
3. With both annotations and segments available, a batch can be created. In the batch creation step, there could be an option for the model developer to choose whether they want to create a blank batch or match the selected segments to existing annotations, in which case an algorithm very similar to the one you described could be used.

In this case, when importing annotations, the annotator field would be set to the developer who is importing them.
If necessary to keep things simple, in step 3 the developer could be limited to matching only their own annotations (i.e., those that have their username in the "annotator" field).
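A minimal sketch of the matching option in step 3, assuming an annotation is linked to a segment whenever their time ranges overlap in the same file, and that only the importing developer's own annotations are considered. `match_segments`, `overlaps` and the field names are hypothetical and not part of HALLO.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    file: str
    start: float  # seconds
    end: float
    annotator: str


@dataclass(frozen=True)
class Segment:
    file: str
    start: float
    end: float


def overlaps(seg: Segment, ann: Annotation) -> bool:
    # Two time ranges in the same file overlap if each starts before the other ends.
    return seg.file == ann.file and ann.start < seg.end and ann.end > seg.start


def match_segments(segments, annotations, developer):
    """Map each selected segment to the developer's own annotations that overlap it."""
    own = [a for a in annotations if a.annotator == developer]
    return {seg: [a for a in own if overlaps(seg, a)] for seg in segments}
```

Creating a blank batch would simply skip this step; an annotation that straddles a segment boundary is linked to both segments under the overlap assumption.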

Does that make sense?
Do you think it would require a lot of change?

fsfrazao · 2022-05-03T20:39:14Z

I guess the main point is that the model developer would like more control over what the segments look like.
The "import segments" option provides that: the model developer can create segments however they like. The only missing part is a way to link these segments to annotations.

I think what I suggested above would provide a flexible option, but other ways to achieve the same result could be fine too.

yue-su · 2022-05-03T21:49:32Z

> I guess the main point is that the model developer would like more control over what the segments look like. The "import segments" option provides that: the model developer can create segments however they like. The only missing part is a way to link these segments to annotations.
>
> I think what I suggested above would provide a flexible option, but other ways to achieve the same result could be fine too.

@fsfrazao I hope you can explain a little more; I do not quite understand the role of the segment.

Why does the model developer need to control the creation of segments in the import process? It seems to me that segments just provide an intermediate step between files and annotations, which only serves to help ketos processing.

Does it matter to the annotator how segments are generated? What is the difference for the annotator whether each annotation is exclusively mapped to a single segment or several annotations mapped to a shared segment? or any other combinations? Wouldn't the latter be more effective?

The current algorithm is very effective at automatically generating the necessary segments. What is the difference compared to segments created manually by the model developer? Maybe the latter can customize the length of segments, but does this serve any particular purpose?

fsfrazao · 2022-05-04T14:52:26Z

Well, the segment is also a way for the model developer to control what they want to expose to the annotator in terms of temporal context.

In a validation scenario where the developer wants the annotator to verify existing annotations (generated by a model or another annotator), the model developer might ask for different tasks from the annotators, such as:

From all the detections produced by a machine learning model, which ones were false positives?
Did the model miss any kw call in this entire recording?
Can you verify periods without any kw calls immediately before and after each model-generated annotation?

All of these can use the same imported annotations, but the second and third tasks would be impossible if the segment does not extend beyond the annotation. This is the kind of thing Bruno is trying to do. His detector-generated outputs are always 3 seconds long, but he might want to show 1-minute segments to the annotators so they can not only confirm whether the model detection was correct but also check whether it missed something in the vicinity of each detection.

yue-su · 2022-05-04T15:42:03Z

@fsfrazao
Thanks for the info! This is really useful context that could help with MAPIL's design as well.

Right now, it is relatively easy to provide some controls for generating segments in the import annotation interface, such as the length of the segment, or whether to map multiple annotations to a single segment or generate a single segment per annotation. There will be some edge cases, e.g. some annotations may cross the boundary of a segment, so it is also possible to control whether segments overlap. How these segments are generated depends on how the user eventually wants to use the feature. It is possible to implement such controls in HALLO with the current model.

However, implementing the scenario you described earlier (importing annotations and segments separately and then associating them) would require major changes to the database and backend. In the current database design, annotation, batch, and segment are in a one-to-many relationship, so each annotation must exist in a batch and must be associated with a segment. The scenario you mentioned requires a many-to-many association in the database, with separate tables to track the association between segment, batch, and annotation. This design would increase the overall complexity of data processing, and the user interface would need to be adjusted accordingly. I feel this could be developed as a requirement for MAPIL, or as part of a redevelopment of HALLO for version 2.0.
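One possible shape for the many-to-many design, sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, and using two association tables (batch to segment, segment to annotation) is only one way to "track the association between segment, batch, and annotation"; HALLO's real schema is not shown in this thread.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: segments, annotations and batches are stored
# independently and linked through association tables.
SCHEMA = """
CREATE TABLE segment (id INTEGER PRIMARY KEY, file TEXT, start_time REAL, end_time REAL);
CREATE TABLE annotation (id INTEGER PRIMARY KEY, file TEXT, start_time REAL,
                         end_time REAL, annotator TEXT);
CREATE TABLE batch (id INTEGER PRIMARY KEY, name TEXT);
-- A segment can belong to many batches, and a batch holds many segments.
CREATE TABLE batch_segment (
    batch_id INTEGER REFERENCES batch(id),
    segment_id INTEGER REFERENCES segment(id),
    PRIMARY KEY (batch_id, segment_id)
);
-- An annotation can be linked to many segments, and a segment to many annotations.
CREATE TABLE segment_annotation (
    segment_id INTEGER REFERENCES segment(id),
    annotation_id INTEGER REFERENCES annotation(id),
    PRIMARY KEY (segment_id, annotation_id)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def annotations_in_batch(conn: sqlite3.Connection, batch_id: int) -> list[int]:
    """Return the ids of annotations reachable from a batch through its segments."""
    rows = conn.execute(
        """
        SELECT DISTINCT sa.annotation_id
        FROM batch_segment AS bs
        JOIN segment_annotation AS sa ON sa.segment_id = bs.segment_id
        WHERE bs.batch_id = ?
        ORDER BY sa.annotation_id
        """,
        (batch_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout the same annotation can appear in two batches through two different segments, which the current one-to-many design does not allow; the price is the extra joins in every query.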

bpadovese · 2022-05-04T17:48:04Z

I think what you suggested in the first paragraph is just fine for now. So, putting it into bullet points, would it be reasonably easy to change the algorithm to this?

- Neighboring annotations are aggregated into a group and mapped to a segment (to be created). Each segment would have a length defined by the user.
- In situations where a group only has one annotation, the segment will have the same length as above.
- No padding is added, if possible.
- In edge cases where an annotation falls between two segments, we can just extend the segment window a little to include the annotation.

However, I think for MAIPL, what you described in the second paragraph would be ideal. For the user it wouldn't make a difference; the functionality would be the same. But I believe treating annotations/batches/segments independently and associating them in a many-to-many relation would add flexibility for any future feature we would like to add to MAIPL.

How does this sound?

yue-su · 2022-05-05T16:50:02Z

@bpadovese

Yes, I can adjust the algorithm to meet this requirement. Let me check that I understand it correctly:

For example, if the user specifies that each segment is 60 seconds long, then the file will be segmented into 60s long segments and annotations will be mapped to the corresponding segments.

If the end time of an annotation exceeds the end time of a segment, the end time of the segment needs to be extended accordingly.

No padding means the last segment can end at the end of the file and does not need to be extended to 60 s, right?
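The updated algorithm can be sketched as follows. The function `segment_file` and its data layout are hypothetical; the rule that an annotation belongs to the segment in which it starts, and the choice to keep the following segments on the original grid (so an extended segment may overlap its neighbour), are assumptions rather than confirmed behaviour.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float
    end: float
    annotations: list[tuple[float, float]] = field(default_factory=list)


def segment_file(file_duration, annotations, length=60.0):
    """Cut a file into fixed-length segments and attach each annotation to one of them."""
    n = math.ceil(file_duration / length)
    if n == 0:
        return []
    # The last segment simply stops at the end of the file (no padding).
    segments = [Segment(i * length, min((i + 1) * length, file_duration)) for i in range(n)]
    for ann_start, ann_end in sorted(annotations):
        # Assumption: an annotation belongs to the segment in which it starts.
        seg = segments[min(int(ann_start // length), n - 1)]
        seg.annotations.append((ann_start, ann_end))
        # If the annotation runs past the segment's end, extend the segment to cover it.
        seg.end = max(seg.end, min(ann_end, file_duration))
    return segments
```

With a 15-minute (900 s) file, 60 s segments and a single 3 s annotation, this sketch returns 15 segments of 60 s, exactly one of which contains the annotation, as in the example confirmed later in the thread.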

bpadovese · 2022-05-12T17:25:40Z

Yes, that is what I mean. Does this sound good to you as well, @fsfrazao?

fsfrazao · 2022-05-13T18:15:48Z

That sounds good to me too.

Just to confirm that I understand it correctly: if there's an annotation table listing only one 3 s annotation for one file that is 15 min long and the user specifies 60 s segments, will the batch contain 15 segments with a duration of 60 s each, one of which will have the annotation?

yue-su · 2022-05-18T00:38:41Z

> That sounds good to me too.
>
> Just to confirm that I understand it correctly, if there's an annotation table listing only one 3s annotation for one file that is 15min long and the user specifies 60s segments, will the batch contain 15 long segments with a duration of 60s each, and one of them will have the annotation?

Yes, this is the updated algorithm.

bpadovese · 2022-06-22T17:01:08Z

Reopening this issue. I think the algorithm and interface are much better and easier to understand now. I have just one request. Currently the algorithm works more or less like this:

1. Check which files contain annotations.
2. Create segments of length X (defined by the user) for each of these files. For instance, if I set the length to 60 s and the file duration is 600 s, it creates 10 segments of 60 s each.

I think this is good as is if the model developer wants the annotator to check for additional false positives in the file. However, would it be possible to add an option (a checkbox) that deletes segments that do not contain any annotations?

Therefore the end result would be a batch where each segment of length X contained an annotation. The algorithm would essentially be the same with one extra step:

1. Check which files contain annotations.
2. Create segments of length X (defined by the user) for each of these files. For instance, if I set the length to 60 s and the file duration is 600 s, it creates 10 segments of 60 s each.
3. If the checkbox is checked, delete segments that do not contain an annotation.
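The three steps, including the proposed checkbox, can be sketched as a single function. `build_batch` and its `remove_empty` flag are hypothetical names, and this sketch assumes an annotation belongs to every segment it overlaps (it does not reproduce the segment-extension rule discussed earlier).

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    file: str
    start: float
    end: float
    annotations: list[tuple[float, float]] = field(default_factory=list)


def build_batch(file_durations, annotations, length=60.0, remove_empty=False):
    """Create fixed-length segments for annotated files, optionally dropping empty ones.

    file_durations: {file name: duration in seconds}
    annotations: iterable of (file name, start, end) tuples
    """
    by_file = defaultdict(list)
    for name, start, end in annotations:
        by_file[name].append((start, end))

    batch = []
    # Step 1: only files that contain at least one annotation are segmented.
    for name in sorted(by_file):
        duration = file_durations[name]
        # Step 2: cut the file into segments of `length` seconds.
        for i in range(math.ceil(duration / length)):
            seg = Segment(name, i * length, min((i + 1) * length, duration))
            # Assumption: an annotation belongs to every segment it overlaps.
            seg.annotations = [(s, e) for s, e in by_file[name] if s < seg.end and e > seg.start]
            # Step 3 (proposed checkbox): skip segments without annotations.
            if remove_empty and not seg.annotations:
                continue
            batch.append(seg)
    return batch
```

Filtering while the segments are generated is equivalent to deleting the empty ones after the import; either way only segments holding at least one detection reach the annotator.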

The use case for this would be similar to mine, where I just want to validate my model's detections but don't care about false positives.

Does this make sense to you guys?

fsfrazao · 2022-06-22T17:33:13Z

That makes sense, Bruno.
A "Remove segments without annotations" checkbox would be a good idea.

yue-su · 2022-06-23T03:25:28Z

Yup, it's possible. I just need to filter the segments after the import and delete those that don't have any annotations.

bpadovese added the bug (Something isn't working) and enhancement (New feature or request) labels Apr 28, 2022

yue-su self-assigned this May 2, 2022

yue-su linked a pull request May 6, 2022 that will close this issue: #7 "2 remove offset field from annotation tables" (Merged)

yue-su removed a link to a pull request May 6, 2022: #7 "2 remove offset field from annotation tables" (Merged)

yue-su linked a pull request May 13, 2022 that will close this issue: #8 "4 annotation fixes and enhancement" (Merged)

fsfrazao closed this as completed in #8 May 20, 2022

bpadovese reopened this Jun 22, 2022
