Annotation fixes and enhancement #4

Open
bpadovese opened this issue Apr 28, 2022 · 17 comments · Fixed by #8
Labels
bug Something isn't working enhancement New feature or request

Comments

@bpadovese

There is an issue when importing annotations and creating segments from them.

The annotations themselves are imported correctly, but the segments created from them are not. There are two issues here:

  1. Groups of annotations are concatenated into a single segment. If I have 3 annotations, I expect 3 segments to be created. Instead, the 3 annotations are grouped into one segment that starts at the start time of the first annotation and ends at the end time of the second annotation. This could be because all 3 annotations are close to each other (but not overlapping).
  2. Segments created from the annotations are padded with one second before the start and one second after the end of the annotation.

See images below for the issues:

Screenshot from 2022-04-28 16-27-01
Screenshot from 2022-04-28 16-26-54

We could consider adding the segments separately from the annotations.

@bpadovese bpadovese added bug Something isn't working enhancement New feature or request labels Apr 28, 2022
@bpadovese
Author

I would like to expand on the first point. After using the app more, it seems very inconsistent when annotations are grouped into segments, and I could not identify a pattern. Sometimes it creates one segment per annotation, sometimes it doesn't.

@fsfrazao
Collaborator

fsfrazao commented May 2, 2022

@bpadovese, is this problem happening only when importing annotations to create the segments?

@bpadovese
Author

Yes, only when importing annotations to create the segments.

@yue-su yue-su self-assigned this May 2, 2022
@yue-su
Collaborator

yue-su commented May 3, 2022

The algorithm for importing annotations is designed like this:

  1. Neighboring annotations are aggregated into a group and mapped to a segment (to be created). To keep processing time manageable, each segment is no longer than 60 seconds.

  2. There will be situations where a group has only one annotation because all its neighbors are far away from it (they cannot be grouped, as the resulting segment would be too long to process).

  3. A 1-second padding is added at the beginning and end of each segment, so that the boundaries of the annotations and the boundaries of the segment do not coincide when the annotations are displayed in the spectrogram.
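The grouping described above can be sketched as follows. This is a minimal illustration, not HALLO's actual code. It assumes annotations are start/end times in seconds within one file, that neighbours are merged greedily in start-time order while the unpadded span stays within 60 s, and that the 1 s padding is clamped to the file bounds. `group_annotations`, `Annotation` and `Segment` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float  # seconds from the start of the file
    end: float


@dataclass(frozen=True)
class Segment:
    start: float
    end: float
    annotations: tuple


def group_annotations(annotations, file_duration, max_len=60.0, pad=1.0):
    """Hypothetical sketch: greedily merge neighbours while the span stays <= max_len."""
    groups = []
    current = []
    for ann in sorted(annotations, key=lambda a: a.start):
        if current:
            span_start = current[0].start
            span_end = max(max(a.end for a in current), ann.end)
            if span_end - span_start <= max_len:
                current.append(ann)  # still fits: join the current group
                continue
            groups.append(current)  # too long: close the current group
        current = [ann]
    if current:
        groups.append(current)

    segments = []
    for group in groups:
        # Pad each group by `pad` seconds on both sides, without leaving the file.
        start = max(0.0, group[0].start - pad)
        end = min(file_duration, max(a.end for a in group) + pad)
        segments.append(Segment(start, end, tuple(group)))
    return segments
```

For three 3 s annotations starting at 10 s, 20 s and 30 s, this yields a single segment from 9 s to 34 s, which is consistent with the grouping and padding reported at the top of the issue, while an isolated annotation at 100 s gets its own segment.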

@fsfrazao @bpadovese The algorithm can be adjusted, but you need to let me know exactly what final result you expect.

@fsfrazao
Collaborator

fsfrazao commented May 3, 2022

I think a better way would be to separate the "import annotations" action from the creation of segments (I believe this is what Bruno was suggesting above).

The workflow could be something like this:

  1. Import annotations. At this point, no segments are created.
  2. Create segments. This could be done through any of the available routes (manual creation, auto-generate or import segments).
  3. With both annotations and segments available, a batch can be created. In the batch creation step, there could be an option for the model developer to choose whether to create a blank batch or to match the selected segments to existing annotations, in which case an algorithm very similar to the one you described could be used.

In this case, when importing annotations, the annotator field would be set to the developer who is importing them.
If necessary to keep things simple, in step 3 the developer could be limited to matching only their own annotations (i.e., those that have their username in the "annotator" field).

Does that make sense?
Do you think it would require a lot of change?
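The matching option in step 3 could be sketched as an overlap test between each selected segment and the imported annotations, optionally restricted to the developer's own annotations. This is a hypothetical helper written for illustration; the field names and the half-open overlap rule are assumptions, not part of HALLO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    file: str
    start: float  # seconds
    end: float
    annotator: str


@dataclass(frozen=True)
class Segment:
    file: str
    start: float
    end: float


def overlaps(seg, ann):
    # Half-open intervals: annotations that merely touch a boundary do not match.
    return seg.file == ann.file and ann.start < seg.end and ann.end > seg.start


def match_annotations(segments, annotations, annotator=None):
    """Hypothetical helper: map each segment to the annotations that overlap it."""
    if annotator is not None:
        # Optionally keep only the developer's own annotations.
        annotations = [a for a in annotations if a.annotator == annotator]
    return {seg: [a for a in annotations if overlaps(seg, a)] for seg in segments}
```

Note that an annotation straddling two segments is matched to both, so a real implementation would need a rule for such boundary cases.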

@fsfrazao
Collaborator

fsfrazao commented May 3, 2022

I guess the main point is that the model developer would like more control over what the segments look like.
The "import segments" option provides that: the model developer can create segments however they like. The only missing part is a way to link these segments to annotations.

I think what I suggested above would provide a flexible option, but other ways to achieve the same result could be fine too.

@yue-su
Collaborator

yue-su commented May 3, 2022

> I guess the main point is that the model developer would like more control over what the segments look like. The "import segments" option provides that: the model developer can create segments however they like. The only missing part is a way to link these segments to annotations.
>
> I think what I suggested above would provide a flexible option, but other ways to achieve the same result could be fine too.

@fsfrazao Could you explain a little more? I do not quite understand the role of the segment.

Why does the model developer need to control the creation of segments in the import process? It seems to me that segments just provide an intermediate step between files and annotations, which is only there to help ketos processing.

Does it matter to the annotator how segments are generated? What difference does it make to the annotator whether each annotation is mapped exclusively to its own segment, or several annotations are mapped to a shared segment, or any other combination? Wouldn't the latter be more efficient?

The current algorithm is very effective at automatically generating the necessary segments. What is the difference compared to segments created manually by the model developer? Maybe the latter can customize the length of segments, but does this serve any particular purpose?

@fsfrazao
Collaborator

fsfrazao commented May 4, 2022

Well, the segment is also a way for the model developer to control what they want to expose to the annotator in terms of temporal context.

In a validation scenario where the developer wants the annotator to verify existing annotations (generated by a model or another annotator), the model developer might ask the annotators to perform different tasks, such as:

  • From all the detections produced by a machine learning model, which ones were false positives?
  • Did the model miss any kw call in this entire recording?
  • Can you verify periods without any kw calls immediately before and after each model-generated annotation?

All of these can use the same imported annotations, but the second and third tasks would be impossible if the segment does not extend beyond the annotation. This is the kind of thing Bruno is trying to do. His detector-generated outputs are always 3 seconds long, but he might want to show 1-minute segments to the annotators so they can confirm not only whether the model detection was correct but also whether it missed something in the vicinity of each detection.

@yue-su
Copy link
Collaborator

yue-su commented May 4, 2022

@fsfrazao
Thanks for the info! This is really useful context that could help with MAPIL's design as well.

Right now, it is relatively easy to provide some controls for generating segments in the import annotation interface, such as the length of the segment, and whether to map multiple annotations to a single segment or to generate one segment per annotation. There will be some edge cases, e.g. some annotations may cross the boundary of a segment, so it would also be possible to control whether segments overlap. How these segments are generated depends on how users eventually want to use the feature. It is possible to implement such controls in HALLO with the current model.

However, implementing the scenario you described earlier, i.e. importing annotations and segments separately and then associating them, would require major changes to the database and backend. In the current database design, annotation, batch, and segment are in a one-to-many relationship, so each annotation must exist in a batch and must be associated with a segment. The scenario you mentioned requires a many-to-many association in the database, with separate tables to track the associations between segment, batch, and annotation. This design would increase the overall complexity of data processing, and the user interface would need to be adjusted accordingly. I feel this could be developed as a requirement for MAPIL, or HALLO could be redeveloped for version 2.0.
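For illustration, a many-to-many design like the one described could use two association tables linking otherwise independent annotation, segment and batch tables. The sketch below uses SQLite from Python's standard library; the table and column names are hypothetical and do not reflect HALLO's actual schema.

```python
import sqlite3

# Hypothetical schema: annotations, segments and batches are independent tables,
# and two association tables link them many-to-many.
SCHEMA = """
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    file TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    annotator TEXT
);
CREATE TABLE segment (
    id INTEGER PRIMARY KEY,
    file TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL
);
CREATE TABLE batch (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- A segment can appear in several batches, and a batch holds many segments.
CREATE TABLE batch_segment (
    batch_id INTEGER NOT NULL REFERENCES batch(id),
    segment_id INTEGER NOT NULL REFERENCES segment(id),
    PRIMARY KEY (batch_id, segment_id)
);
-- An annotation can be linked to several segments (e.g. one crossing a boundary).
CREATE TABLE segment_annotation (
    segment_id INTEGER NOT NULL REFERENCES segment(id),
    annotation_id INTEGER NOT NULL REFERENCES annotation(id),
    PRIMARY KEY (segment_id, annotation_id)
);
"""


def annotations_in_batch(conn, batch_id):
    """Return the ids of all annotations reachable from a batch through its segments."""
    rows = conn.execute(
        """
        SELECT DISTINCT sa.annotation_id
        FROM batch_segment AS bs
        JOIN segment_annotation AS sa ON sa.segment_id = bs.segment_id
        WHERE bs.batch_id = ?
        ORDER BY sa.annotation_id
        """,
        (batch_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Example data: segment 2 is shared by two batches, annotation 2 by two segments.
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
                 [(1, "a.wav", 10, 13, "bruno"), (2, "a.wav", 58, 61, "bruno")])
conn.executemany("INSERT INTO segment VALUES (?, ?, ?, ?)",
                 [(1, "a.wav", 0, 60), (2, "a.wav", 60, 120)])
conn.executemany("INSERT INTO batch VALUES (?, ?)", [(1, "validation"), (2, "review")])
conn.executemany("INSERT INTO batch_segment VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])
conn.executemany("INSERT INTO segment_annotation VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])
```

The extra join tables are where the added complexity comes from: every query that used to follow a single foreign key now goes through an association table.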

@bpadovese
Copy link
Author

I think what you suggested in the first paragraph is just fine for now. To put it in bullet points, would it be reasonably easy to change the algorithm to the following?

  1. Neighboring annotations are aggregated into a group and mapped to a segment (to be created). Each segment would have a length defined by the user.
  2. In situations where a group only has one annotation, the segment length will be the same as above.
  3. No padding is added, if possible.
  4. In edge cases where an annotation falls between two segments, we can just extend the segment window slightly to include the annotation.

However, for MAIPL, I think what you described in the second paragraph would be ideal. For the user it wouldn't make a difference; the functionality would be the same. But I believe treating annotations, batches, and segments independently and associating them in a many-to-many relation would add flexibility for any future features we would like to add to MAIPL.

How does this sound?

@yue-su
Collaborator

yue-su commented May 5, 2022

@bpadovese

Yes, I can adjust the algorithm to meet this requirement. Let me check that I understand it correctly:

For example, if the user specifies that each segment is 60 seconds long, the file will be split into 60 s segments and the annotations will be mapped to the corresponding segments.

If the end time of an annotation exceeds the end time of its segment, the segment's end time is extended accordingly.

No padding means the last segment can end at the end of the file and does not need to be extended to 60 s, right?
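The fixed-length scheme described above can be sketched like this for a single file. A few assumptions are not stated in the thread: times are in seconds, each annotation belongs to the segment in which it starts, and an extended segment may overlap the next one rather than shifting it. `segment_file` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: float  # seconds
    end: float


@dataclass
class Segment:
    start: float
    end: float
    annotations: list = field(default_factory=list)


def segment_file(file_duration, annotations, length=60.0):
    """Hypothetical sketch: split a file into fixed-length segments, no padding."""
    segments = []
    start = 0.0
    while start < file_duration:
        # The last segment simply stops at the end of the file.
        segments.append(Segment(start, min(start + length, file_duration)))
        start += length
    if not segments:
        return segments
    for ann in annotations:
        # Assign each annotation to the segment in which it starts.
        idx = min(int(ann.start // length), len(segments) - 1)
        seg = segments[idx]
        seg.annotations.append(ann)
        # Extend the segment end if the annotation crosses its boundary.
        seg.end = max(seg.end, min(ann.end, file_duration))
    return segments
```

With a 900 s (15 min) file, one 3 s annotation and `length=60`, this returns 15 segments of 60 s, only one of which holds the annotation.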

@yue-su yue-su linked a pull request May 6, 2022 that will close this issue
@bpadovese
Author

Yes, that is what I mean. Does this sound good to you as well, @fsfrazao?

@yue-su yue-su linked a pull request May 13, 2022 that will close this issue
@fsfrazao
Collaborator

That sounds good to me too.

Just to confirm that I understand it correctly: if there's an annotation table listing only one 3 s annotation for one file that is 15 min long, and the user specifies 60 s segments, will the batch contain 15 segments of 60 s each, one of which will contain the annotation?

@yue-su
Collaborator

yue-su commented May 18, 2022

> That sounds good to me too.
>
> Just to confirm that I understand it correctly: if there's an annotation table listing only one 3 s annotation for one file that is 15 min long, and the user specifies 60 s segments, will the batch contain 15 segments of 60 s each, one of which will contain the annotation?

Yes, this is the updated algorithm.

@bpadovese
Author

Reopening this issue. I think the algorithm and interface are much better and easier to understand now. I have just one request. Currently the algorithm works more or less as follows:

  1. Check which files contain annotations
  2. Create segments of length X (defined by user) for each of these files. So for instance, if I set length to 60 s and the file duration is 600 s, then it would create 10 segments of length 60 s.

I think this is good as is if the model developer wants the annotator to check for additional false positives in the file. However, would it be possible to add an additional option (a checkbox) that deletes segments that do not contain any annotations?

The end result would then be a batch where every segment of length X contains an annotation. The algorithm would essentially be the same, with one extra step:

  1. Check which files contain annotations
  2. Create segments of length X (defined by user) for each of these files. So for instance, if I set length to 60 s and the file duration is 600 s, then it would create 10 segments of length 60 s.

  3. If the checkbox is checked, delete segments that do not contain an annotation.
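The optional checkbox step amounts to a filter applied after segmentation. A minimal sketch, where `finalize_batch` and its `remove_empty` flag are hypothetical names standing in for the checkbox, and `Segment` is an assumed minimal record:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # seconds
    end: float
    annotations: list = field(default_factory=list)


def finalize_batch(segments, remove_empty=False):
    """Hypothetical last step: drop segments without annotations if the box is checked."""
    if not remove_empty:
        return list(segments)  # unchecked: keep every segment of the file
    return [seg for seg in segments if seg.annotations]
```

For a 600 s file split into ten 60 s segments with a single annotated segment, leaving the box unchecked keeps all ten segments, while checking it keeps only the annotated one.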

The use case for this would be similar to mine, where I just want to validate my model's detections and don't care about false positives.

Does this make sense to you guys?

@bpadovese bpadovese reopened this Jun 22, 2022
@fsfrazao
Collaborator

That makes sense, Bruno.
A "Remove segments without annotations" checkbox would be a good idea.

@yue-su
Collaborator

yue-su commented Jun 23, 2022

Yup, it's possible. I just need to filter the segments after the import and delete those that don't have annotations.

3 participants