
Incorrect Label Assignment for Multiple Objects in Label Studio ML Backend #664

Open
Buckler89 opened this issue Nov 9, 2024 · 2 comments


@Buckler89

Description:

I discovered the problem while using the Label Studio ML Backend to label a video with multiple objects to track. The issue occurs consistently with every video that contains multiple objects, regardless of their complexity or duration: all objects in a task receive the same label in the user interface, no matter which distinct labels the model assigned.

Key Issue:
The core problem is that when annotations are updated from the model's response, the Label Studio interface should display a distinct label for each object, but it ends up showing the same label for all of them, even when the model returned different labels. This happens both when creating new annotations and when updating existing ones.

Request:

  • Ensure that when multiple objects are present in the predictions, all of them are correctly displayed in the UI with their respective labels.

Steps to Reproduce:

  1. Start by using a label config template for video object tracking: https://labelstud.io/templates/video_object_detector.

  2. Configure the Label Studio ML Backend to return the following dummy prediction. Here, a 'dummy prediction' is a hard-coded sample output that simulates tracking results for two objects, each with its own label ('Woman' and 'Man'). This is how the ML backend is expected to return prediction data for multiple tracked objects, so each object should appear in the Label Studio UI with its own label:

    def predict(self, tasks, context=None, **kwargs):
        import json
        from label_studio_ml.response import ModelResponse  # import used by the official ML backend examples

        # Hard-coded dummy prediction containing two tracked objects, each with its own label
        prediction = json.dumps({
            "model_version": None,
            "score": 0.0,
            "result": [
                {
                    "value": {
                        "framesCount": 43,
                        "duration": 4.3,
                        "sequence": [
                            {
                                "frame": 1,
                                "x": 51.42,
                                "y": 50.35,
                                "width": 3.41,
                                "height": 1.39,
                                "enabled": True,
                                "rotation": 0,
                                "time": 0.0
                            },
                            {
                                "frame": 2,
                                "x": 50.43,
                                "y": 50.35,
                                "width": 3.41,
                                "height": 1.39,
                                "enabled": True,
                                "rotation": 0,
                                "time": 0.1
                            }
                        ],
                        "labels": [
                            "Woman"
                        ]
                    },
                    "from_name": "box",
                    "to_name": "video",
                    "type": "videorectangle",
                    "origin": "manual",
                    "id": "mGjkBhvZpv"
                },
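                # Second tracked object: its own region id and a different label, "Man"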
                {
                    "value": {
                        "framesCount": 43,
                        "duration": 4.3,
                        "sequence": [
                            {
                                "frame": 1,
                                "x": 0.0,
                                "y": 50.35,
                                "width": 78.98,
                                "height": 49.31,
                                "enabled": True,
                                "rotation": 0,
                                "time": 0.0
                            },
                            {
                                "frame": 2,
                                "x": 0.0,
                                "y": 50.87,
                                "width": 78.55,
                                "height": 48.78,
                                "enabled": True,
                                "rotation": 0,
                                "time": 0.1
                            }
                        ],
                        "labels": [
                            "Man"
                        ]
                    },
                    "from_name": "box",
                    "to_name": "video",
                    "type": "videorectangle",
                    "origin": "manual",
                    "id": "dssMyWHsv_"
                }
            ]
        })
        return ModelResponse(predictions=[json.loads(prediction)])
  3. Observe that all objects in the UI are incorrectly assigned the label of the first object from the predictions, instead of displaying their respective unique labels as provided by the model.

@makseq
Member

makseq commented Nov 12, 2024

Sounds a bit weird...
You can check this YOLO video detection example as a reference:
https://github.com/HumanSignal/label-studio-ml-backend/blob/master/label_studio_ml/examples/yolo/control_models/video_rectangle.py#L88-L126

Last time I tested it, this worked as expected.

@Buckler89
Author

Hi Max,

I tested YOLO based on your suggestion. I can confirm the following:

  • It works only if no boxes are present before the model's prediction is triggered. If the model is called on task loading (you select the task, it opens, and the prediction runs automatically without any intervention), it works as expected.
  • However, if you manually label an object and then click the class label to trigger the prediction, all objects are displayed with the same label when the prediction returns, even though the ML backend itself returns the correct labels.

This seems to work for models like YOLO that ignore user input, but it does not work at all for models like SAM2, which require one or more prompts as initial input.

Additionally, even with YOLO the problem reoccurs if the user adds boxes for objects that the model cannot detect on its own.

To reproduce with YOLO:

  1. Disable auto-annotation
  2. Delete all video annotations if present
  3. Enable auto-annotation
  4. Draw a box and assign a label
  5. Wait for the model to finish prediction
  6. Verify that all labels are identical
