Add a price_tags table to store predictions #611

Closed
raphael0202 opened this issue Dec 8, 2024 · 7 comments · Fixed by #628, #629, #630, #632 or #631

Comments

@raphael0202
Contributor

raphael0202 commented Dec 8, 2024

In #526, we've talked about how to speed up price addition using ML/AI (#526 (comment)).

For price tags, the workflow now seems clear:

  1. the user uploads their proof
  2. an object detection model identifies individual price tags. The object detection model returns the coordinates of each detected bounding box, along with a confidence score.
  3. we extract information from the price tags: price, EAN or product category, price type, organic or not
  4. we let the user (and later other users) fix and validate the prediction

Step (2) is currently in development. Once the proof is uploaded, we would ideally run all ML models as an async job.
For each detected price tag, we need to store some information:

  • the bounding box of the price tag with respect to the original proof
  • the detected price
  • the detected category and/or EAN
  • organic/non-organic
  • the price basis (per kg or per unit)
  • origins (for raw products)

I suggest we create a new price_tags table. Storing intermediate data in a new table is necessary to allow async processing. The user won't have to wait for models to run, and we can distribute the workload (= price validation) between contributors. Besides, it gives us ground-truth data for evaluating and training models that extract information from price tags.

Schema

  • id: ID of the price tag
  • proof_id: FK to the proof
  • created: creation datetime
  • updated: datetime of last update
  • bounding_box: the coordinates of the bounding box, in relative coordinates as (y_min, x_min, y_max, x_max). The origin is the top-left corner of the image.
  • status: the status of the price tag: either a price is already linked to this price tag (status=1), it is waiting for approval or completion (status=null), or it is invalid (status=0, the information cannot be read or is hidden). Only price_tags with null status will be suggested in a "Hunger Games-like" game.
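To make the bounding_box convention concrete, here is a minimal sketch of converting the relative (y_min, x_min, y_max, x_max) tuple into pixel coordinates. The `BoundingBox` and `to_pixel_box` names are hypothetical and not part of the proposal; the output order (left, upper, right, lower) is the one expected by Pillow's `Image.crop`.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Relative coordinates in [0, 1]; the origin is the top-left corner."""

    y_min: float
    x_min: float
    y_max: float
    x_max: float


def to_pixel_box(box: BoundingBox, width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) in pixels for an image of the given size."""
    return (
        round(box.x_min * width),   # x values scale with the image width
        round(box.y_min * height),  # y values scale with the image height
        round(box.x_max * width),
        round(box.y_max * height),
    )


# Hypothetical price tag covering the lower-right part of an 800x600 proof image.
pixel_box = to_pixel_box(BoundingBox(0.25, 0.5, 0.75, 1.0), width=800, height=600)
```

Storing relative coordinates keeps the box valid if the proof image is later resized or served as a thumbnail.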

Fields pre-filled by the extraction model (currently Gemini):

  • predicted_product_code
  • predicted_category_tag
  • predicted_price
  • predicted_price_per
  • predicted_price_is_discounted
  • price_without_discount
  • predicted_currency
  • predicted_labels_tags
  • predicted_origins_tags

Fields that are derived from predictions, validated by the user. These fields are null before the price tag is validated by the user.

  • price
  • price_is_discounted
  • price_without_discount
  • price_per
  • product_code
  • category_tag
  • currency
  • labels_tags
  • origins_tags
  • models_info (JSONB): extra information about the version of the models that generated the prediction, about which model was used for what, etc.

We can add a nullable price_tag_id in the prices table to keep track of the individual price tag that is behind a price.

Workflow

When a new proof is uploaded, the price tag object detector model is run on the image of the proof. We create one element in the price_tags table for each detected price tag (above a fixed threshold).
For each detected price tag, we run the Gemini model on it and save the results in the predicted_* fields.
The status of the price_tag is null by default. Users can validate the extracted data by calling an endpoint to retrieve all price_tags with null status. Once validated, a new price is created linked with the original price_tag using the price_tag_id foreign key.
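The first step of this workflow (one price_tags row per detection above a fixed threshold) could look roughly like the sketch below. The `Detection` and `PriceTag` classes, the `create_price_tags` helper and the threshold value of 0.5 are assumptions made for illustration; the issue does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value: the issue only says "a fixed threshold".
SCORE_THRESHOLD = 0.5

Box = tuple[float, float, float, float]  # (y_min, x_min, y_max, x_max), relative


@dataclass
class Detection:
    """One output of the object detection model."""

    bounding_box: Box
    score: float


@dataclass
class PriceTag:
    proof_id: int
    bounding_box: Box
    status: Optional[int] = None  # null means "waiting for validation"


def create_price_tags(
    proof_id: int, detections: list[Detection], threshold: float = SCORE_THRESHOLD
) -> list[PriceTag]:
    """Keep only detections whose confidence score is above the threshold."""
    return [
        PriceTag(proof_id=proof_id, bounding_box=d.bounding_box)
        for d in detections
        if d.score > threshold
    ]


detections = [
    Detection((0.1, 0.1, 0.3, 0.4), score=0.9),
    Detection((0.5, 0.5, 0.6, 0.7), score=0.2),  # too uncertain, dropped
]
price_tags = create_price_tags(proof_id=1, detections=detections)
```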

@raphodn
Member

raphodn commented Dec 11, 2024

Thanks for the detailed issue!

A few remarks:

  • after having created a ProofPrediction table (in Create a proof_prediction table to store predictions from ML models #511), aren't there similarities we could reuse here? It might add complexity, but I see this in steps:
    • first, create the PriceTag table that stores a bounding box and a status. That could already be shown to the user in the frontend to help crowdsource prices (the user needs to type the barcode and price)
    • then create a PriceTagPrediction that stores the result of one or more models that we run on the image, and that we could re-run in the future if we improve the model, etc. We then use it to improve the UI given to the user (no need to type the barcode and price anymore, it's just validation)
  • I see this (and the already-created ProofPrediction) in a dedicated ml sub-app. The backend should be able to run without these AI models. Even if it becomes "core" to speeding up price collection, we shouldn't make it mandatory.

@raphael0202
Contributor Author

then create a PriceTagPrediction that stores the result of one or more models that we run on the image, and that we could re-run in the future if we improve the model, etc. We then use it to improve the UI given to the user (no need to type the barcode and price anymore, it's just validation)

I get your point, but to me there is a difference between the proof_predictions table and the price_tags one: we have several types of proofs (receipt, price tag), on which we run different models:

  • one to detect bounding box for price tags
  • one to classify the proof
  • (later) one to extract all values for receipts

The fact that we have different models specific to different types of proofs is the reason we created a generic proof_predictions table.
Here, for price tags, the extraction model (currently Gemini) will only deal with price tags. I find it more convenient to have all data in a single table, as otherwise we have to deal in the backend (and the front-end) with possibly multiple predictions of the same model type.
It's something we do in Robotoff, for good reasons (as we can extract the same information type from multiple images), but at the cost of greater complexity.

I see this (and the already-created ProofPrediction) in a dedicated ml sub-app. The backend should be able to run without these AI models. Even if it becomes "core" to speeding up price collection, we shouldn't make it mandatory.

I agree that the AI should be optional from the backend side. And the user should be allowed to draw bounding boxes manually using the web app to create new price tags.

@raphodn
Member

raphodn commented Dec 12, 2024

Here, for price tags, the extraction model (currently Gemini) will only deal with price tags. I find it more convenient to have all data in a single table, as otherwise we have to deal in the backend (and the front-end) with possibly multiple predictions of the same model type.

But I don't understand how, with your current price_tags model proposal, you can store multiple predictions? It's missing a data JSONField, thus we could simply plug in the ProofPrediction model (or a dedicated PricePrediction)

@raphael0202
Contributor Author

raphael0202 commented Dec 12, 2024

But I don't understand how, with your current price_tags model proposal, you can store multiple predictions? It's missing a data JSONField, thus we could simply plug in the ProofPrediction model (or a dedicated PricePrediction)

We don't store multiple predictions (e.g. we don't store 2 price predictions made by 2 different models). On Robotoff, it turned out after a couple of years that we never needed predictions from 2 models at the same time: when a new model is trained and tested, I just delete all the predictions associated with this model and relaunch the model on all images.

edit: to make things clearer, the current schema of the price_tags table can store predictions coming from two different models that do different things. For example, I plan to add a predicted_blurriness field, which will be predicted by a different model than Gemini.
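As a rough illustration of the "delete and relaunch" approach described above, the sketch below drops every stored prediction from one model and regenerates them for all images. It uses a plain in-memory list rather than Robotoff's actual storage; the `rerun_model` function and the dictionary keys are hypothetical.

```python
from typing import Callable


def rerun_model(
    predictions: list[dict],
    image_ids: list[int],
    model_name: str,
    predict: Callable[[int], dict],
) -> list[dict]:
    """Replace all predictions of `model_name` with fresh ones for every image."""
    # Predictions from other models are left untouched.
    kept = [p for p in predictions if p["model_name"] != model_name]
    # At most one prediction per (image, model) pair exists afterwards.
    fresh = [
        {"image_id": image_id, "model_name": model_name, "data": predict(image_id)}
        for image_id in image_ids
    ]
    return kept + fresh


stored = [
    {"image_id": 1, "model_name": "extractor", "data": {"price": 1.0}},
    {"image_id": 1, "model_name": "blurriness", "data": {"blurry": False}},
]
# Stand-in for a newly trained extractor; a real model would read the image.
updated = rerun_model(stored, [1, 2], "extractor", lambda image_id: {"price": 2.0})
```

With this approach there is never more than one prediction per model and image, which is what keeps the single-table design simple.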

@raphodn
Member

raphodn commented Dec 12, 2024

We don't store multiple predictions (e.g. we don't store 2 price predictions made by 2 different models)

OK, but I would be in favor of having the flexibility to store any number of predictions, for instance one coming from Gemini and another coming from our own model, and have both show up in the UI to help the user, or help us test/compare while we transition out of GenAI, no? That's why I like the JSONField where we can have any number of predictions :)

@raphael0202
Contributor Author

If you want to keep the flexibility to have any number of predictions of the same type, it's better to have a PriceTagProofPrediction as you suggested!
I'm down for creating this new table then, if we plan to implement model comparison in the front-end :)

@raphael0202
Contributor Author

raphael0202 commented Dec 16, 2024

Updated schema, after the discussions above:

Schemas

price_tags table

  • id: ID of the price tag
  • proof_id: FK to the proof
  • price_id: FK to the price created from this price tag, can be null.
  • created: creation datetime
  • updated: datetime of last update
  • bounding_box: the coordinates of the bounding box, in relative coordinates as (y_min, x_min, y_max, x_max). The origin is the top-left corner of the image. Cannot be null.
  • status: the status of the price tag: a price is already linked to this price tag (status=1), it is waiting for approval or completion (status=null), the price or the barcode cannot be read (status=2), or the object was deleted by a user (status=0). Only price_tags with null status will be suggested in a "Hunger Games-like" game.
  • model_version: the version of the object detector model that created this price tag. If it was created by a human, this field is null.
  • created_by: the name of the user who created the price tag. If the price tag was created automatically after object detection, this field is null.
  • updated_by: the name of the user who updated the price tag coordinates. If the price tag was created automatically and never updated, this field is null.
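The status values of the updated schema can be sketched as an enum. The member names below are hypothetical (the issue only gives the integer values), and the pending state is represented by `None` rather than by an enum member, mirroring the null status in the table.

```python
from enum import IntEnum
from typing import Optional


class PriceTagStatus(IntEnum):
    """Integer values from the schema above; the names are illustrative only."""

    DELETED = 0           # the object was deleted by a user
    LINKED_TO_PRICE = 1   # a price is already linked to this price tag
    NOT_READABLE = 2      # the price or the barcode cannot be read


def needs_validation(status: Optional[int]) -> bool:
    """Only price tags with a null status are suggested for validation."""
    # Compare with None explicitly: DELETED is 0, which is falsy in Python.
    return status is None
```

The explicit `is None` check matters here: a truthiness test such as `not status` would wrongly treat deleted price tags (status=0) as pending.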

price_tag_predictions table

  • id: ID of the price tag prediction
  • price_tag_id: the ID of the price tag (FK)
  • type: type of the prediction. Currently, only one value is supported: price_tag_extraction
  • model_name: name of the model. Currently, there is only one model: gemini
  • model_version: version of the model. Currently, there is only one version: gemini-1.5-flash
  • data: JSONB containing prediction data returned by the model. The schema of the dictionary is specific to the model.
  • created: creation datetime
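The two tables can be sketched with Python's built-in sqlite3 module to show how several predictions attach to one price tag. This is only an illustration under stated assumptions: SQLite stands in for the real database, JSON is stored as TEXT instead of JSONB, and the "in-house-model" rows, the "detector-v1" version string and the keys inside data are made up for the example.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE price_tags (
    id INTEGER PRIMARY KEY,
    proof_id INTEGER NOT NULL,
    price_id INTEGER,                 -- null until a price is created
    bounding_box TEXT NOT NULL,       -- JSON list: [y_min, x_min, y_max, x_max]
    status INTEGER,                   -- null = waiting for validation
    model_version TEXT,               -- null if created by a human
    created_by TEXT,
    updated_by TEXT,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE price_tag_predictions (
    id INTEGER PRIMARY KEY,
    price_tag_id INTEGER NOT NULL REFERENCES price_tags(id),
    type TEXT NOT NULL,
    model_name TEXT NOT NULL,
    model_version TEXT NOT NULL,
    data TEXT NOT NULL,               -- JSON, schema specific to the model
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

cursor = conn.execute(
    "INSERT INTO price_tags (proof_id, bounding_box, model_version) VALUES (?, ?, ?)",
    (1, json.dumps([0.1, 0.2, 0.3, 0.4]), "detector-v1"),
)
tag_id = cursor.lastrowid

# Two predictions of the same type for the same price tag, from different models.
for model_name, model_version, data in [
    ("gemini", "gemini-1.5-flash", {"price": 2.5}),
    ("in-house-model", "v0", {"price": 2.6}),
]:
    conn.execute(
        "INSERT INTO price_tag_predictions "
        "(price_tag_id, type, model_name, model_version, data) VALUES (?, ?, ?, ?, ?)",
        (tag_id, "price_tag_extraction", model_name, model_version, json.dumps(data)),
    )

rows = conn.execute(
    "SELECT model_name, data FROM price_tag_predictions WHERE price_tag_id = ? ORDER BY id",
    (tag_id,),
).fetchall()
predictions = {name: json.loads(data) for name, data in rows}
```

Keeping predictions in their own table is what allows the side-by-side model comparison discussed earlier in the thread.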

Workflow

When a new proof is uploaded, the price tag object detector model is run on the image of the proof. We create one element in the price_tags table for each detected price tag (above a fixed threshold).
For each detected price tag, we run the Gemini model on it and create a new PriceTagPrediction object in DB linked to the PriceTag.
The status of the price_tag is null by default. Users can validate the extracted data by calling an endpoint to retrieve all price_tags with null status. Once validated, a new price is created linked with the original price_tag using the price_tag.price_id foreign key.
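The validation step at the end of this workflow could be sketched as follows, assuming simplified in-memory `PriceTag` and `Price` objects. The `pending` and `validate` helpers and the `Price` fields are hypothetical; only the status values and the price_tag.price_id link come from the schema above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Price:
    id: int
    proof_id: int
    price: float
    product_code: str


@dataclass
class PriceTag:
    id: int
    proof_id: int
    status: Optional[int] = None    # null = waiting for validation
    price_id: Optional[int] = None  # set once a price is created


_price_ids = count(1)  # stand-in for a database sequence


def pending(price_tags: list[PriceTag]) -> list[PriceTag]:
    """What an endpoint returning price tags to validate would select."""
    return [tag for tag in price_tags if tag.status is None]


def validate(tag: PriceTag, price: float, product_code: str) -> Price:
    """Create a price from user-confirmed values and link it to the price tag."""
    if tag.status is not None:
        raise ValueError("price tag was already processed")
    new_price = Price(next(_price_ids), tag.proof_id, price, product_code)
    tag.price_id = new_price.id
    tag.status = 1  # a price is now linked to this price tag
    return new_price


tags = [PriceTag(id=1, proof_id=10), PriceTag(id=2, proof_id=10, status=0)]
to_review = pending(tags)
# Hypothetical values confirmed by the user in the front-end.
created_price = validate(to_review[0], price=2.5, product_code="0000000000000")
```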

raphael0202 added a commit that referenced this issue Dec 16, 2024
raphael0202 added a commit that referenced this issue Dec 16, 2024
raphael0202 added a commit that referenced this issue Dec 16, 2024
@raphodn raphodn moved this from Backlog to In progress in 💸 Open Prices Dec 17, 2024
@raphodn raphodn linked a pull request Dec 17, 2024 that will close this issue
@raphodn raphodn changed the title from "Add a price_tags table" to "Add a price_tags table to store predictions" Dec 17, 2024
@raphodn raphodn closed this as completed Dec 19, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in 💸 Open Prices Dec 19, 2024