New price validation assistant page (extracted and predicted from owned proofs) #1137

Open
raphodn opened this issue Dec 18, 2024 · 11 comments · Fixed by #1145, #1158, #1152, #1159 or #1177

Comments

@raphodn
Member

raphodn commented Dec 18, 2024

Story

Following a new major feature in the backend (see openfoodfacts/open-prices#611), we can start showing user's individual price tags that have been extracted from proofs, and predicted thanks to Gemini AI. This data is stored in the backend, and available via API.

Anyone could add prices on any predicted PRICE_TAG proof, but to start with we can restrict the UI to only display price tags predicted from owned proofs.

Users would upload single (or multiple) PRICE_TAG proofs in one page, and validate the extracted prices in another (to be developed).

How

To be defined :)

@raphodn raphodn changed the title from "New price validation page (extracted and predicted from proofs)" to "New price validation page (extracted and predicted from owned proofs)" Dec 18, 2024
@monsieurtanuki
Collaborator

Users would upload single (or multiple) PRICE_TAG proofs on one hand, and validate the extracted prices on the other.

As already discussed with @raphael0202 in openfoodfacts/smooth-app#6043 (comment), the tricky part, UX-wise, is the question "is my proof AI-compliant enough?". If not, the user needs to upload another image, this time "good enough".
In the current scenario, we don't know that for sure until the proof image is uploaded, do we?
In Smoothie we're probably able to detect the barcodes and let the user decide "ok, those 3 barcodes are correct, let's upload the proof". That's only part of what the server computes, but that would be a start. Of course that doesn't include "category prices", like "tomatoes per kg" (btw some countries use imperial measures, and I'm not sure Prices supports that).

That said, the process described here is - correct me if I'm wrong:

  • the user takes a picture, uploads it, and gets a proof ID (optional if we're talking about already existing proofs)
  • there's a new API that you call with the proof ID as a parameter, that returns the list of prices it detected from the proof image
  • the typical use-case would be online users that just confirm prices after proof upload and API call
  • a good way to test how relevant Gemini AI is would be to run the API on all current price tag proofs and check how many differences we spot between the predictions and the actual prices
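The comparison in the last bullet could look roughly like the sketch below. It is only an illustration: the `product_code` and `price` field names and the `compare_predictions` helper are assumptions made for this example, not the actual API schema.

```python
from decimal import Decimal


def compare_predictions(predicted, actual):
    """Compare predicted prices with already-entered prices, keyed by barcode.

    Both arguments are lists of dicts with hypothetical "product_code"
    and "price" keys (assumed for this sketch).
    """
    # Decimal(str(...)) avoids float rounding surprises when comparing prices.
    actual_by_code = {p["product_code"]: Decimal(str(p["price"])) for p in actual}
    report = {"match": [], "mismatch": [], "unmatched": []}
    for pred in predicted:
        code = pred["product_code"]
        if code not in actual_by_code:
            # The prediction has no existing price to compare against.
            report["unmatched"].append(code)
        elif Decimal(str(pred["price"])) == actual_by_code[code]:
            report["match"].append(code)
        else:
            report["mismatch"].append(code)
    return report
```

Counting the three buckets over all existing price tag proofs would give a first idea of how often the prediction agrees with what users entered.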

Are you going to provide data for history proofs, or only new proofs?

@raphodn
Member Author

raphodn commented Dec 18, 2024

The process is described here: openfoodfacts/open-prices#526 (comment)
This issue is regarding the implementation of "Step 3": only price validation.

For now, only on new price tags (generated from new proofs, or at the very least from proofs with 0 prices).
Historical proofs (with prices) will need some extra time (we would need to match their generated price tags with existing prices, and surface only the "missing" price tags), so they will be dealt with later.

@raphael0202
Contributor

raphael0202 commented Dec 18, 2024

detect the barcode

We're not detecting the barcode per se, rather the EAN (digits) on the image. So a good rule of thumb for considering an image as correct is, "can we read the EAN of individual price tags?"
Either the user takes a photo close to the price tag, in which case we will most probably get all the data we need, but we only get one price. Or he/she takes a step back and takes a picture that encloses several price tags, at the risk of some price tags being blurry or not readable.
Note that the user who takes the picture is not necessarily the one validating the price: some people can only take pictures in stores and others can validate prices. These are two complementary forms of contribution.
This new feature is more geared towards heavy contributors: taking dozens of photos only takes a few minutes in a store, but can lead to 100 prices added. We favor quantity here; if a few price tags are not readable, that's ok.

That's only part of what the server computes, but that would be a start. Of course that doesn't include "category prices", like "tomatoes per kg" (btw some countries use imperial measures, and I'm not sure Prices supports that).

Category prices are supported too.

a good way to test how relevant Gemini AI is would be to run the API on all current price tag proofs and check how many differences we spot between the predictions and the actual prices

Fyi, we started running the extraction on all proofs this morning; it's in progress.

Are you going to provide data for history proofs, or only new proofs?

For historical proofs as well.

That said, the process described here is - correct me if I'm wrong:

You can look at the parent issue for how I considered things; it's not limited to the current proof. It would be more like a "Hunger Games"-style game.
How we integrate this to the main workflow is an open question though!

@monsieurtanuki
Collaborator

If it's more specifically about a "price validation page", I have 2 distinct suggestions:

  1. like with some online bank accounts, you see a list of lines (in our case, barcode+price), and a check button at the end of each line, assuming that the user can see the whole proof image too.
  2. a more "hunger games" option with individual "price box"es, typically a cropped price tag image, together with predicted barcode+price, and "correct/incorrect/don't know" buttons

Of course, as you don't really scan barcodes but just read barcode numbers, a barcode number check would make sense.
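A barcode number check along these lines could be a plain EAN-13 check-digit test. A minimal sketch, assuming the predicted barcode arrives as a string; the `is_valid_ean13` name is made up for this example, and other formats such as EAN-8 are not covered:

```python
def is_valid_ean13(code: str) -> bool:
    """Return True if `code` is 13 ASCII digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[12]
```

A misread digit usually fails this test, so the UI could flag such predictions before the user confirms them.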

@raphodn raphodn moved this from Backlog to In progress in 💸 Open Prices Dec 19, 2024
@raphodn raphodn linked a pull request Dec 19, 2024 that will close this issue
raphodn added a commit that referenced this issue Dec 19, 2024
raphodn added a commit that referenced this issue Dec 19, 2024
@serpico

serpico commented Dec 20, 2024

Sometimes a weird price/kg value is calculated and displayed, which is confusing. Here the product's weight is obviously known, but the price per kg is wrong.

Image
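For reference, the expected value is simply the price divided by the weight in kilograms. A small sketch of that arithmetic, assuming the weight is known in grams; the `price_per_kg` helper and its parameters are hypothetical and not taken from the app's code:

```python
from decimal import Decimal, ROUND_HALF_UP


def price_per_kg(price, quantity_g):
    """Price per kilogram, rounded to 2 decimals (hypothetical helper)."""
    quantity = Decimal(str(quantity_g))
    if quantity <= 0:
        raise ValueError("quantity_g must be positive")
    # price / (grams / 1000) is the same as price * 1000 / grams
    per_kg = Decimal(str(price)) * 1000 / quantity
    return per_kg.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a 400 g product at 2.99 gives `price_per_kg("2.99", 400)`, which is 7.48 per kg.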

@monsieurtanuki
Collaborator

Sometimes a weird price/kg value is calculated and displayed, which is confusing. Here the product's weight is obviously known, but the price per kg is wrong.

Perhaps similar to openfoodfacts/smooth-app#6043 (comment)

@raphodn
Member Author

raphodn commented Dec 20, 2024

Fyi, the current page is "alpha"; there are a few things to fix.

@raphodn
Member Author

raphodn commented Dec 27, 2024

  • the price per unit bug should be fixed (was only happening in the *Assistant pages)
  • the number of prices to validate in the "Price Validator Assistant" page should have decreased by a big factor, we now only surface prices coming from proofs uploaded via the "Proof add single" & "Proof add multiple" pages

Big pain-point remaining:

  • displaying the price tag image when fixing the barcode (type dialog)

@raphodn
Member Author

raphodn commented Dec 31, 2024

How/when do we open it up to the community? (allowing anyone to validate prices, not just the proof owner)

  • my main concern is avoiding duplicate prices
    • we shouldn't load the latest prices by default, but some kind of random selection
    • maybe display prices 1 by 1 instead of by batch?
  • would be nice to have a toggle to switch between "all the prices" and "my proofs' prices"?
  • also we should restrict "ready_for_price_tag_validation" proofs from being used elsewhere than in the Assistants, no?

edit: opened a PR #1219 (that opens the page to everyone + filter toggle)

@TTalex
Collaborator

TTalex commented Jan 3, 2025

[Avoiding duplicates]

Fetching randomly is a good idea, other than that:

  • Hard/Silly way: show random price tags and "lock" them for a minute or until the user leaves the page. Locked price tags cannot be shown to other users. (similar to reserving your train seat before paying)
  • Medium way: Only allow one price per price tag in the db, and replace it with the latest added price if duplicates happen
  • Easy way: Ignore duplicates, it's not a big deal if the price is added multiple times, and it can be cleaned up with scripts later on. (Panoramax does that)
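To make the "lock" option more concrete, here is a rough in-memory sketch combining random selection with a one-minute reservation. Everything in it (the `PriceTagLocks` class, its methods, the injectable clock) is hypothetical; a real backend would need shared storage such as the database or a cache rather than a Python dict.

```python
import random
import time

LOCK_SECONDS = 60  # "a minute", as suggested above


class PriceTagLocks:
    """In-memory reservations of price tags (illustration only)."""

    def __init__(self, ttl=LOCK_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so that expiry can be tested
        self._locks = {}  # price_tag_id -> (user_id, expires_at)

    def _is_locked_by_other(self, tag_id, user_id):
        entry = self._locks.get(tag_id)
        if entry is None:
            return False
        owner, expires_at = entry
        return owner != user_id and expires_at > self._clock()

    def reserve_random(self, tag_ids, user_id, rng=random):
        """Pick a random tag not reserved by another user, and reserve it."""
        free = [t for t in tag_ids if not self._is_locked_by_other(t, user_id)]
        if not free:
            return None
        tag_id = rng.choice(free)
        self._locks[tag_id] = (user_id, self._clock() + self._ttl)
        return tag_id

    def release_all(self, user_id):
        """Drop a user's reservations, e.g. when they leave the page."""
        self._locks = {t: e for t, e in self._locks.items() if e[0] != user_id}
```

The "medium" and "easy" options need no such state, which is part of their appeal.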

@raphodn
Member Author

raphodn commented Jan 4, 2025

Todo:
