Agents duplication on ontology parsing #644

galviset · 2024-12-17T15:35:12Z

Describe the bug
When parsing a new submission of an ontology, the parser sometimes creates duplicate Agent objects with the same name.
The exact conditions are not always clear, but the behavior is triggered when a person is described in the ontology file by a string that concatenates more than just a name (e.g. "Guillaume Alviset https://orcid.org/0009-0004-4295-6593").

syphax-bouazzouni · 2024-12-18T07:47:04Z

Here are two possible solutions:

Disable the extraction of Agents
Disable the extraction of Agents if the previous submission already has Agents set.

jonquet · 2024-12-19T08:16:01Z

I will rephrase this as a single "proposed" solution:
Enable agent extraction when parsing the first submission of an ontology (if submissionId=1) and disable it for subsequent submissions (submissionId >1). Make this setting (enabled/not enabled) accessible to ontology admins in their admin panel.

syphax-bouazzouni · 2024-12-19T08:25:09Z

Enable agent extraction when parsing the first submission of an ontology (if submissionId=1) and disable it for subsequent submissions (submissionId >1).

OK

Make this setting (enabled/not enabled) accessible to ontology admins in their admin panel.

Not really possible for now, as we don't have a configuration workflow in the UI (see ontoportal-lirmm/bioportal_web_ui#836).

jonquet · 2024-12-19T08:30:33Z

In fact, I did not mean to put this in a "general" admin panel, but in an ontology-specific panel. This is the one we talked about creating by splitting the "Edit submission" page into two main parts: (i) one related to metadata and (ii) one related to how AgroPortal deals with the ontology.

So typically, this would go in the second "part".

And for the moment, this plan to separate "Edit submission" into two parts is a UI-only contribution, which means all of these settings would still be based on properties of a submission.

In other words, we only have to create a boolean property extractAgentsFromSourceFile and then use it in the processing workflow to decide whether or not to skip agent extraction.
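
To make the idea concrete, here is a minimal sketch of how such a flag could gate the extraction step. It is written in Python for illustration only and is not the ontologies_linked_data code. The `Submission` class, the `should_extract_agents` helper, and the fallback to "first submission only" when the flag is unset are assumptions; only the property name extractAgentsFromSourceFile comes from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Hypothetical stand-in for a submission record."""
    submission_id: int
    # Property name proposed in this thread. None means "not set by an
    # ontology admin" (this tri-state convention is an assumption).
    extractAgentsFromSourceFile: Optional[bool] = None


def should_extract_agents(submission: Submission) -> bool:
    """Decide whether the processing workflow should extract agents."""
    # An explicit admin choice always wins.
    if submission.extractAgentsFromSourceFile is not None:
        return submission.extractAgentsFromSourceFile
    # Otherwise fall back to the proposed default: first submission only.
    return submission.submission_id == 1
```

With this shape, the setting remains a plain property of the submission, so a later UI split of the "Edit submission" page would only need to expose it.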

syphax-bouazzouni · 2024-12-19T08:41:33Z

In other words, we only have to create a boolean property extractAgentsFromSourceFile and then use it in the processing workflow to decide whether or not to skip agent extraction.
In summary, to do that we need to:

Add the attribute in the submission model
Add the metadata of that attribute to the .yml file to explain what it is.
Update the metadata extract to read it and implement the logic
Update the UI to add the property
Test all of this

Bilelkihal · 2025-01-02T10:42:48Z

In other words, we only have to create a boolean property extractAgentsFromSourceFile and then use it in the processing workflow to decide whether or not to skip agent extraction.
In summary, to do that we need to:

Add the attribute in the submission model

Add the metadata of that attribute to the .yml file to explain what it is.

Update the metadata extract to read it and implement the logic

Update the UI to add the property

Test all of this

I don't see the need to overcomplicate things for such a small feature, but if we plan to add more options for controlling how AgroPortal handles each ontology separately, then why not (to be discussed in the next meeting).

I am also not in favor of extracting only from the first submission.
Why not simply add a heuristic to detect if the agent already exists, and if so, avoid creating it again?
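
As a rough illustration of what such a heuristic might look like, the sketch below strips an ORCID URL from the raw string, normalizes the remaining name, and reuses an existing agent when either the ORCID or the name matches. It is Python for illustration, not the actual Ruby code. `split_agent_string`, `AgentRegistry`, and the in-memory dictionaries are hypothetical, and matching on name alone can still merge two different people who share a name.

```python
import re

# Matches an ORCID URL such as https://orcid.org/0009-0004-4295-6593
ORCID_RE = re.compile(r"https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def split_agent_string(raw):
    """Split a raw agent string into (name, orcid or None)."""
    match = ORCID_RE.search(raw)
    orcid = match.group(1) if match else None
    # Remove the URL and collapse leftover whitespace.
    name = " ".join(ORCID_RE.sub("", raw).split())
    return name, orcid


class AgentRegistry:
    """Hypothetical in-memory index of known agents."""

    def __init__(self):
        self.agents = []
        self.by_name = {}
        self.by_orcid = {}

    def find_or_create(self, raw):
        name, orcid = split_agent_string(raw)
        key = name.casefold()
        # Prefer an ORCID match, which is the stronger identifier.
        agent = self.by_orcid.get(orcid) if orcid else None
        if agent is None:
            candidate = self.by_name.get(key)
            # Reuse a name match unless it carries a different ORCID.
            if candidate and (orcid is None or candidate["orcid"] in (None, orcid)):
                agent = candidate
        if agent is None:
            agent = {"name": name, "orcid": orcid}
            self.agents.append(agent)
            self.by_name.setdefault(key, agent)
        if orcid:
            # Enrich the existing agent and index it by ORCID.
            agent["orcid"] = orcid
            self.by_orcid[orcid] = agent
        return agent
```

Under these assumptions, "Guillaume Alviset" and "Guillaume Alviset https://orcid.org/0009-0004-4295-6593" resolve to the same agent instead of creating a duplicate.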

jonquet · 2025-01-03T14:43:03Z

The feature was enabled in ontoportal-lirmm/ontologies_linked_data#154

The current code is here: https://github.com/ontoportal-lirmm/ontologies_linked_data/blob/master/lib/ontologies_linked_data/services/submission_process/operations/submission_extract_metadata.rb#L276

Discussed today:
We shall reformulate Syphax's proposal, "disable the extraction of Agents if the previous submission already has Agents set", as follows:
Disable the extraction of any "person and organization" properties if the ontology already has some values set in the previous submission.

We accept the consequence that extracting an agent for ontology2 could recreate an agent that already exists for ontology1. In other words, any parsing that extracts agents requires curation of the agents afterwards.

When implementing the new ontology parsing report, we shall list the extracted agents.

This makes it possible to implement a solution that is independent of the ontology and does not rely on a parameter (global or ontology-specific).

Note: the behaviour proposed for the "person and organization" category is the opposite of the default behaviour, which always gives priority to what is in the file over what we have in the metadata record.
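
The rule discussed above could be sketched roughly as follows, in Python for illustration only. The property names in `AGENT_PROPERTIES` and the `resolve_agent_properties` helper are placeholders, not the real list or API. The sketch also assumes that a value on any one property of the category is enough to disable extraction for the whole category, which is one possible reading of the rule.

```python
# Placeholder names standing in for the "person and organization" category.
AGENT_PROPERTIES = ("hasCreator", "hasContributor", "publisher")


def resolve_agent_properties(previous, extracted):
    """Pick the agent values for a new submission.

    previous:  dict of property -> list of values from the previous
               submission, or None for a first submission.
    extracted: dict of property -> list of values parsed from the file.
    """
    previous = previous or {}
    already_set = any(previous.get(prop) for prop in AGENT_PROPERTIES)
    # Opposite of the default priority: existing metadata wins over the file.
    source = previous if already_set else extracted
    return {prop: list(source.get(prop, [])) for prop in AGENT_PROPERTIES}
```

Because the decision depends only on the previous submission of the same ontology, no global or per-ontology parameter is needed. Duplicates across different ontologies are still possible, as accepted above.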

jonquet · 2025-01-03T14:50:50Z

This makes it possible to implement a solution that is independent of the ontology and does not rely on a parameter (global or ontology-specific).

Note: the behaviour proposed for the "person and organization" category is the opposite of the default behaviour, which always gives priority to what is in the file over what we have in the metadata record.

jonquet · 2025-01-03T14:56:33Z

Another solution would be to remember that an "agent string" has already been extracted, for instance by:

keeping a record or temp file of all the extracted "agent strings"
creating an ID based on a hash generated from the "agent string"

This solution is not preferred, as it would require re-curating things that have already been curated whenever the "agent string" changes in any way.
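
For reference, a hash-based ID of this kind might look like the Python sketch below. The `agent_string_id` helper, the choice of SHA-256, the 16-character truncation, and the whitespace and case normalization are all assumptions made for illustration. The sketch also shows the drawback mentioned above: any real change to the string, such as adding an ORCID, yields a new ID.

```python
import hashlib


def agent_string_id(raw):
    """Derive a stable ID from a raw "agent string" (hypothetical helper)."""
    # Normalize whitespace and case so trivial variations hash identically.
    normalized = " ".join(raw.split()).casefold()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:16]


seen_ids = set()


def already_extracted(raw):
    """Return True if this agent string was seen before, else record it."""
    agent_id = agent_string_id(raw)
    if agent_id in seen_ids:
        return True
    seen_ids.add(agent_id)
    return False
```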

syphax-bouazzouni added the bug label Dec 18, 2024
