Add new provider for `hgnc.symbol`: `cbgda` #1225

nagutm · 2024-10-24T03:15:12Z

This resource does create unique identifiers in the format CB1, CB104, etc., as seen in the image below. However, entries are not resolvable through these identifiers. Instead, they are resolved by the name of the gene for cbgda.gene and by the name of the disease for cbgda.disease.

src/bioregistry/data/bioregistry.json

cthoyt · 2024-10-24T07:13:02Z

src/bioregistry/data/bioregistry.json

+      "orcid": "0009-0009-5240-7463"
+    },
+    "description": "This collection represents diseases linked to host genes identified via genome-wide CRISPR screens. It includes detailed disease classifications, gene-disease associations, and integrated data on genetic factors contributing to disease development.",
+    "example": "Glioblastoma",


We should discuss whether this qualifies as a semantic space. Combined with the fact that it's not a notable (or, judging from the UI design, high-quality) resource, maybe we should come up with some criteria for skipping this kind of resource. This is similar to how we don't import all of the BioPortal ontologies, because a lot of them are junk.

cthoyt

Have discussion about updating policy on criteria for notability + correctness
Update curation guidelines for semi-automated literature curation accordingly
Check if the gene namespace is just a provider for hgnc.symbol

bgyori · 2024-10-24T18:35:50Z

I agree with the general idea of using discretion to determine notability. We should probably come up with a curation tag in the paper curation tsv which expresses something like "relevant but not notable" (this would be a positive training sample for machine learning purposes but something that ended up not being added to the Bioregistry).

In this case, this is a resource published in Database (Oxford) with a working website, so I wouldn't dismiss it a priori as not notable, and the content seems to be pretty useful. Overall, curating it simply as a provider for hgnc.symbol would be appropriate.
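
To make the "provider for hgnc.symbol" option concrete, here is a minimal sketch of what such a provider record could look like and how its `$1` placeholder would be filled with a gene symbol. The field names follow the provider entries used in `bioregistry.json`; the name, description, homepage, and `uri_format` values are placeholders (assumptions), not the real CBGDA values, and `resolve` is a hypothetical helper, not a Bioregistry function.

```python
# Hypothetical provider record for hgnc.symbol. Only the "cbgda" code comes
# from this PR; every other value is a placeholder, not the real CBGDA data.
provider = {
    "code": "cbgda",
    "name": "CBGDA",  # placeholder display name
    "description": "Placeholder description of the gene pages.",
    "homepage": "https://example.org/cbgda",
    "uri_format": "https://example.org/cbgda/gene/$1",
}


def resolve(provider_record: dict, identifier: str) -> str:
    """Fill the $1 placeholder of a provider's URI format with an identifier."""
    return provider_record["uri_format"].replace("$1", identifier)


# A gene symbol (the local identifier in hgnc.symbol) is used directly,
# since the resource resolves by gene name rather than by CB-style IDs.
example_uri = resolve(provider, "TP53")
print(example_uri)
```

Because the resource resolves by gene name, no new prefix is needed for the gene pages: the existing `hgnc.symbol` identifiers are reused and only the URL pattern is new.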

codecov · 2024-10-24T20:53:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.49%. Comparing base (8950e70) to head (e35d8f7).
Report is 130 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1225      +/-   ##
==========================================
+ Coverage   42.51%   43.49%   +0.97%     
==========================================
  Files         117      118       +1     
  Lines        8327     8190     -137     
  Branches     1963     1346     -617     
==========================================
+ Hits         3540     3562      +22     
+ Misses       4582     4464     -118     
+ Partials      205      164      -41

nagutm · 2024-10-24T20:53:32Z

I think that not_notable should work as a tag to describe these types of resources for the future. If you agree, I can move forward with updating the relevancy vocabulary in the necessary files.
@bgyori

src/bioregistry/data/bioregistry.json

bgyori · 2024-10-28T14:53:09Z

> I think that not_notable should work as a tag to describe these types of resources for the future. If you agree, I can move forward with updating the relevancy vocabulary in the necessary files. @bgyori

Yes, we can define a tag like that on a separate PR and use it in the future whenever appropriate.

…curation workflow (#1236)

This pull request adds the `not_notable` tag to the CurationRelevance vocabulary as a way to mark papers that are relevant for machine learning training but do not meet the threshold for inclusion in the Bioregistry. While curating papers, there have been a few instances of entries that provide new identifier information but aren't notable or well-maintained enough for inclusion in the Bioregistry (#1225). Rather than curating these as subpar prefixes, tagging them as `not_notable` allows us to retain them as positive training samples without cluttering the Bioregistry with less impactful entries.

Co-authored-by: Mufaddal Naguthanawala <[email protected]>
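
The distinction drawn here (relevant for training, but not added to the registry) can be sketched as follows. This is an illustration only: apart from `not_notable`, the tag names below are assumed for the example, and the two helper functions are hypothetical rather than part of the Bioregistry codebase.

```python
from enum import Enum


class CurationRelevance(str, Enum):
    """Illustrative relevance tags; only not_notable is taken from this PR."""

    new_prefix = "new_prefix"  # assumed tag name
    new_provider = "new_provider"  # assumed tag name
    not_notable = "not_notable"
    irrelevant_other = "irrelevant_other"  # assumed tag name


# Tags for papers that describe identifier resources at all.
RELEVANT = {
    CurationRelevance.new_prefix,
    CurationRelevance.new_provider,
    CurationRelevance.not_notable,
}


def is_positive_training_sample(tag: CurationRelevance) -> bool:
    """Relevant papers are positive samples, even if nothing gets added."""
    return tag in RELEVANT


def should_add_to_registry(tag: CurationRelevance) -> bool:
    """Only relevant papers that also clear the notability bar are added."""
    return tag in RELEVANT and tag is not CurationRelevance.not_notable


for tag in CurationRelevance:
    print(tag.value, is_positive_training_sample(tag), should_add_to_registry(tag))
```

Keeping the two questions separate is what lets a paper count as a positive example for the classifier without forcing a low-quality entry into the registry.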

Add cbgda.gene and cbgda.disease

b9b935c

cthoyt reviewed Oct 24, 2024

src/bioregistry/data/bioregistry.json (outdated)

cthoyt reviewed Oct 24, 2024

cthoyt requested changes Oct 24, 2024

Mufaddal Naguthanawala added 2 commits October 24, 2024 16:27

Remove cbgda.gene and cbgda.disease

5c80cfa

Update providers for hgnc.symbol

42a2a82

nagutm changed the title ~~Add prefix: cbgda.gene and cbgda.disease~~ Add new provider for hgnc.symbol: cbgda Oct 24, 2024

bgyori reviewed Oct 28, 2024

src/bioregistry/data/bioregistry.json (outdated)

bgyori reviewed Oct 28, 2024

src/bioregistry/data/bioregistry.json (outdated)

Merge remote-tracking branch 'origin/main' into crispr

afe906d

Re-apply typo fixes

e35d8f7

This was referenced Oct 28, 2024

Add prefix: gmmid #1196

Closed

Add not_notable to CurationRelevance vocabulary for semi-automated curation workflow #1236

Merged

cthoyt merged commit 84581d5 into biopragmatics:main Nov 2, 2024
15 checks passed
