Add new provider for hgnc.symbol: cbgda #1225
"orcid": "0009-0009-5240-7463"
},
"description": "This collection represents diseases linked to host genes identified via genome-wide CRISPR screens. It includes detailed disease classifications, gene-disease associations, and integrated data on genetic factors contributing to disease development.",
"example": "Glioblastoma",
We should discuss whether this qualifies as a semantic space. Combined with the fact that it's not a notable (or, judging from the UI design, high-quality) resource, maybe we should come up with some criteria for skipping this kind of resource. This is similar to how we don't import all of the BioPortal ontologies, because a lot of them are junk.
- Have discussion about updating policy on criteria for notability + correctness
- Update curation guidelines for semi-automated literature curation accordingly
- Check if the gene namespace is just a provider for hgnc.symbol
I agree with the general idea of using discretion to determine notability. We should probably come up with a curation tag in the paper curation TSV that expresses something like "relevant but not notable" (this would be a positive training sample for machine learning purposes, but something that ended up not being added to the Bioregistry). In this case, the resource is published in Database (Oxford) and has a working website, so I wouldn't dismiss it a priori as not notable, and the content seems to be pretty useful. Overall, curating it simply as a provider for
cbgda.gene
and cbgda.disease
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1225 +/- ##
==========================================
+ Coverage 42.51% 43.49% +0.97%
==========================================
Files 117 118 +1
Lines 8327 8190 -137
Branches 1963 1346 -617
==========================================
+ Hits 3540 3562 +22
+ Misses 4582 4464 -118
+ Partials 205 164 -41
I think that |
Yes, we can define a tag like that in a separate PR and use it in the future whenever appropriate.
…curation workflow (#1236)

This pull request adds the `not_notable` tag to the CurationRelevance vocabulary as a way to mark papers that are relevant for machine learning training but do not meet the threshold for inclusion in the Bioregistry.

While curating papers, there have been a few instances of entries that provide new identifier information but aren't notable or well-maintained enough for inclusion in the Bioregistry (#1225). Rather than curating these as subpar prefixes, tagging them as `not_notable` allows us to retain them as positive training samples without cluttering the Bioregistry with less impactful entries.

Co-authored-by: Mufaddal Naguthanawala <[email protected]>
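
To make the intent of the tag concrete, here is a minimal sketch of how a `not_notable` value could separate "useful for training" from "added to the registry". Only the `not_notable` tag name comes from the commit message above; the class name `Relevance`, the other tag values, and both helper functions are hypothetical stand-ins, not the actual CurationRelevance definition.

```python
from enum import Enum


class Relevance(str, Enum):
    """Hypothetical stand-in for the CurationRelevance vocabulary."""

    NEW_PREFIX = "new_prefix"    # assumed tag name, for illustration only
    NOT_NOTABLE = "not_notable"  # tag named in the commit message
    IRRELEVANT = "irrelevant"    # assumed tag name, for illustration only


def is_positive_training_sample(tag: Relevance) -> bool:
    """Papers describing identifier resources are positives, notable or not."""
    return tag in {Relevance.NEW_PREFIX, Relevance.NOT_NOTABLE}


def should_add_to_registry(tag: Relevance) -> bool:
    """Only notable resources would be curated into the registry."""
    return tag is Relevance.NEW_PREFIX
```

Under these assumptions, a paper tagged `not_notable` still counts as a positive example for the classifier but is never turned into a registry entry.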
This resource does create unique identifiers in the format CB1, CB104, etc., as seen in the image below. However, entities are not resolvable through these identifiers; instead, they resolve by the name of the gene for `cbgda.gene` and the name of the disease for `cbgda.disease`.
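
The point about resolvability can be sketched as follows: if gene pages are keyed by gene name, the resource behaves like a provider that takes a gene symbol as its local identifier, while the internal CB-style identifiers have no URL of their own. The URL template below is a hypothetical placeholder (the real CBGDA URL pattern is not given in this thread), and `provider_url` is an illustrative helper, not a Bioregistry function. The `$1` placeholder follows the convention used in Bioregistry URI format strings.

```python
from urllib.parse import quote

# Hypothetical URL template; the real CBGDA pattern is not stated in this thread.
GENE_URI_FORMAT = "https://example.org/cbgda/gene/$1"


def provider_url(symbol: str, uri_format: str = GENE_URI_FORMAT) -> str:
    """Fill the `$1` placeholder with a URL-encoded gene symbol."""
    return uri_format.replace("$1", quote(symbol, safe=""))


# The gene symbol, not the internal CB-style identifier, is the lookup key.
print(provider_url("TP53"))
```

Because the key is the gene symbol, registering the resource as a provider for `hgnc.symbol` avoids minting a new prefix whose own identifiers cannot be resolved.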