Extend iedb prefix to multiple semantic subspaces #1238
Per #1217 (comment), I duplicated the publications entry in all subspaces.
The test failure is due to the fact that according to strategy 1 outlined by @nagutm above, the parent …
After discussing with @bgyori, we decided that strategy 2 would be better suited for the following reasons:
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff (main → #1238):
Coverage: 42.51% → 43.45% (+0.94%)
Files: 117 → 118 (+1)
Lines: 8327 → 8257 (-70)
Branches: 1963 → 1357 (-606)
Hits: 3540 → 3588 (+48)
Misses: 4582 → 4501 (-81)
Partials: 205 → 168 (-37)
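As a quick consistency check on the Codecov numbers above, the minimal sketch below recomputes both coverage percentages from the reported hit, miss, and partial counts. It assumes Codecov's formula hits / (hits + misses + partials), under which partially covered lines count as uncovered; the helper name `coverage_percent` is illustrative.

```python
# Counts copied from the Codecov report above (base = main, head = #1238).
REPORT = {
    "main": {"hits": 3540, "misses": 4582, "partials": 205},
    "#1238": {"hits": 3588, "misses": 4501, "partials": 168},
}


def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Assumed Codecov ratio: hits / (hits + misses + partials), as a percentage."""
    tracked_lines = hits + misses + partials
    return round(100 * hits / tracked_lines, 2)


for branch, counts in REPORT.items():
    total = sum(counts.values())  # should equal the "Lines" row of the report
    print(f"{branch}: {total} lines, {coverage_percent(**counts)}% covered")
```

Both totals match the reported line counts (8327 and 8257), and the recomputed percentages reproduce the +0.94% change.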
This PR addresses #1204 by extending the iedb prefix into several specific subprefixes to better capture distinct semantic spaces. The subprefixes introduced in this PR are: iedb.reference, iedb.assay, iedb.epitope, iedb.receptor, and iedb.mhc.

Two main approaches were evaluated for organizing these semantic subspaces:
1. Retain iedb as a parent prefix: Under this approach, iedb serves as a source of general information, while each subprefix is curated for a specific semantic space. This structure preserves generic mappings and publication data under the parent iedb prefix, reducing duplication and creating a single reference point for general resources. Each subprefix has a uri_format and a pattern associated with it, while the parent prefix does not.
2. Eliminate the parent prefix and duplicate information: Alternatively, we could remove the iedb parent prefix entirely and duplicate general information, such as mappings and publications, across each subprefix. While this approach makes each subprefix independently comprehensive, it introduces substantial redundancy.
This PR implements the first approach, maintaining iedb as a parent prefix. The following changes were made:
- The fairsharing, integbio, and re3data mappings are retained under the parent, while subprefix-specific mappings (miriam, prefixcommons) are assigned to their related semantic subprefix.
- iedb.antigen is curated separately as a provider for UniProt and ChEBI.

One drawback of this approach is that it is not clear which semantic space the parent prefix represents. However, as the Bioregistry continues to grow, I felt that an approach that reduces overall redundancy would be more useful.
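A consumer-side sketch of the layout described above follows. The parent iedb carries the generic registry mappings but no uri_format or pattern, so only subprefix CURIEs resolve. A subprefix's full mapping set can then be assembled by merging in its parent's mappings. The `part_of` key, the merge policy in `all_mappings`, the example.org URI, the numeric pattern, and placing miriam/prefixcommons on iedb.epitope are all assumptions for illustration, not the Bioregistry's actual data or API.

```python
import re
from typing import Optional

# Hypothetical registry excerpt. Keys such as "part_of", the example.org URI,
# the numeric pattern, and all mapping values are placeholders, not real data.
REGISTRY = {
    "iedb": {
        # Parent: generic registry mappings only, no uri_format or pattern.
        "mappings": {"fairsharing": "PLACEHOLDER", "integbio": "PLACEHOLDER", "re3data": "PLACEHOLDER"},
    },
    "iedb.epitope": {
        "part_of": "iedb",
        "uri_format": "https://example.org/iedb/epitope/$1",
        "pattern": r"^\d+$",
        # Placed here only for illustration; the actual subprefix for each mapping may differ.
        "mappings": {"miriam": "PLACEHOLDER", "prefixcommons": "PLACEHOLDER"},
    },
}


def resolve(curie: str) -> Optional[str]:
    """Expand a CURIE to a URL, or return None if it cannot be resolved."""
    prefix, _, identifier = curie.partition(":")
    record = REGISTRY.get(prefix, {})
    uri_format = record.get("uri_format")
    pattern = record.get("pattern")
    if uri_format is None or pattern is None:
        return None  # e.g. the parent "iedb", which has neither field
    if re.fullmatch(pattern, identifier) is None:
        return None  # local identifier does not match the subprefix pattern
    return uri_format.replace("$1", identifier)


def all_mappings(prefix: str) -> dict[str, str]:
    """Merge a record's own mappings over those inherited from its parent (hypothetical policy)."""
    record = REGISTRY.get(prefix, {})
    parent = REGISTRY.get(record.get("part_of", ""), {})
    merged = dict(parent.get("mappings", {}))
    merged.update(record.get("mappings", {}))
    return merged


print(resolve("iedb.epitope:1234"))  # → https://example.org/iedb/epitope/1234
print(resolve("iedb:1234"))  # → None
print(sorted(all_mappings("iedb.epitope")))  # → ['fairsharing', 'integbio', 'miriam', 'prefixcommons', 're3data']
```

Keeping the inheritance step in a helper like `all_mappings` is one way a consumer could still see the generic mappings from a subprefix without copying them into every subprefix record.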