
Extend iedb prefix to multiple semantic subspaces #1238

Merged: 4 commits merged on Nov 14, 2024

nagutm (Collaborator) commented Oct 31, 2024

This PR addresses #1204 by extending the iedb prefix into several specific subprefixes to better capture distinct semantic spaces. The subprefixes introduced in this PR are: iedb.reference, iedb.assay, iedb.epitope, iedb.receptor, and iedb.mhc.

Two main approaches were evaluated for organizing these semantic subspaces:

  1. Retain iedb as a parent prefix: Under this approach, iedb serves as a source of general information, while each subprefix is curated for a specific semantic space. This structure keeps generic mappings and publication data under the parent iedb prefix, reducing duplication and creating a single reference point for general resources. Each subprefix has its own uri_format and pattern, while the parent prefix has neither.

  2. Eliminate the parent prefix and duplicate information: Alternatively, we could remove the iedb parent prefix entirely and duplicate general information, such as mappings and publications, across each subprefix. While this approach makes each subprefix independently comprehensive, it introduces a large amount of redundancy.
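To make the trade-off between the two approaches concrete, here is a minimal sketch that lays out the IEDB entries both ways and counts how many entries end up storing the general metadata. It assumes a simplified dict-based record layout, not Bioregistry's actual schema or API; every value is a `PLACEHOLDER`, and the `part_of` link name is an assumption used for illustration.

```python
import copy

# Hypothetical, simplified records: values are placeholders, not real Bioregistry data.
GENERAL = {
    "mappings": {"fairsharing": "PLACEHOLDER", "integbio": "PLACEHOLDER", "re3data": "PLACEHOLDER"},
    "publications": ["PLACEHOLDER"],
}
SUBPREFIXES = ["iedb.reference", "iedb.assay", "iedb.epitope", "iedb.receptor", "iedb.mhc"]


def build_registry(keep_parent: bool) -> dict[str, dict]:
    """Lay out the IEDB entries under approach 1 (keep_parent=True) or approach 2."""
    registry: dict[str, dict] = {}
    if keep_parent:
        # Approach 1: general metadata is stored once, on a parent entry
        # that has no uri_format or pattern of its own.
        registry["iedb"] = copy.deepcopy(GENERAL)
    for prefix in SUBPREFIXES:
        # Each subprefix owns the fields that define its semantic space.
        record = {"uri_format": "PLACEHOLDER$1", "pattern": "PLACEHOLDER", "example": "PLACEHOLDER"}
        if keep_parent:
            record["part_of"] = "iedb"  # link back to the parent (field name assumed)
        else:
            # Approach 2: every subprefix carries its own copy of the general metadata.
            record.update(copy.deepcopy(GENERAL))
        registry[prefix] = record
    return registry


def count_general_copies(registry: dict[str, dict]) -> int:
    """Count how many entries store a copy of the general mappings."""
    return sum(1 for record in registry.values() if "mappings" in record)


for keep_parent in (True, False):
    registry = build_registry(keep_parent)
    # Prints "True 6 1", then "False 5 5".
    print(keep_parent, len(registry), count_general_copies(registry))
```

Approach 1 has one more entry but stores the general metadata once; approach 2 drops the parent at the cost of one copy per subprefix.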

This PR implements the first approach, maintaining iedb as a parent prefix. The following changes were made:

  • General mappings such as fairsharing, integbio, and re3data are retained under the parent, while subprefix-specific mappings (miriam, prefixcommons) are assigned to the related semantic subprefix.
  • Publications that apply broadly across IEDB are preserved under the parent prefix. Any future publications relevant only to a specific semantic space can be added to the appropriate subprefix as needed.
  • iedb.antigen is curated separately as a provider for UniProt and ChEBI.
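With general mappings on the parent and subprefix-specific ones on the subprefix, a consumer would need to look in both places. The sketch below shows one way such a lookup could work: check the subprefix first, then follow its parent link. The record layout, the `part_of` field, the assignment of the miriam mapping to iedb.epitope, and the `get_mapping` helper are all hypothetical, chosen only for illustration; this is not the Bioregistry API.

```python
from typing import Optional

# Hypothetical approach-1 layout; identifiers are placeholders, not real registry values.
REGISTRY = {
    "iedb": {"mappings": {"fairsharing": "PLACEHOLDER-FS", "re3data": "PLACEHOLDER-R3D"}},
    "iedb.epitope": {"part_of": "iedb", "mappings": {"miriam": "PLACEHOLDER-MIR"}},
    "iedb.mhc": {"part_of": "iedb", "mappings": {}},
}


def get_mapping(prefix: str, metaprefix: str) -> Optional[str]:
    """Return a prefix's mapping to an external registry, falling back to its parent."""
    record: Optional[dict] = REGISTRY.get(prefix)
    while record is not None:
        # A mapping stored on the subprefix itself takes precedence.
        value = record.get("mappings", {}).get(metaprefix)
        if value is not None:
            return value
        # Otherwise, walk up to the parent entry (if any) and try again.
        parent = record.get("part_of")
        record = REGISTRY.get(parent) if parent else None
    return None


print(get_mapping("iedb.mhc", "fairsharing"))  # PLACEHOLDER-FS (inherited from the parent)
```

The fallback keeps general mappings in one place, but it also means no single subprefix record is complete on its own.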

One drawback of this approach is that it is not clear what semantic space the parent prefix represents. However, as the Bioregistry continues to grow, I felt that an approach that reduces overall redundancy would be more useful.

@bgyori changed the title from "Extend iedb prefix to multiple sematic subspaces" to "Extend iedb prefix to multiple semantic subspaces" on Oct 31, 2024
nagutm (Collaborator, Author) commented Oct 31, 2024

Per a comment in #1217, I duplicated the publications entry across all subspaces.

bgyori (Contributor) commented Nov 2, 2024

The test failure occurs because, under strategy 1 outlined by @nagutm above, the parent iedb entry is preserved but has no example, which the tests require. This relates to the more general question in #1222: whether parent entries that don't correspond to a specific semantic space (which would have its own pattern, uri_format, example, etc.) should exist at all, or whether, in a case like iedb, the parent entry should be dissolved completely.
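The failing check can be pictured as a simple rule: every entry must carry an example. This is a minimal sketch of such a rule, assuming a simplified dict layout with placeholder values; `entries_missing_example` is a hypothetical helper, not the project's actual test code.

```python
# Hypothetical approach-1 layout: the parent has no example; values are placeholders.
REGISTRY = {
    "iedb": {"publications": ["PLACEHOLDER"]},
    "iedb.epitope": {"pattern": "PLACEHOLDER", "example": "PLACEHOLDER"},
    "iedb.mhc": {"pattern": "PLACEHOLDER", "example": "PLACEHOLDER"},
}


def entries_missing_example(registry: dict[str, dict]) -> list[str]:
    """Return the prefixes that would fail an 'every entry has an example' check."""
    return sorted(prefix for prefix, record in registry.items() if not record.get("example"))


print(entries_missing_example(REGISTRY))  # ['iedb']
```

Only the parent fails, because it describes no single semantic space and so has no natural example to give.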

@nagutm mentioned this pull request on Nov 5, 2024
nagutm (Collaborator, Author) commented Nov 11, 2024

After discussing with @bgyori, we decided that strategy 2 is better suited, for the following reasons:

  • Clear semantic spaces: Each subprefix clearly defines a semantic space, with no ambiguity about what a parent prefix represents.
  • Alignment with current testing standards: This approach avoids having to modify the established tests that require certain fields, such as example, to be present for a new prefix.
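Moving from strategy 1 to strategy 2 amounts to folding the parent's general fields into each child and then deleting the parent. The sketch below shows that transformation on a simplified dict layout; the `part_of` field, the placeholder values, and the `dissolve_parent` helper are assumptions for illustration, not the actual Bioregistry data model or tooling.

```python
import copy

# Hypothetical approach-1 layout; values are placeholders, not real registry data.
REGISTRY = {
    "iedb": {"mappings": {"re3data": "PLACEHOLDER-R3D"}, "publications": ["PLACEHOLDER-PUB"]},
    "iedb.epitope": {"part_of": "iedb", "example": "PLACEHOLDER", "mappings": {"miriam": "PLACEHOLDER-MIR"}},
    "iedb.mhc": {"part_of": "iedb", "example": "PLACEHOLDER"},
}


def dissolve_parent(registry: dict[str, dict], parent: str) -> dict[str, dict]:
    """Copy the parent's general fields into each of its children, then drop the parent."""
    result = copy.deepcopy(registry)  # leave the input registry untouched
    parent_record = result.pop(parent)
    for record in result.values():
        if record.get("part_of") != parent:
            continue
        del record["part_of"]
        # General mappings are merged in; child-specific ones win on conflict.
        record["mappings"] = {**parent_record.get("mappings", {}), **record.get("mappings", {})}
        # Publications are copied onto any child that has none of its own.
        record.setdefault("publications", copy.deepcopy(parent_record.get("publications", [])))
    return result


flat = dissolve_parent(REGISTRY, "iedb")
print(sorted(flat))  # ['iedb.epitope', 'iedb.mhc']
```

Every remaining entry is self-contained and carries an example, at the price of the duplicated mappings and publications.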

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.45%. Comparing base (8950e70) to head (de79ace).
Report is 146 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1238      +/-   ##
==========================================
+ Coverage   42.51%   43.45%   +0.94%     
==========================================
  Files         117      118       +1     
  Lines        8327     8257      -70     
  Branches     1963     1357     -606     
==========================================
+ Hits         3540     3588      +48     
+ Misses       4582     4501      -81     
+ Partials      205      168      -37     


@bgyori merged commit 8962331 into biopragmatics:main on Nov 14, 2024
15 checks passed