Extend `iedb` prefix to multiple semantic subspaces #1238

nagutm · 2024-10-31T13:46:09Z

This PR addresses #1204 by extending the iedb prefix into several specific subprefixes to better capture distinct semantic spaces. The subprefixes introduced in this PR are: iedb.reference, iedb.assay, iedb.epitope, iedb.receptor, and iedb.mhc.

Two main approaches were evaluated for organizing these semantic subspaces:

1. Retain iedb as a parent prefix: Under this approach, iedb serves as a source of general information, while each subprefix is curated for a specific semantic space. This structure keeps generic mappings and publication data under the parent iedb prefix, which reduces duplication and creates a single reference point for general resources. Each subprefix has its own uri_format and pattern, while the parent prefix has neither.
2. Eliminate the parent prefix and duplicate information: Alternatively, we could remove the iedb parent prefix entirely and duplicate general information, such as mappings and publications, across each subprefix. While this approach ensures each subprefix is independently comprehensive, it introduces substantial redundancy.
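As a rough illustration of the two approaches above, the registry could be shaped in either of the following ways. This is only a sketch with plain dictionaries: every value is a placeholder, and the `part_of` key and the exact field layout are assumptions made for illustration rather than the contents of bioregistry.json.

```python
# Hypothetical, simplified records: every value is a placeholder, not real registry data.
SUBSPACES = ["reference", "assay", "epitope", "receptor", "mhc"]

# Approach 1: a parent entry holds the general information, and each subprefix
# holds only what is specific to its own semantic space.
with_parent = {
    "iedb": {
        "mappings": {"fairsharing": "<id>", "integbio": "<id>", "re3data": "<id>"},
        "publications": ["<publication>"],
        # Note: no "pattern" or "uri_format" on the parent.
    },
    **{
        f"iedb.{name}": {
            "part_of": "iedb",  # assumed way of pointing back to the parent
            "pattern": "<regex>",
            "uri_format": "<url template>",
        }
        for name in SUBSPACES
    },
}

# Approach 2: no parent entry; the general information is repeated in every subprefix.
without_parent = {
    f"iedb.{name}": {
        "mappings": {"fairsharing": "<id>", "integbio": "<id>", "re3data": "<id>"},
        "publications": ["<publication>"],
        "pattern": "<regex>",
        "uri_format": "<url template>",
    }
    for name in SUBSPACES
}
```

The first shape stores the general fields once but leaves a parent record with no pattern or URI format; the second repeats those fields five times but makes every record self-contained.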

This PR implements the first approach, maintaining iedb as a parent prefix. The following changes were made:

General mappings such as fairsharing, integbio, and re3data are retained under the parent, while subprefix-specific mappings (miriam, prefixcommons) are assigned to the related semantic subprefix.
Publications that apply broadly across IEDB are preserved under the parent prefix. Any future publications relevant only to a specific semantic space can be added to the appropriate subprefix as needed.
iedb.antigen is curated separately as a provider for UniProt and ChEBI.

One drawback of this approach is that it is unclear which semantic space the parent prefix represents. However, as the Bioregistry continues to grow, I felt that an approach that reduces overall redundancy would be more useful.

nagutm · 2024-10-31T15:22:50Z

Per #1217 (comment), I duplicated the publications entry in all subspaces.

src/bioregistry/data/bioregistry.json

bgyori · 2024-11-02T14:08:44Z

The test failure occurs because, under strategy 1 outlined by @nagutm above, the parent iedb entry is preserved but has no example, which the test requires. This relates to the more general issue in #1222: should parent entries that don't correspond to a specific semantic space (which would have a specific pattern, uri_format, example, etc.) exist at all, or, in a situation like this one with iedb, should the parent entry be completely dissolved?
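To make the failure mode concrete, here is a minimal sketch of the kind of check that a parent entry without an example would trip. It is not the project's actual test: the `REQUIRED_FIELDS` tuple, the `missing_fields` helper, and the sample records are all hypothetical, and the real test suite may check more fields.

```python
# Hypothetical stand-in for a "required fields" test; not the real Bioregistry test.
REQUIRED_FIELDS = ("example",)  # assumption: only the field named in the comment above


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder records: a parent with only general information, and a subprefix
# that describes a concrete semantic space.
parent_record = {"name": "<parent name>", "publications": ["<publication>"]}
subprefix_record = {
    "name": "<subprefix name>",
    "pattern": "<regex>",
    "uri_format": "<url template>",
    "example": "<local identifier>",
}

print(missing_fields(parent_record))     # the parent lacks an example
print(missing_fields(subprefix_record))  # nothing is missing
```

Under such a check, the only ways to get a passing build are to give the parent an example, relax the test for parent entries, or remove the parent entry.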

nagutm · 2024-11-11T14:33:27Z

After discussing with @bgyori, we decided that strategy 2 would be better suited, for the following reasons:

Clear semantic spaces: Each subprefix clearly defines a semantic space, with no ambiguity about what the parent prefix represents.
Alignment with current testing standards: This approach avoids modifying the established tests, which require certain fields, such as example, to be present for a new prefix.
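Moving from strategy 1 to strategy 2 amounts to copying the parent's general fields into each subprefix and then dropping the parent. The helper below is a sketch of that transformation on plain dictionaries; `dissolve_parent` is a hypothetical name, and the records are placeholders rather than real registry data.

```python
import copy


def dissolve_parent(registry, parent):
    """Copy the parent's fields into each `parent.*` record, then omit the parent."""
    parent_record = registry[parent]
    result = {}
    for prefix, record in registry.items():
        if prefix == parent:
            continue  # the parent entry is not carried over
        if prefix.startswith(parent + "."):
            merged = copy.deepcopy(parent_record)
            # Subprefix-specific values win. Nested values such as a mappings
            # dictionary are replaced as a whole here, not merged key by key.
            merged.update(copy.deepcopy(record))
            result[prefix] = merged
        else:
            result[prefix] = copy.deepcopy(record)
    return result


registry = {
    "iedb": {"publications": ["<publication>"]},
    "iedb.assay": {"example": "<local identifier>"},
    "other": {"example": "<local identifier>"},
}
flattened = dissolve_parent(registry, "iedb")
```

After the call, `flattened` has no `iedb` key, `iedb.assay` carries both its own example and the copied publications, and unrelated prefixes are unchanged.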

codecov · 2024-11-14T05:24:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.45%. Comparing base (8950e70) to head (de79ace).
Report is 146 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1238      +/-   ##
==========================================
+ Coverage   42.51%   43.45%   +0.94%     
==========================================
  Files         117      118       +1     
  Lines        8327     8257      -70     
  Branches     1963     1357     -606     
==========================================
+ Hits         3540     3588      +48     
+ Misses       4582     4501      -81     
+ Partials      205      168      -37


Restructure iedb prefix

0fd1e79

bgyori changed the title ~~Extend iedb prefix to multiple sematic subspaces~~ Extend iedb prefix to multiple semantic subspaces Oct 31, 2024

Add publications for subspaces

faeb39e

cthoyt reviewed Nov 2, 2024

src/bioregistry/data/bioregistry.json

nagutm mentioned this pull request Nov 5, 2024

Add epawaste subprefixes #1256

Merged

Remove parent prefix

cd0922e

Add prefix-specific names and reviewer info

de79ace

bgyori merged commit 8962331 into biopragmatics:main Nov 14, 2024
15 checks passed

bgyori mentioned this pull request Nov 27, 2024

Extend iedb to multiple semantic spaces #1204

Closed
