Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to normalize wrapped objects #138

Merged
merged 13 commits into from
Jun 18, 2024
Merged

Add the ability to normalize wrapped objects #138

merged 13 commits into from
Jun 18, 2024

Conversation

gaurav
Copy link
Member

@gaurav gaurav commented Jan 17, 2024

Some wrapped objects -- in particular TaxonomicUnitWrappers, TaxonConceptWrappers and TaxonNameWrappers -- may have multiple representations that result in identical definitions -- for example, a TaxonNameWrapper may wrap a taxon name in the form {"nameComplete": "Caiman crocodilus", ...} while another represents this as {"nameComplete": "Caiman crocodilus", "genus": "Caiman", "specificEpithet": "crocodilus", ...}. This is the underlying cause of phyloref/klados#263.

While we could fix this specifically for the three wrappers listed above in Klados, it seems like a better idea to add static methods for PhyxWrapper.normalize(phyxDocument), which could recursively normalize its subcomponents. This would keep the normalize code next to their wrappers, which will make it easier to update in the future if/when the Phyx format changes. It also adds a test for normalizing phylorefs with multiple normalization files -- every phyloreference that ends with _same is expected to be different but to normalize to the same phyloreference, while every phyloreference that ends with _different is expected to normalize to a different phyloreference. Phyx files and Phylogenies are not currently tested, because they don't cause phyloref/klados#263 and so are not a priority for us right now. I've filed an issue to write those tests as #140.

Also includes a few minor changes needed to make this PR work and pass testing:

  • There is a known issue with Node 21 and 22 and esm (see regression in 21.4.0 when using esm package nodejs/node#51081), so I've added a .node-version file to indicate that we should use Node 20 until that issue is fixed.
  • Includes fixes for some typos in TaxonConceptWrapper.js.
  • Increases the JPhyloRef version from 0.4.0 to 1.1.1 to support more modern versions of Java.
  • The normalization checks are in the test/examples/correct directory, which causes a discrepancy in the expected and actual count of the phyx2owl.js check. This can be fixed by calculating the expected file count with the recursion flag.

@gaurav gaurav requested a review from hlapp May 8, 2024 05:56
@gaurav gaurav marked this pull request as ready for review May 8, 2024 05:56
@gaurav gaurav requested review from hlapp and removed request for hlapp June 3, 2024 21:16
Copy link
Member

@hlapp hlapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay. Looks fine to me. The one hesitation that comes to mind is that with enumerating the properties in creating the "normalized" object every added property will have be replicated in the normalization code too. But perhaps that's for now just unavoidable? And if someone were to forget to add a new property to the normalization code, would the test(s) catch it?

@gaurav
Copy link
Member Author

gaurav commented Jun 18, 2024

Sorry for the long delay. Looks fine to me. The one hesitation that comes to mind is that with enumerating the properties in creating the "normalized" object every added property will have be replicated in the normalization code too. But perhaps that's for now just unavoidable? And if someone were to forget to add a new property to the normalization code, would the test(s) catch it?

Hm, that's a good point. There are two kinds of normalization code here: those that keep all un-normalized properties in the normalized object, and those that only use a specific subset of properties to generate the normalized object. I have some ideas on testing these that I've just written up (#141). If I understand your comment correctly, we're okay with the first category (although a few explicit tests would be nice), but the second category needs some additional tests to make sure we handle the use-case where someone adds a new property to, say, SpecimenWrapper (which we plan to do for UUIDs, see phyloref/klados#320). I'm not sure if this will catch the case where we add a new property, but it will make it explicit in the tests which properties we preserve -- would that address your question? Are there other tests we need to add?

@gaurav gaurav merged commit abed61c into master Jun 18, 2024
4 checks passed
@gaurav gaurav deleted the add-normalize branch June 18, 2024 06:07
@gaurav gaurav mentioned this pull request Jun 19, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants