Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JOSS review reply draft #578

Closed
slayoo opened this issue Jul 6, 2021 · 1 comment
Closed

JOSS review reply draft #578

slayoo opened this issue Jul 6, 2021 · 1 comment

Comments

@slayoo
Copy link
Member

slayoo commented Jul 6, 2021

@piotrbartman @trontrytel @Golui @rlhycd @abulenok, here is a draft of the JOSS review reply, please provide any feedback. The two remaining TODOs are to improve docstring coverage at least in the physics folder and to add a section on how to contribute to the README file.


Let us also provide a point-by-point reply to Daniel Rothenberg's review.

First of all, we would like to express again our appreciation for the feedback. We have been working on addressing the points raised in the review, and a lot is already reflected in the recent 1.4 release of PySDM (https://github.com/atmos-cloud-sim-uj/PySDM/releases/tag/v1.4).

PySDM is a new open-source package for particle-resolved modeling of the physics and chemistry of cloud and aerosol microphysics. It provides high-performance reference implementations of state-of-the-science process models in a flexible, highly-configurable package that is relatively easy to manipulate and modify to suit a wide variety of cloud and aerosol modeling tasks. It also ships with a back-end for running certain processes on GPUs, although this functionality does not seem to be fully implemented for all configurations which a user may seek to leverage.

As correctly pointed out by the reviewer, at the time of the initial submission, the GPU support has been limited to the coalescence process. As of version 1.4, this is extended to all but aqueous chemistry processes, in particular supporting condensational growth/evaporation of particles (yet without adaptive timestepping which features logic less applicable in the GPU SIMT context). Parcel and single-column examples work OK on the GPU, and testing of the GPU code (through the FakeUnits mocking layer) is part of the Continuous Integration workflow.

Overall this is a high quality and valuable contribution to the field. Many researchers leverage simple box and parcel models for critical work into fundamental aerosol/cloud microphysics, and PySDM provides extremely useful tooling for exploring this domain and for replicating the results of other researchers. However, the library is lacking in documentation, both formally (in clearly defining the complex structure of the objects which comprise the code and how they interact) and informally (through docstrings and other comments/annotations in the code which may allow users to better understand arguments to key functions and how to manipulate and extend them). At present, the authors provide a nice "Hello world" example with some prose on how to configure it, as well as a basic section on package structure and API. But a more comprehensive listing of the arguments, attributes, and parameters that the user should know about in order to configure or extend this example would go a very long way to promote practical usability.

The steps taken so far to improve the documentation include:

  • deployment of pdoc-generated API manual to GitHub pages at https://atmos-cloud-sim-uj.github.io/PySDM/;
  • addition of a parcel-framework condensation-only example to the README file (with code snippets in Matlab, Julia and Python, as for the earlier available coalescence-only example);
  • cleanup in docstrings in numerous files in the project.

I would encourage the authors to improve or augment this documentation before the final publication of this software paper. However, in all other aspects I find this to be a comprehensive and uniquely valuable software contribution to the cloud and aerosol microphysical modeling community and an ideal candidate for publication in JOSS.

Extending the documentation is going to be a priority for the upcoming releases and the growing user-base will help us to fulfill this commitment.

Please note the handful of detailed comments below on specific changes necessary to meet JOSS standards.
General Checks

Two authors on the paper (Grzegorz Lazarski and Aleksandra Talar) do not seem to appear as contributors to the repo and aren't listed in "Author Contributions." I don't really believe there is any issue here - for long-incubated software projects like this with lots of research interfaces there have to be many contributors over time! But I do want to note this for the editors.

As pointed out above (openjournals/joss-reviews#3219 (comment)), contributions of both Grzegorz Łazarski and Aleksandra Talar can be traced back on github, yet the commit authorship was partly lost as a part of refactors when handling their contributions.

Functionality

A simple install recipe using PyPI is provided, but given the capability to run on the GPU a conda-forge recipe or something which can help readers understand requirements - or provide a turnkey installation with necessary dependencies - would be very helpful! (Issue 494)

We are actively working on addressing it:

Installation using the documented instructions misses out on installing pystrict, a required dependency (Issue/PR 495)

As elaborated on in the issue (#495):

  • the single pystrict use was removed;
  • in order to more clearly distinguish between run-time and test-time requirements (listed in setup.py and requirements.txt file, respectively), the requirements.txt file was renamed to test-time-requirements.txt;
  • package installation via pip on a fresh system was added to the CI workflow to detect such issues automatically.
    Thank you for pointing this out!

Documentation

The automated tests seem to work out of the box, but the smoke tests seem to take a very long time to run. Even one of the unit tests (namely attributes/chemistry/test_pH.py) took a disproportionate amount of time compared to the others. Please note that I was not able to run the GPU tests as I did not have immediate access to a GPU-powered machine.

The aqueous chemistry code has received a major refactor and cleanup as a part of developments leading to v1.4. The timing of test_pH.py has improved. Part of resource usage in test_pH.py is taken by chempy - an external package used from within the test to compare PySDM results with an independent code.

While there is extensive, excellent demonstration and example code available from the authors in related, clearly-linked repositories, the API documentation leaves much to be desired. There is a high degree of customizability and configuration exposed to the user, but very little in the way of curated documentation for the APIs and data structures in the software, or docstrings and annotations. As a motivating example, consider the spectra module; in this module, a nice base class for building probability distributions is exposed, but there are no docstrings describing the required and optional arguments for either the base class or any of the sub-classes which implement it. An effort to improve the documentation with basic docstrings for all the core methods and functionality would greatly improve the quality of the software and the user experience when adapting it for a specialized research task. Ideally, a simple Sphinx-based documentation for the key API and configurations exposed to users would follow very plainly from such an effort, greatly improving the ease of access of the software.

Besides following the suggestion of using a Sphinx-like tool (pdoc3 in this case) and setting up automatic deployment of up-to-date API docs on Github pages as pointed above, we have also reorganised the physics-related code (which is likely to be the first a user would intend to study and extend) and grouped it in the physics directory/subpackage with the following structure:

    PySDM.physics.aqueous_chemistry
    PySDM.physics.coalescence_kernels
    PySDM.physics.condensation_coordinate
    PySDM.physics.constants
    PySDM.physics.diffusion_kinetics
    PySDM.physics.diffusion_thermics
    PySDM.physics.dimensional_analysis
    PySDM.physics.drop_growth
    PySDM.physics.formulae
    PySDM.physics.hydrostatics
    PySDM.physics.hygroscopicity
    PySDM.physics.impl
    PySDM.physics.latent_heat
    PySDM.physics.particle_advection
    PySDM.physics.saturation_vapour_pressure
    PySDM.physics.spectra
    PySDM.physics.state_variable_triplet
    PySDM.physics.surface_tension
    PySDM.physics.terminal_velocity
    PySDM.physics.trivia
    PySDM.physics.ventilation

TODO

Community / contribution guidelines are not provided with the repo; the JOSS reviewer guidelines suggest that such documentattion should clearly indicate how third-parties may contribute to the software, report issues, and seek support should they need to do so.

TODO

Software Paper

The software paper does not include a section which explicitly explains a "Statement of Need." However, I do believe that the prose in the paper covers this topic with the necessary detail. I will defer to the editors whether a stand-alone "Statement of Need" section is required to meet JOSS guidelines.

While the "Statement of Need" section title is not used, the Introduction section was meant to cover it.

It's much appreciated that the authors include links to open source software which attempts to tackle subsets of the broad functionality that PySDM provides. I would strongly encourage the author to include a brief discussion of each package (particularly pyrcel and PyBox) and compare/contrast the unique or improved functionalities of PySDM relative to each. My take (as the author of one of those libraries, pyrcel!) is that PySDM offers a substantial improvement over each in terms of the breadth of calculations and science/research that can be performed with it as well as the cleanliness of its interface and design for future improvements and extensions. These are all very positive things for PySDM and I would strongly encourage the authors to lean into the fantastic work here to promote the capabilities and utility of their library relative to the competition!

While wholeheartedly agreeing that a comparison of features of different parcel models available in the Python ecosystem would be of great value for the community, we argue that it falls out of scope of the PySDM JOSS paper for three reasons:

  • none of the author on the paper has experience with pyrcel or PyBox,
  • PySDM focus, as the name suggests, is on the Monte-Carlo representation of coalescence (condensation/evaporation are of course of equal importance to use it for modelling clouds, yet from the perspective of paper composition, a tour of features of other packages should certainly cover tools beyond parcel-condensation models),
  • the paper already exceeds JOSS suggested length of 250-1000 words.

We thus suggest to follow up together with the other package developers and carry out such comparison together.

Last but not least, let me note that we have added Oleksii Bulenok (@abulenok) to the paper author list. Oleksii has worked on the export mechanisms in PySDM and his VTK exporter PR has been merged before the v1.4 release offering the PySDM users to smoothly save and view multi-dimensional simulation state in such tools as Paraview. This is now mentioned in the text.
Let me thus ask the editor for confirmation of consent to extend the author list.

@slayoo slayoo changed the title JOSS review reply JOSS review reply draft Jul 6, 2021
@slayoo
Copy link
Member Author

slayoo commented Jul 7, 2021

with no comments here, I've just posted the reply in the JOSS review issue

@slayoo slayoo closed this as completed Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant