Merge pull request #331 from ImperialCollegeLondon/271-move-pyrealm_build_data-documentation-into-an-autodoc-style-setup

Move `pyrealm_build_data` documentation into an autodoc style setup
davidorme authored Oct 17, 2024
2 parents f2380be + 1d5fc6f commit 396c553
Showing 13 changed files with 327 additions and 314 deletions.
178 changes: 66 additions & 112 deletions docs/source/development/pyrealm_build_data.md
@@ -22,138 +22,92 @@ language_info:
version: 3.11.9
---

# The `pyrealm_build_data` package

The `pyrealm` repository includes both the `pyrealm` package and the
`pyrealm_build_data` package. The `pyrealm_build_data` package contains datasets that
are used in the `pyrealm` build and testing process. This includes:

* Example datasets that are used in the package documentation, such as simple spatial
datasets for showing the use of the P Model.
* "Golden" datasets for regression testing `pyrealm` implementations against the outputs
of other implementations. These datasets will include a set of input data and then
output predictions from other implementations.
* Datasets for providing profiling of `pyrealm` code and for benchmarking new versions
of the package code against earlier implementations to check for performance issues.

Note that `pyrealm_build_data` is a source distribution only (`sdist`) component of
`pyrealm`, so is not included in binary distributions (`wheel`) that are typically
installed by end users. This means that files in `pyrealm_build_data` are not available
if a user has simply used `pip install pyrealm`: please *do not* use
`pyrealm_build_data` within the main `pyrealm` code.
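
Because the package is only present in source checkouts, test or documentation code may want to check for it before building a path to a data file. The following is a minimal sketch of that pattern using `importlib.util.find_spec` and `importlib.resources.files`; the helper name `locate_data_file` is hypothetical and is not part of `pyrealm`.

```python
from importlib import resources
from importlib.util import find_spec


def locate_data_file(filename, package="pyrealm_build_data"):
    """Return a path-like handle to a packaged data file, or None.

    Hypothetical helper: returns None when the top-level package is not
    installed, as would be the case after a plain wheel install.
    """
    top_level = package.split(".")[0]
    if find_spec(top_level) is None:
        return None
    # resources.files returns a Traversable that supports the "/" operator.
    return resources.files(package) / filename
```

A test suite could then skip any test for which the helper returns `None`, rather than failing on a missing import.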

## Package contents

The package is organised into submodules that reflect the data use or previous
implementation.

### The `bigleaf` submodule

This submodule contains benchmark outputs from the `bigleaf` package in `R`, which has
been used as the basis for core hygrometry functions. The `bigleaf_conversions.R` R
script runs a set of test values through `bigleaf`. The first part of the file prints
out some simple test values that have been used in package doctests and then the second
part of the file generates more complex benchmarking inputs that are saved, along with
`bigleaf` outputs, as `bigleaf_test_values.json`.

Running `bigleaf_conversions.R` requires an installation of R along with the `jsonlite`
and `bigleaf` packages, and the script can then be run from within the submodule folder
as:

```sh
Rscript bigleaf_conversions.R
```
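
As an illustration of how a saved benchmark file of this kind might be consumed in a regression test, the sketch below loads a JSON file and compares expected and calculated values within a tolerance. The function names and the tolerance are assumptions made for this example, and the actual structure of `bigleaf_test_values.json` is not described here.

```python
import json
import math
from pathlib import Path


def load_benchmark(path):
    """Read a JSON benchmark file into a dictionary."""
    with open(Path(path), encoding="utf-8") as handle:
        return json.load(handle)


def values_match(expected, actual, rel_tol=1e-6):
    """Check that two equal-length sequences agree within a relative tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol) for e, a in zip(expected, actual)
    )
```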

### The `rpmodel` submodule

This submodule contains benchmark outputs from the `rpmodel` package in `R`, which has
been used as the basis for initial development of the standard P Model.

#### Test inputs
# The {mod}`~pyrealm_build_data` module

The `generate_test_inputs.py` file defines a set of constants for running P Model
calculations and then defines a set of scalar and array inputs for the forcing variables
required to run the P Model. The array inputs are a set of 100 values sampled randomly
across the ranges of plausible forcing value inputs in order to benchmark the
calculations of the P Model implementation. All of these values are stored in the
`test_inputs.json` file.
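
The general approach of sampling random inputs across plausible ranges can be sketched as follows. This is not the content of `generate_test_inputs.py`: the variable names, ranges, seed and the use of uniform sampling are all assumptions made for illustration, and the sketch uses the standard library `random` module rather than `numpy` to stay free of dependencies.

```python
import json
import random

# Hypothetical forcing variables and ranges, chosen only for illustration.
FORCING_RANGES = {
    "tc": (0.0, 50.0),  # temperature, degrees Celsius
    "patm": (50_000.0, 108_400.0),  # atmospheric pressure, Pa
    "co2": (280.0, 500.0),  # CO2 concentration, ppm
    "vpd": (0.0, 5_000.0),  # vapour pressure deficit, Pa
}


def sample_inputs(n_values=100, seed=0):
    """Draw n_values uniform random samples for each forcing variable."""
    rng = random.Random(seed)  # fixed seed so the inputs are reproducible
    return {
        name: [rng.uniform(low, high) for _ in range(n_values)]
        for name, (low, high) in FORCING_RANGES.items()
    }


def to_json(inputs):
    """Serialise the sampled inputs to a JSON string."""
    return json.dumps(inputs, indent=2)
```

Fixing the seed means the same inputs can be regenerated, so that benchmark outputs produced by another implementation remain comparable.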

The script requires `python` and the `numpy` package and can be run as:

```sh
python generate_test_inputs.py
```{eval-rst}
.. automodule:: pyrealm_build_data
:autosummary:
:members:
:special-members: __init__
```

#### Simple `rpmodel` benchmarking

The `test_outputs_rpmodel.R` file contains R code to run the test input data set and store
the expected predictions from the `rpmodel` package as `test_outputs_rpmodel.json`. It
requires an installation of `R` and the `rpmodel` package and can be run as:
## The `bigleaf` submodule

```sh
Rscript test_outputs_rpmodel.R
```{eval-rst}
.. automodule:: pyrealm_build_data.bigleaf
:autosummary:
:members:
:special-members: __init__
```

#### Global array test
## The `community` submodule

The remaining files in the submodule are intended to provide a global test dataset for
benchmarking the use of `rpmodel` on a global time-series, so using 3 dimensional arrays
with latitude, longitude and time coordinates. It is currently not used in testing
because of issues with the `rpmodel` package in version 1.2.0. It may also be replaced
in testing with the `uk_data` submodule, which is used as an example dataset in the
documentation.
```{eval-rst}
.. automodule:: pyrealm_build_data.community
:autosummary:
:members:
:special-members: __init__
```

The files are:
## The `rpmodel` submodule

* pmodel_global.nc: An input global NetCDF file containing forcing variables at 0.5°
spatial resolution and for two time steps.
* test_global_array.R: An R script to run `rpmodel` using the dataset.
* rpmodel_global_gpp_do_ftkphio.nc: A NetCDF file containing `rpmodel` predictions using
corrections for temperature effects on the `kphio` parameter.
* rpmodel_global_gpp_no_ftkphio.nc: A NetCDF file containing `rpmodel` predictions with
fixed `kphio`.
```{eval-rst}
.. automodule:: pyrealm_build_data.rpmodel
:autosummary:
:members:
:special-members: __init__
```

To generate the predicted outputs again requires an R installation with the `rpmodel`
package:
## The `sandoval_kphio` submodule

```sh
Rscript test_global_array.R
```{eval-rst}
.. automodule:: pyrealm_build_data.sandoval_kphio
:autosummary:
:members:
:special-members: __init__
```

### The `subdaily` submodule
## The `splash` submodule

At present, this submodule only contains a single file containing the predictions for
the `BE_Vie` fluxnet site from the original implementation of the `subdaily` module,
published in {cite}`mengoli:2022a`. Generating these predictions requires an
installation of R and then code from the following repository:
```{eval-rst}
.. automodule:: pyrealm_build_data.splash
:autosummary:
:members:
:special-members: __init__
```

[https://github.com/GiuliaMengoli/P-model_subDaily](https://github.com/GiuliaMengoli/P-model_subDaily)
## The `subdaily` submodule

TODO - This submodule should be updated to include the required code along with the
settings files and a runner script to reproduce this code. Or possibly to checkout the
required code as part of a shell script.
```{eval-rst}
.. automodule:: pyrealm_build_data.subdaily
:autosummary:
:members:
:special-members: __init__
```

### The `t_model` submodule
## The `t_model` submodule

The `t_model.r` file contains the original implementation of the T Model calculations in R
{cite:p}`Li:2014bc`. The `rtmodel_test_outputs.r` script sources this file and then
generates some simple benchmarking predictions, which are saved as `rtmodel_output.csv`.
```{eval-rst}
.. automodule:: pyrealm_build_data.t_model
:autosummary:
:members:
:special-members: __init__
```

To generate the predicted outputs again requires an R installation:
## The `two_leaf` submodule

```sh
Rscript rtmodel_test_outputs.r
```{eval-rst}
.. automodule:: pyrealm_build_data.two_leaf
:autosummary:
:members:
:special-members: __init__
```

### The `uk_data` submodule

This submodule contains the Python script `create_2D_uk_inputs.py`, which is used to
generate the NetCDF output file `UK_WFDE5_FAPAR_2018_JuneJuly.nc`. This contains P Model
forcings for the United Kingdom at 0.5° spatial resolution and hourly temporal
resolution over 2 months (1464 temporal observations). It is used for demonstrating the
use of the subdaily P Model.
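
The count of 1464 temporal observations follows from hourly data over June (30 days) and July (31 days): 61 × 24 = 1464. A short check of that arithmetic, taking the year 2018 from the file name and using a hypothetical function name:

```python
from datetime import datetime, timedelta


def hourly_steps(start, end):
    """Count the hourly time steps in the half-open interval [start, end)."""
    return int((end - start) / timedelta(hours=1))


# June and July 2018: from 1 June 00:00 up to, but not including, 1 August 00:00.
n_observations = hourly_steps(datetime(2018, 6, 1), datetime(2018, 8, 1))
```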
## The `uk_data` submodule

The script is currently written with a hard-coded set of paths to key source data - the
WFDE5 v2 climate data and a separate source of interpolated hourly fAPAR. This should
probably be rewritten to generate reproducible content from publicly available sources
of these datasets.
```{eval-rst}
.. automodule:: pyrealm_build_data.uk_data
:autosummary:
:members:
:special-members: __init__
```
25 changes: 21 additions & 4 deletions pyrealm_build_data/__init__.py
@@ -1,4 +1,21 @@
"""The pyrealm_build_data package is an sdist only package used to store build data
shared between the docs and testing. Making it a package allows it to be accessed using
importlib.resources().
""" # noqa: D205
"""The ``pyrealm`` repository includes both the ``pyrealm`` package and the
``pyrealm_build_data`` package. The ``pyrealm_build_data`` package contains datasets
that are used in the ``pyrealm`` build and testing process. This includes:

* Example datasets that are used in the package documentation, such as simple spatial
  datasets for showing the use of the P Model.
* "Golden" datasets for regression testing ``pyrealm`` implementations against the
  outputs of other implementations. These datasets will include a set of input data and
  then output predictions from other implementations.
* Datasets for providing profiling of ``pyrealm`` code and for benchmarking new versions
  of the package code against earlier implementations to check for performance issues.

The package is organised into submodules that reflect the data use or previous
implementation.

Note that ``pyrealm_build_data`` is a source distribution only (``sdist``) component of
``pyrealm``, so is not included in binary distributions (``wheel``) that are typically
installed by end users. This means that files in ``pyrealm_build_data`` are not
available if a user has simply used ``pip install pyrealm``: please *do not* use
``pyrealm_build_data`` within the main ``pyrealm`` code.
""" # noqa: D205, D415
17 changes: 16 additions & 1 deletion pyrealm_build_data/bigleaf/__init__.py
@@ -1 +1,16 @@
"""Validation data from the bigleaf package in R."""
"""This submodule contains benchmark outputs from the ``bigleaf`` package in ``R``,
which has been used as the basis for core hygrometry functions. The
``bigleaf_conversions.R`` R script runs a set of test values through ``bigleaf``. The
first part of the file prints out some simple test values that have been used in package
doctests and then the second part of the file generates more complex benchmarking inputs
that are saved, along with ``bigleaf`` outputs, as ``bigleaf_test_values.json``.

Running ``bigleaf_conversions.R`` requires an installation of ``R`` along with the
``jsonlite`` and ``bigleaf`` packages, and the script can then be run from within the
submodule folder as:

.. code:: sh

    Rscript bigleaf_conversions.R

""" # noqa: D205
5 changes: 5 additions & 0 deletions pyrealm_build_data/community/__init__.py
@@ -0,0 +1,5 @@
"""The :mod:`pyrealm_build_data.community` submodule provides a set of input files for
the :mod:`pyrealm.demography` module that are used both in unit testing for the module
and as inputs for generating documentation of the module. The files provide definitions
of plant functional types and plant communities in a range of formats.
""" # noqa: D205
59 changes: 58 additions & 1 deletion pyrealm_build_data/rpmodel/__init__.py
@@ -1 +1,58 @@
"""Validation data from the rpmodel package in R."""
"""This submodule contains benchmark outputs from the ``rpmodel`` package in ``R``,
which has been used as the basis for initial development of the standard P Model.

Test inputs
===========

The ``generate_test_inputs.py`` file defines a set of constants for running P Model
calculations and then defines a set of scalar and array inputs for the forcing variables
required to run the P Model. The array inputs are a set of 100 values sampled randomly
across the ranges of plausible forcing value inputs in order to benchmark the
calculations of the P Model implementation. All of these values are stored in the
``test_inputs.json`` file.

The script requires ``python`` and the ``numpy`` package and can be run as:

.. code:: sh

    python generate_test_inputs.py

Simple ``rpmodel`` benchmarking
===============================

The ``test_outputs_rpmodel.R`` file contains R code to run the test input data set and
store the expected predictions from the ``rpmodel`` package as
``test_outputs_rpmodel.json``. It requires an installation of ``R`` and the ``rpmodel``
package and can be run as:

.. code:: sh

    Rscript test_outputs_rpmodel.R

Global array test
=================

The remaining files in the submodule are intended to provide a global test dataset for
benchmarking the use of ``rpmodel`` on a global time-series, so using 3 dimensional
arrays with latitude, longitude and time coordinates. It is currently not used in
testing because of issues with the ``rpmodel`` package in version 1.2.0. It may also be
replaced in testing with the ``uk_data`` submodule, which is used as an example dataset
in the documentation.

The files are:

* ``pmodel_global.nc``: An input global NetCDF file containing forcing variables at 0.5°
  spatial resolution and for two time steps.
* ``test_global_array.R``: An R script to run ``rpmodel`` using the dataset.
* ``rpmodel_global_gpp_do_ftkphio.nc``: A NetCDF file containing ``rpmodel`` predictions
  using corrections for temperature effects on the ``kphio`` parameter.
* ``rpmodel_global_gpp_no_ftkphio.nc``: A NetCDF file containing ``rpmodel`` predictions
  with fixed ``kphio``.

To generate the predicted outputs again requires an R installation with the ``rpmodel``
package:

.. code:: sh

    Rscript test_global_array.R

""" # noqa: D205
14 changes: 14 additions & 0 deletions pyrealm_build_data/sandoval_kphio/__init__.py
@@ -0,0 +1,14 @@
r"""This submodule contains benchmark outputs from the ``calc_phi0.R`` script, which is
an experimental approach to calculating the :math:`\phi_0` parameter for the P Model
with modulation from climatic aridity and growing degree days and the current
temperature. The calculation is implemented in ``pyrealm`` as
:class:`~pyrealm.pmodel.quantum_yield.QuantumYieldSandoval`.

The files are:

* ``calc_phi0.R``: The original implementation and parameterisation.
* ``create_test_inputs.R``: A script to run the original implementation with a range of
  inputs and save a file of test values.
* ``sandoval_kphio.csv``: The resulting test values.

""" # noqa: D205