Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect net_cdf serialisation for top level attributes #614

Closed
1 of 3 tasks
irm-codebase opened this issue Jun 22, 2024 · 4 comments · Fixed by #627
Closed
1 of 3 tasks

Incorrect net_cdf serialisation for top level attributes #614

irm-codebase opened this issue Jun 22, 2024 · 4 comments · Fixed by #627
Assignees
Labels
bug v0.7 (upcoming) version 0.7

Comments

@irm-codebase
Copy link
Contributor

What happened?

When saving models into net_cdf files, our serialization algorithm will convert single list elements at the top level into strings.

Replication steps:

import calliope

model = calliope.examples.national_scale()

model_file = "outputs/model.nc"
out_file = "outputs/output.nc"

model._model_data.attrs["applied_math"] = ["a single value"]
model.to_netcdf(model_file)

test = calliope.read_netcdf(model_file)

print(model._model_data.attrs["applied_math"])

Here, "applied_math" will be a string, not a list.

Which operating systems have you used?

  • macOS
  • Windows
  • Linux

Version

v0.7

Relevant log output

No response

@irm-codebase irm-codebase changed the title Incorrect net_cdf serilaisation for top level attributes Incorrect net_cdf serialisation for top level attributes Jun 22, 2024
@irm-codebase irm-codebase added the v0.7 (upcoming) version 0.7 label Jun 22, 2024
@irm-codebase irm-codebase self-assigned this Jun 22, 2024
@irm-codebase
Copy link
Contributor Author

irm-codebase commented Jun 22, 2024

Did some extra digging around. The core of the issue is that the netCDF standard does not really consider attributes as anything more than single values per key (https://cfconventions.org/cf-conventions/cf-conventions.html#_attributes).

xarray does not support saving data in the attributes of datasets beyond single values either, which is why we have our own serialization.

I'm playing around with using pre-defined encoders for this, but so far the produced string is too big for xarray...
The encoding works perfectly fine, but the saving step does not.

Here's an example:

import codecs
import pickle

import calliope

model = calliope.examples.national_scale()
attrs_orignial = model._model_data.attrs.copy()

pickled_attrs = codecs.encode(pickle.dumps(attrs_orignial), "base64").decode()
model._model_data.attrs.clear()
model._model_data.attrs["encoded"] = pickled_attrs
unpickled_attrs = pickle.loads(codecs.decode(pickled_attrs.encode(), "base64"))

assert attrs_orignial == unpickled_attrs

@irm-codebase
Copy link
Contributor Author

irm-codebase commented Jun 22, 2024

xarray provides interfaces for other formats, and it seems like zarr fits our use-case the best: it fully separates data and metadata, and is able to represent the second. By comparison, netcdf is less flexible because of the limited typing of attrs (no dictionary or list support).

Saving a full calliope model is as easy as:

import calliope
import xarray

model = calliope.examples.national_scale()
model._model_data.to_netcdf("outputs/model.zarr")
test = xarray.open_zarr("outputs/model.zarr")

In this case, test contains the full model definition, with 0 errors.

It seems like zarr is newer (as in, has been around for almost a decade, against 30+ years for netcdf). It also comes from the geospatial / climate science field:

@brynpickering
Copy link
Member

https://help.marine.copernicus.eu/en/articles/8176692-how-to-choose-between-netcdf-and-zarr-format-using-the-toolbox

Summary of summary table: zarr is better than netcdf in every way 😄

netcdf storing single list elements as strings is a bit of a pain. We manage all the other formats that need converting before storing in io.py (dictionaries, sets, etc.). We could easily just add a serialiser for lists so that it catches single list items. We probably should.

Happy for you to add functionality to convert to .zarr and to deprecate saving to .nc. Don't remove it entirely as it should be phased out at a later point - just have a deprecation warning with a recommendation to use .zarr. And then we just add this serialiser for single element lists so that they are returned as single element lists.

@irm-codebase
Copy link
Contributor Author

I agree on keeping netCDF for a while, its too popular. I suspect our current serialisation will be enough once #619 is fixed, but it's a toss.

I'll come back to this one afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug v0.7 (upcoming) version 0.7
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants