Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to save model.results directly #684

Closed
1 of 3 tasks
irm-codebase opened this issue Sep 11, 2024 · 8 comments
Closed
1 of 3 tasks

Unable to save model.results directly #684

irm-codebase opened this issue Sep 11, 2024 · 8 comments
Labels

Comments

@irm-codebase
Copy link
Contributor

What happened?

Problem

This is related to other problems (#619), but it's closer to an actual user case, so I'll document it here.

Due to our approach of saving everything in the .attrs of the xarray, we cannot directly save model results.
This is a reasonable user-case, as saving results is semantically intuitive.

Solution

Not saving dictionaries or arrays in the .attrs

Which operating systems have you used?

  • macOS
  • Windows
  • Linux

Version

v0.7.0.dev4

Relevant log output

model.results.to_netcdf(results_path / "results.nc")
Traceback (most recent call last):
  File "/home/ivanruizmanuel/Documents/git/euro-calliope-modular/build/model/run_save.py", line 14, in <module>
    model.results.to_netcdf(results_path / "results.nc")
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/core/dataset.py", line 2298, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 1292, in to_netcdf
    _validate_attrs(dataset, invalid_netcdf=invalid_netcdf and engine == "h5netcdf")
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 203, in _validate_attrs
    check_attr(k, v, valid_types)
  File "/home/ivanruizmanuel/miniforge3/envs/ec-model-v07/lib/python3.12/site-packages/xarray/backends/api.py", line 195, in check_attr
    raise TypeError(
TypeError: Invalid value for attr 'scenario': None. For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple
@sjpfenninger
Copy link
Member

What is the use case that makes this a bug? In other words, why can't you use model.to_netcdf()?

@irm-codebase
Copy link
Contributor Author

irm-codebase commented Sep 12, 2024

In this case, I wanted to just save the results to have a smaller file size. I did save with model.to_netcdf().

If you have a stable model build and run many scenarios (e.g., SPORES), saving only the model results, and some metadata on the side, would reduce data use.

Generally, though, just saving the results or inputs makes parsing the data a bit easier. Depending on the mode, some things might be inputs or results (e.g., flow_cap in operate mode). Users might not have the math fully memorized, so just saving the results is a bit more 'human friendly'.

@brynpickering
Copy link
Member

Maybe this is why we should simply pickle our data - then it would be clear that you need to load it back in using calliope 😅.

But in all seriousness, our preferred method is that data is always loaded in using calliope (read_netcdf). This way, you automatically get to see what is an input/result, as they are attached as respective attributes of the model object. If we made it easy to save results to file then we'd only come across the opposite problem in future: users trying to load a netcdf file using calliope and then finding it doesn't contain any of the necessary metadata to actually instatiate the model object.

Generally, input data has a much smaller memory footprint than results. Still, in the use-case you have given (individual SPORES runs), it's clearly preferable not to save the input data every single time a new set of results is generated. How about calliope.to_netcdf(include_inputs=False)?

@irm-codebase
Copy link
Contributor Author

irm-codebase commented Sep 17, 2024

@brynpickering I would not pickle the data. Pickle files are not always translatable between python or library versions.
It would decrease stability in the long run 😅😅...

I'd actually prefer to just keep calliope.to_netcdf() without anything. Adding extra things would just decrease maintainability instead of fixing the root of the issue, which is that we are not following netCDF4 specifications.

I suppose it would introduce the chance of someone trying to re-load results... but that is user misconception, not calliope code breaking compatibility with data formats. I do not see it as a big loss, but I get the point.

Should I close this as 'not planned'?

@brynpickering
Copy link
Member

I'd actually prefer to just keep calliope.to_netcdf() without anything

Except you also want to add calliope.results.to_netcdf, right?

@irm-codebase
Copy link
Contributor Author

irm-codebase commented Sep 17, 2024

I meant that I'm ok with just using calliope.to_netcdf() if it means we do not sideline the spec issue with calliope.to_netcdf(include_inputs=False) 🙈.

I can get around it in other ways on my side, and if it has not popped up yet it means most people do not do it.

@irm-codebase
Copy link
Contributor Author

irm-codebase commented Sep 17, 2024

Just to clarify: we already have calliope.results.to_netcdf(), since results is an xarray.
So I was not really requesting us to add any new functions.

This issue was just about making sure using xarray functionality with it works when saving to netCDF.

@irm-codebase
Copy link
Contributor Author

I'm setting this as 'not planned'.
Perhaps the math cleanup turns this into a non-issue.

@irm-codebase irm-codebase closed this as not planned Won't fix, can't repro, duplicate, stale Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants