Saving model output to NetCDF #176

Open
stijnvanhoey opened this issue Oct 26, 2020 · 3 comments

@stijnvanhoey
Contributor

@mrollier as discussed in the meeting, we looked into saving the model outputs to disk. In the current model setup, the output has only one attribute, which is the dictionary of parameters:

> out.attrs.keys()
dict_keys(['parameters'])

So instead of a nested dictionary (the parameter dict inside the attributes, which are themselves a dict), we could store each parameter as a separate attribute. When doing so:

> out.attrs = out.attrs["parameters"]
> out.to_netcdf("./test.h5")
ValueError: multi-dimensional array attributes not supported

there is still an error, because some of the parameters are N-D arrays. These should all be flattened before they can be saved to h5/netcdf files. This is something you could do in a one-liner using a dict comprehension:

out.attrs = { key: (value.flatten() if isinstance(value, np.ndarray) else value) for key, value in out.attrs["parameters"].items() }
out.to_netcdf("./test.h5")

I would also consider adding an extra attribute with the original shape (dimension sizes), stratification_dimension_..., so the original ndarray can be re-created.
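
A minimal sketch of that idea, using a plain dictionary in place of out.attrs. The `__shape` suffix, the helper names `flatten_attrs` and `restore_attrs`, and the parameter names and values are all made up for illustration; they are not part of the model code.

```python
import numpy as np

SHAPE_SUFFIX = "__shape"  # hypothetical naming convention, not from the model code


def flatten_attrs(params):
    """Flatten N-D arrays and store their shapes in companion attributes."""
    attrs = {}
    for key, value in params.items():
        if isinstance(value, np.ndarray) and value.ndim > 1:
            attrs[key] = value.flatten()
            attrs[key + SHAPE_SUFFIX] = list(value.shape)
        else:
            attrs[key] = value
    return attrs


def restore_attrs(attrs):
    """Rebuild the original arrays from the flattened values and stored shapes."""
    params = {}
    for key, value in attrs.items():
        if key.endswith(SHAPE_SUFFIX):
            continue  # shape entries are bookkeeping, not parameters
        shape = attrs.get(key + SHAPE_SUFFIX)
        params[key] = value if shape is None else np.asarray(value).reshape(shape)
    return params


# Made-up parameters: one scalar and one 2-D array
params = {"beta": 0.03, "Nc": np.arange(6.0).reshape(2, 3)}
flat = flatten_attrs(params)      # safe to assign to out.attrs before to_netcdf
restored = restore_attrs(flat)    # same keys and array shapes as `params`
```

The flattened dictionary only contains scalars and 1-D sequences, which is what the netCDF attribute model accepts; the shape entries are what make the round trip possible.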

zarr

zarr, https://zarr.readthedocs.io/en/stable/, actually supports the usage of N-D arrays in the attributes.

out=model.sim(100)
out.attrs = out.attrs["parameters"]
out.to_zarr("./test.zarr")

will also work.

usage of groups

When saving scenarios (both zarr and h5/netcdf), you can use the concept of groups to save all the data in a 'single' file (which is still a directory for zarr):

model_scenario_dbase = "../data/interim/tmodel_outputs.zarr"

# scenario_1 with certain settings, parameters,...
out_1 = model.sim(100)
out_1.attrs = out_1.attrs["parameters"]
out_1.to_zarr(model_scenario_dbase, group="20201024_scenario_1")

# scenario_2 with other settings, parameters,...
out_2 = model.sim(100)
out_2.attrs = out_2.attrs["parameters"]
out_2.to_zarr(model_scenario_dbase, group="20201024_scenario_2")

And each group can contain its own set of attributes.

If you worked with a new 'scenario' dimension instead, you would lose the attributes, i.e. the parameter values, when concatenating along the 'scenario' dimension, which limits reproducibility, e.g.:

# scenario_1 with certain settings, parameters,...
out_1 = model.sim(100)
out_1.attrs = out_1.attrs["parameters"]
out_1 = out_1.assign_coords(scenario="20201024_scenario_1")
#out_1.to_zarr("./tmodel_outputs.zarr", group="20201024_scenario_3")

# scenario_2 with other settings, parameters,...
out_2 = model.sim(100)
out_2.attrs = out_2.attrs["parameters"]
out_2 = out_2.assign_coords(scenario="20201024_scenario_2")
out_2.attrs["da"] = 42
result = xr.concat([out_1, out_2], dim="scenario")
result.attrs["da"]  # -> returns 7.0 (the value from out_1), not 42; result only has the attributes of out_1

@mrollier please check whether this helps you. I would propose to:

  • adjust the code in order to have the parameters as individual attributes, making out_1.attrs = out_1.attrs["parameters"] not required anymore
  • use zarr with the group concept to save your runs

@twallema, @JennaVergeynst and @jorisvandenbossche are there other elements stored in the attributes apart from the parameters? Would it be an option to have each parameter stored as a separate key/value pair in the attributes instead of using a 'parameters' dict?

@twallema
Collaborator

@stijnvanhoey No other attributes that I know of. You can store each of the parameters as a separate key/value pair, no problem.

@mrollier
Collaborator

Hi @stijnvanhoey, thanks for the elaborate explanation! I think I'll follow your two-step advice. I agree that losing the "parameters" name (i.e. going "up one level") seems OK, and grouping results together is also useful. Introducing a new dimension is probably overkill.

Question: is it easy to browse through the various groups when they are saved, or should I 'remember' the names I gave to these groups? Can an attribute of a group be a description (long string)?

@stijnvanhoey
Contributor Author

stijnvanhoey commented Oct 27, 2020

As a zarr store is actually a folder structure in itself, you can still navigate to the folder in your file explorer, and the groups will be at the top level, e.g.

└── test.zarr
    ├── 20201024_scenario_1
    │   ├── ...
    │   └── ...
    ├── 20201024_scenario_2
    │   ├── ...
    │   └── ...
    └── 20201024_scenario_3
        ├── ...
        └── ...
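
The same browsing can be done from Python. The sketch below assumes a zarr directory store on local disk whose top level contains only groups, as in the tree above; `list_groups` is a hypothetical helper, and the temporary folder only mimics that layout so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def list_groups(store_path):
    """Return the names of the top-level sub-directories of a zarr directory store."""
    return sorted(p.name for p in Path(store_path).iterdir() if p.is_dir())


# Mimic the layout shown above with empty folders (no real zarr data involved)
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "test.zarr"
    for name in ("20201024_scenario_1", "20201024_scenario_2"):
        (store / name).mkdir(parents=True)
    groups = list_groups(store)
```

Each returned name should then be usable as the group argument when reading a scenario back, e.g. xr.open_zarr(store, group=name), so there is no need to remember the names by heart.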

Can an attribute of a group be a description (long string)?

That should be ok.
