Saving model output to NetCDF #176

stijnvanhoey · 2020-10-26T14:13:12Z

@mrollier as discussed in the meeting, we had a check on the saving of the model outputs to disk. In the current model setup, there is only one attribute, which is the dictionary of parameters:

> out.attrs.keys()
dict_keys(['parameters'])

So instead of a nested dictionary (parameter dict inside the attributes, which is also a dict), we could store each of the parameters as a separate attribute. When doing so:

> out.attrs = out.attrs["parameters"]
> out.to_netcdf("./test.h5")
ValueError: multi-dimensional array attributes not supported

there is still an error. So, the N-D arrays should all be flattened to support saving them as h5/netcdf files. This is something you could do in a one-liner using dict-comprehension:

import numpy as np
out.attrs = {key: (value.flatten() if isinstance(value, np.ndarray) else value) for key, value in out.attrs["parameters"].items()}
out.to_netcdf("./test.h5")

I would maybe also add an additional attribute with the original shape (dimension size) stratification_dimension_... to support re-creation of the original ndarray.
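
A minimal sketch of that idea, assuming the shape is stored under a hypothetical `<key>_shape` attribute (the suffix, the helper names and the example parameters `beta` and `Nc` are made up for illustration and are not the project's naming):

```python
import numpy as np

SHAPE_SUFFIX = "_shape"  # hypothetical naming, chosen only for this sketch


def flatten_attrs(parameters):
    """Flatten N-D arrays and keep each original shape in an extra attribute."""
    attrs = {}
    for key, value in parameters.items():
        if isinstance(value, np.ndarray) and value.ndim > 1:
            attrs[key] = value.flatten()
            attrs[key + SHAPE_SUFFIX] = list(value.shape)
        else:
            attrs[key] = value
    return attrs


def restore_attrs(attrs):
    """Rebuild the original N-D arrays from the flattened attributes."""
    restored = {}
    for key, value in attrs.items():
        if key.endswith(SHAPE_SUFFIX):
            continue  # shape bookkeeping, not a parameter
        shape = attrs.get(key + SHAPE_SUFFIX)
        if shape is None:
            restored[key] = value
        else:
            restored[key] = np.asarray(value).reshape(tuple(int(n) for n in shape))
    return restored


# Made-up parameters: one scalar and one 2-D array
parameters = {"beta": 0.03, "Nc": np.arange(6.0).reshape(2, 3)}
flat = flatten_attrs(parameters)
roundtrip = restore_attrs(flat)
```

With something like this, `out.attrs = flatten_attrs(out.attrs["parameters"])` would replace the one-liner before saving, and `restore_attrs` would be applied after loading. A parameter whose own name ends in the suffix would be skipped on restore, so the suffix has to be chosen with care.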

zarr

zarr, https://zarr.readthedocs.io/en/stable/, actually supports the usage of N-D arrays in the attributes.

out = model.sim(100)
out.attrs = out.attrs["parameters"]
out.to_zarr("./test.zarr")

will also work.

usage of groups

When saving scenarios (both zarr and h5/netCDF), you can use groups to save all the data in a 'single' file (which is still a directory for zarr):

model_scenario_dbase = "../data/interim/tmodel_outputs.zarr"

# scenario_1 with certain settings, parameters,...
out_1 = model.sim(100)
out_1.attrs = out_1.attrs["parameters"]
out_1.to_zarr(model_scenario_dbase, group="20201024_scenario_1")

# scenario_2 with other settings, parameters,...
out_2 = model.sim(100)
out_2.attrs = out_2.attrs["parameters"]
out_2.to_zarr(model_scenario_dbase, group="20201024_scenario_2")

And each group can contain its own set of attributes.

If you instead worked with a new 'scenario' dimension, you would lose the attributes, i.e. the parameter values, when concatenating along the 'scenario' dimension, which limits reproducibility, e.g.:

# scenario_1 with certain settings, parameters,...
out_1 = model.sim(100)
out_1.attrs = out_1.attrs["parameters"]
out_1 = out_1.assign_coords(scenario="20201024_scenario_1")
#out_1.to_zarr("./tmodel_outputs.zarr", group="20201024_scenario_3")

# scenario_2 with other settings, parameters,...
out_2 = model.sim(100)
out_2.attrs = out_2.attrs["parameters"]
out_2 = out_2.assign_coords(scenario="20201024_scenario_2")
out_2.attrs["da"] = 42
result = xr.concat([out_1, out_2], dim="scenario")
result.attrs["da"]  # -> return 7.0;  result only has the attributes of out_1

@mrollier have a check if this helps you. I would propose to:

1. adjust the code in order to have the parameters as individual attributes, making out_1.attrs = out_1.attrs["parameters"] not required anymore
2. use zarr with the group concept to save your runs

@twallema, @JennaVergeynst and @jorisvandenbossche are there other elements stored in the attributes apart from the parameters? Would it be an option to have each of the parameters stored as a separate key/value pair in the attributes instead of using a 'parameters' dict?

twallema · 2020-10-26T14:24:21Z

@stijnvanhoey No attributes that I know of. You can store each of the parameters as a separate key/value pair, np.

mrollier · 2020-10-27T09:43:14Z

Hi @stijnvanhoey , thanks for the elaborate explanation! I think I'll follow your 2-step advice, I agree that losing the "parameters" name (i.e. going "up one level") seems OK, and grouping together results is also useful. Introducing a new dimension is probably overkill.

Question: is it easy to browse through the various groups when they are saved, or should I 'remember' the names I gave to these groups? Can an attribute of a group be a description (long string)?

stijnvanhoey · 2020-10-27T09:56:36Z

As zarr is actually a folder structure in itself, you can still navigate to the folder in your file explorer, where the groups form the top level, e.g.

└── test.zarr
    ├── 20201024_scenario_1
    │   ├── ...
    │   └── ...
    ├── 20201024_scenario_2
    │   ├── ...
    │   └── ...
    └── 20201024_scenario_3
        ├── ...
        └── ...

> Can an attribute of a group be a description (long string)?

That should be ok.
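
For browsing the groups from Python rather than a file explorer, a small sketch using only the standard library, assuming a directory-based store in which every top-level folder is a group (`list_groups` is a made-up helper, and the folders created here are empty placeholders, not real zarr data):

```python
import tempfile
from pathlib import Path


def list_groups(store):
    """Return the names of the top-level sub-directories of a directory store."""
    return sorted(p.name for p in Path(store).iterdir() if p.is_dir())


# Build an empty directory layout that mimics the tree above (no real data)
store = Path(tempfile.mkdtemp()) / "test.zarr"
for name in ("20201024_scenario_1", "20201024_scenario_2", "20201024_scenario_3"):
    (store / name).mkdir(parents=True)

groups = list_groups(store)
print(groups)
```

If arrays are also stored at the top level of the store, they show up as folders too, so this only works as a quick overview. zarr's own API (`zarr.open_group(...).group_keys()`) is probably the more robust route for a real store.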
