Enh/manage groups #66
When open_ncml is called with group="*", every group is read and flattened into the resulting dataset. If names conflict, the dimension and variable names are suffixed with __n, where n is the number of existing similar names.
Converting PR to draft as I discovered flattening is not working properly at the moment.
xncml/parser.py
@@ -579,7 +675,8 @@ def build_scalar_variable(var_name: str, values_tag: Values, var_type: str) -> x
' <values> is empty. Provide a single values within <values></values>'
' to preserve the type.'
)
return xr.Variable(data=None, dims=())
default_value = nctype(var_type)()
I changed (again) the behavior of empty scalar parsing here.
For context, the NcML here describes a scalar with a certain type but without providing any value. Ideally, we would create a placeholder for a scalar variable with a numpy type.
But numpy doesn't allow this for a scalar; a typed empty array of a given shape can only be created as an ndarray.
I first thought it would be better to lose the type and create an empty scalar with the value None,
but not having the type can interfere with subsequent processing.
Now I think it's better to preserve the dtype and fill the scalar with the default value of this dtype.
I would appreciate comments/suggestions here.
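To illustrate the trade-off described in this comment, here is a minimal sketch using plain NumPy. The helper name `default_scalar` is hypothetical and is not part of xncml; it does not reproduce the PR's `nctype` function, only the general idea of building a typed scalar holding its dtype's default value.

```python
import numpy as np


def default_scalar(type_name: str) -> np.generic:
    """Return the default (zero-like) scalar of a NumPy dtype.

    Hypothetical helper for illustration only; `type_name` is assumed
    to be a dtype name that NumPy understands, such as "float32".
    """
    # Calling a NumPy scalar type with no argument yields its default value.
    return np.dtype(type_name).type()


# A typed *empty* array is possible because it has a zero-length dimension.
empty_array = np.empty((0,), dtype="float32")

# A 0-d array (shape ()) always holds exactly one element, so a scalar
# cannot be "empty": it must be filled with some value.
zero_dim = np.zeros((), dtype="float32")

# Filling with the dtype default keeps the type information intact.
placeholder = default_scalar("float32")
```

The placeholder keeps its dtype, which is what downstream processing typically relies on, at the cost of not distinguishing "no value given" from an explicit default value.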
- improved performance by parsing the ncml representation only once
- fixed issues of rewriting content when read multiple times
Co-authored-by: David Huard <[email protected]>
This PR fixes #64
- Adds a `group` argument to `open_ncml`.
- By default, the root group `'/'` is read.

The above is similar to xarray's `open_dataset`.

In addition, using `group='*'` flattens every group into a single dataset (somewhat preserving the current behavior). Name conflicts between groups are resolved by appending an incrementing `__n` to the variable names, where `n` is a number. An attribute `group_path` is also added to each variable so that its original path can be retrieved once it has been flattened. This can be useful to recreate the original structure.
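The renaming scheme described here can be sketched in plain Python. This is not the xncml implementation: `flatten_groups` and `regroup` are hypothetical names, groups are modelled as nested dictionaries rather than datasets, and the sketch assumes no original variable name already ends in `__<number>`.

```python
import re
from collections import Counter


def flatten_groups(groups: dict) -> dict:
    """Flatten {group_path: {var_name: attrs}} into one mapping.

    A name seen for the first time is kept as-is; later occurrences get
    an incrementing ``__n`` suffix. Each variable records its origin in
    a ``group_path`` attribute.
    """
    seen = Counter()
    flat = {}
    for path, variables in groups.items():
        for name, attrs in variables.items():
            n = seen[name]  # number of earlier variables with this name
            flat_name = name if n == 0 else f"{name}__{n}"
            seen[name] += 1
            flat[flat_name] = {**attrs, "group_path": path}
    return flat


def regroup(flat: dict) -> dict:
    """Rebuild the per-group structure from the ``group_path`` attribute."""
    groups = {}
    for flat_name, attrs in flat.items():
        attrs = dict(attrs)  # copy so the flat mapping is left untouched
        path = attrs.pop("group_path")
        original = re.sub(r"__\d+$", "", flat_name)  # strip the suffix
        groups.setdefault(path, {})[original] = attrs
    return groups


example = {
    "/": {"temp": {"units": "K"}},
    "/a": {"temp": {"units": "C"}, "lat": {}},
}
flat = flatten_groups(example)
# flat has the keys "temp", "temp__1" and "lat"
```

Storing the origin as an attribute rather than encoding it in the name keeps variable names short while still allowing the round trip shown by `regroup`.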