Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enh/manage groups #66

Merged
merged 13 commits into from
Feb 6, 2024
Merged

Enh/manage groups #66

merged 13 commits into from
Feb 6, 2024

Conversation

bzah
Copy link
Collaborator

@bzah bzah commented Jan 16, 2024

  • Include documentation when adding new features.
  • Include new tests or update existing tests when applicable.

This PR fixes #64

  • This adds an optional group argument to open_ncml
  • When present, only the given group of the ncml will be read.
  • When absent, the root group '/' is read.
    The above is similar to xarray's open_dataset.

In addition, using group='*', it flattens every group into a single datasets (somewhat preserving the current behavior).
The names conflicts between groups are solved by appending an incrementing __n to the variable names, where n is a number.
Plus, an attribute group_path is added to variables in order to retrieve their original path once they have been flatten.
This can be useful to recreate the original structure.

Abel Aoun added 5 commits January 11, 2024 20:01
When open_ncml is called with group="*", every group will be read
and they will be flatten in the resulting dataset.
If names are conflicting, the dimensions and varaibles names are
appended with __n where n is the number of existing simlar names.
@bzah bzah marked this pull request as draft January 16, 2024 15:22
@bzah
Copy link
Collaborator Author

bzah commented Jan 16, 2024

Converting PR to draft as I discovered flatten is not working properly at the moment.

xncml/parser.py Outdated
@@ -579,7 +675,8 @@ def build_scalar_variable(var_name: str, values_tag: Values, var_type: str) -> x
' <values> is empty. Provide a single values within <values></values>'
' to preserve the type.'
)
return xr.Variable(data=None, dims=())
default_value = nctype(var_type)()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed (again) the behavior of empty scalar parsing here.
For context, here the ncml describe a scalar with a certain type but without providing any value. We would ideally like to create a placeholder for a scalar variable with a numpy type.
But numpy doesn't allow this for scalar, only with an ndarray we can create a typed empty array of a certain shape.

I first though it would be better to loose the type and create a empty scalar with the value None but not having this type can mess with subsequent processing.
Now I think it's better to preserve the dtype and fill the scalar with the default value of this dtype.

I would appreciate comments/suggestions here.

Abel Aoun added 3 commits February 2, 2024 10:23
- improved performances by parsing ncml reprsentation only once
- fixed issues of rewritting content when read multiple times
@bzah bzah marked this pull request as ready for review February 5, 2024 12:37
@bzah bzah requested a review from huard February 5, 2024 12:45
xncml/parser.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
xncml/parser.py Outdated Show resolved Hide resolved
@bzah bzah merged commit dc7ab52 into main Feb 6, 2024
6 checks passed
@bzah bzah deleted the enh/manage_groups branch February 6, 2024 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Create a Datatree instead of a Dataset when there are groups
2 participants