Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix group kwarg #338

Merged
merged 21 commits into from
Dec 10, 2024
Merged

Fix group kwarg #338

merged 21 commits into from
Dec 10, 2024

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Dec 9, 2024

This also changes the expected behaviour of open_virtual_dataset(file, group=None) (i.e. the default value of the group kwarg) when multiple groups are encountered. It now just opens the root group instead of raising, which is more consistent with xr.open_dataset.

@TomNicholas TomNicholas added Kerchunk Relating to the kerchunk library / specification itself readers labels Dec 9, 2024
for key in refs.keys():
# has to capture "foo/.zarray", but ignore ".zgroup", ".zattrs", and "subgroup/bar/.zarray"
# TODO this might be a sign that we should introduce a KerchunkGroupRefs type and cut down the references before getting to this point...
if key not in (".zgroup", ".zattrs", ".zmetadata"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@norlandrhagen the only reason I needed the ".zmetadata" key in the check here is to make one kerchunk parquet test pass. Is it really the case that kerchunk parquet references use different keys than kerchunk json references? Or is that just a mistake in the fake kerchunk parquet data that we create and use in

with open(tmp_path / "refs" / ".zmetadata") as f:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we were relying on the parquet directory to have a .zmetadata file so that we could identify the directory containing parquet's as parquet. #278 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay let's merge then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I have another bug to fix first...

found_var_names = []
for key in refs.keys():
# has to capture "foo/.zarray", but ignore ".zgroup", ".zattrs", and "subgroup/bar/.zarray"
# TODO this might be a sign that we should introduce a KerchunkGroupRefs type and cut down the references before getting to this point...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be cleaner, but is a refactor that can be done afterwards.

@TomNicholas TomNicholas merged commit fcdd5e4 into main Dec 10, 2024
11 checks passed
@TomNicholas TomNicholas deleted the fix_group_kwarg branch December 10, 2024 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Kerchunk Relating to the kerchunk library / specification itself readers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

open_virtual_dataset fails when there is a subgroup
2 participants