Support specifying single HDF Group in open_virtual_dataset #165

Merged
merged 14 commits into from
Aug 27, 2024

Conversation

scottyhq
Contributor

@scottyhq scottyhq commented Jun 29, 2024

@TomNicholas @forrestfwilliams I took a pass at this for basic functionality of loading a single group. Seems to be working for the couple datasets I'm trying out from https://nisar.jpl.nasa.gov/data/sample-data/
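
As a rough illustration of what "loading a single group" can mean at the reference level, here is a minimal sketch. It assumes kerchunk-style references, where keys are Zarr store paths such as `group/variable/.zarray`. It is not the code from this PR: the `select_group` helper and the sample `refs` dict are hypothetical names invented for the example.

```python
def select_group(refs: dict, group: str) -> dict:
    """Keep only the references stored under `group`, with the prefix stripped."""
    # Normalise "/a", "a" and "a/" to the same key prefix "a/".
    prefix = group.strip("/") + "/"
    selected = {
        key[len(prefix):]: value
        for key, value in refs.items()
        if key.startswith(prefix)
    }
    if not selected:
        raise ValueError(f"group {group!r} not found in references")
    return selected


# Hypothetical references for a file with two groups, "a" and "b".
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "a/.zgroup": '{"zarr_format": 2}',
    "a/x/.zarray": "{}",
    "a/x/0": ["file.h5", 100, 40],
    "b/.zgroup": '{"zarr_format": 2}',
    "b/y/.zarray": "{}",
}

group_a = select_group(refs, "/a")
```

After filtering, the selected group looks like the root of its own store, so downstream parsing does not need to know that other groups exist. Note that this sketch also keeps any subgroups nested under the selected group.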

@TomNicholas added the "enhancement" (New feature or request) and "references generation" (Reading byte ranges from archival files) labels on Jun 29, 2024
Member

@TomNicholas TomNicholas left a comment

Thanks for taking this on @scottyhq!

Seems to work fine, my comments are mostly about refactoring.

One other thing is that the group kwarg could also be supported when reading references from Zarr, but that could also just be flagged as an issue and left for a later PR, as the logic will be independent of the kerchunk-parsing logic here.

virtualizarr/xarray.py (outdated, resolved)
virtualizarr/xarray.py (outdated, resolved)
virtualizarr/tests/test_xarray.py (outdated, resolved)
virtualizarr/tests/test_xarray.py (outdated, resolved)
virtualizarr/xarray.py (outdated, resolved)
Contributor Author

Not sure why the diff is so big on this file; I guess it's the ruff formatting...

@scottyhq
Contributor Author

> One other thing is that the group kwarg could also be supported when reading references from Zarr, but that could also just be flagged as an issue and left for a later PR, as the logic will be independent of the kerchunk-parsing logic here.

Yeah, sounds like a plan. I think this is now refactored and minimally functional. I'm sure working with other example datasets with groups (https://github.com/pydata/xarray-data/blob/master/cmip6.nc) will uncover improvements or the need for additional tests.

Comment on lines 397 to 399
tmpfile = fsspec.open_local(
    f"filecache::{url}", filecache=dict(cache_storage="/tmp", same_names=True)
)
Member

I'm not familiar with this fsspec function. Is this not something that can just be done with pathlib?

    indexes={},
    drop_variables=["listOfCovarianceTerms", "listOfPolarizations"],
)
tmpref = "/tmp/cmip6.json"
Member

pytest has a fixture tmpdir - I think we just want to write to that?
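
A minimal sketch of that suggestion, assuming the test only needs somewhere to write a JSON references file. pytest provides both `tmpdir` and the `pathlib`-based `tmp_path`; this sketch uses `tmp_path`. The test name and the placeholder `refs` dict are hypothetical and are not taken from this PR.

```python
import json
from pathlib import Path


def test_write_references(tmp_path: Path) -> Path:
    # pytest injects a fresh per-test directory as `tmp_path`,
    # so nothing is written to a hard-coded location such as /tmp.
    refs = {"version": 1, "refs": {".zgroup": '{"zarr_format": 2}'}}  # placeholder
    ref_file = tmp_path / "cmip6.json"
    ref_file.write_text(json.dumps(refs))

    # Round-trip check: what was written can be read back unchanged.
    assert json.loads(ref_file.read_text()) == refs
    return ref_file
```

Because each test gets its own directory, tests cannot collide on a shared file name, and pytest manages cleanup of old temporary directories itself.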

Member

@TomNicholas TomNicholas left a comment

This is great @scottyhq - just a couple of very minor comments and then we can merge.

virtualizarr/zarr.py (outdated, resolved)
@TomNicholas TomNicholas mentioned this pull request Aug 9, 2024
6 tasks
@TomNicholas TomNicholas merged commit 4f9647c into zarr-developers:main Aug 27, 2024
8 checks passed
@TomNicholas
Member

Thank you @scottyhq !

Development

Successfully merging this pull request may close these issues.

KeyError: '.zarray' with HDF5 data
3 participants