-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: Better use Xarray for Loading Grids #1092
base: main
Are you sure you want to change the base?
Conversation
@philipc2 why is #954 linked here? Are you planning to tackle this issue in this pull request? It doesn't look like you have any code in place that handles it yet, I have added an initial implementation for #954 in PR 1101. I didn't realize the issue was linked here, and it was one of my tasks for this iteration. I don't want to step on your toes or anything though if you have done work on that specific thing. |
This PR was started after a discussion with @florianziemen on better utilizing Xarray and Dask for loading our Grids. I linked #954 to this one because I was going to explore how to delay the loading of connectivity from a file until the user needs it, which would effectively work as a minimal reader. Instead of having the user manually specify that they want a minimal grid to be read in, having it automatically delayed while still preserving the metadata (i.e. connectivity names that are available, dimensions, etc.) would better align with what Xarray does. The issue with a minimal reader is that the data would still be directly loaded into memory, including and computations that need to be done on it (i.e. converting to zero index, replacing fill values). For extremely large grids this leads to a very noticeable delay in the time it takes to print out information about a @florianziemen and @erogluorhan can share their experiences also. @aaronzedwick let's spend some time on Tuesday to look at this together, I might need some help anyways. |
The only thing I'd like to share now is that Dask should be brought into the game only iif the user explicitly wants it (i.e. feeding UXarray with chunked data, Dask arrays). That said, I am glad to see the direction this PR is going towards, i.e. avoiding explicit Numpy/Dask array conversions from our Xarray data, and instead sticking to Xarray data structures as they already cater great IO experience. |
Closes #XXX
Overview