Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: Better use Xarray for Loading Grids #1092

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

philipc2
Copy link
Member

@philipc2 philipc2 commented Dec 3, 2024

Closes #XXX

Overview

@aaronzedwick
Copy link
Member

@philipc2 why is #954 linked here? Are you planning to tackle this issue in this pull request? It doesn't look like you have any code in place that handles it yet, I have added an initial implementation for #954 in PR 1101. I didn't realize the issue was linked here, and it was one of my tasks for this iteration. I don't want to step on your toes or anything though if you have done work on that specific thing.

@philipc2
Copy link
Member Author

philipc2 commented Dec 6, 2024

@philipc2 why is #954 linked here? Are you planning to tackle this issue in this pull request? It doesn't look like you have any code in place that handles it yet, I have added an initial implementation for #954 in PR 1101. I didn't realize the issue was linked here, and it was one of my tasks for this iteration. I don't want to step on your toes or anything though if you have done work on that specific thing.

Hi @aaronzedwick

This PR was started after a discussion with @florianziemen on better utilizing Xarray and Dask for loading our Grids. I linked #954 to this one because I was going to explore how to delay the loading of connectivity from a file until the user needs it, which would effectively work as a minimal reader.

Instead of having the user manually specify that they want a minimal grid to be read in, having it automatically delayed while still preserving the metadata (i.e. connectivity names that are available, dimensions, etc.) would better align with what Xarray does.

The issue with a minimal reader is that the data would still be directly loaded into memory, including and computations that need to be done on it (i.e. converting to zero index, replacing fill values). For extremely large grids this leads to a very noticeable delay in the time it takes to print out information about a Grid or UxDataset

@florianziemen and @erogluorhan can share their experiences also.

@aaronzedwick let's spend some time on Tuesday to look at this together, I might need some help anyways.

@philipc2 philipc2 added the scalability Related to scalability & performance efforts label Dec 6, 2024
@philipc2 philipc2 added this to the Scalability & Performance milestone Dec 6, 2024
@philipc2 philipc2 changed the title DRAFT: Use Dask for Loading Grids DRAFT: Better use Xarray for Loading Grids Dec 7, 2024
@erogluorhan
Copy link
Member

The only thing I'd like to share now is that Dask should be brought into the game only iif the user explicitly wants it (i.e. feeding UXarray with chunked data, Dask arrays). That said, I am glad to see the direction this PR is going towards, i.e. avoiding explicit Numpy/Dask array conversions from our Xarray data, and instead sticking to Xarray data structures as they already cater great IO experience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scalability Related to scalability & performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better utilize Xarray when loading grids Option to load minimal required coordinates and connectivity
3 participants