Currently, stackstac is built around each STAC Asset being its own chunk in the dask array—the time and band dimensions always have a chunksize of 1.
However, there are cases where you might want to load multiple Assets in one chunk of the array. Most commonly, you'd do this when you have a huge graph, need to cut down on tasks, and can give up some granularity. In particular, you might be happy to combine the time dimension into fewer chunks if you know you're doing a composite right away anyway. See microsoft/PlanetaryComputer#12 (comment) for a motivating example.
So let's support extending the chunksize= argument to stackstac.stack to take up to 4-tuples (time, band, y, x), so you can specify the chunking along all dimensions.
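As a sketch of how a per-dimension chunksize tuple could be expanded into dask-style chunks, here's a minimal, self-contained helper. The function `normalize_chunks` and the tuple form shown are assumptions about the proposed API, not existing stackstac code (dask has its own `dask.array.core.normalize_chunks` that the real implementation would likely reuse):

```python
def normalize_chunks(chunksize, shape):
    """Expand a per-dimension chunksize tuple into dask-style chunks:
    one tuple of block sizes per dimension.

    Hypothetical helper illustrating the proposed 4-tuple chunksize=
    form; not part of stackstac's actual API.
    """
    chunks = []
    for size, dim in zip(chunksize, shape):
        full, rem = divmod(dim, size)
        # `full` whole blocks of `size`, plus one trailing partial block
        chunks.append((size,) * full + ((rem,) if rem else ()))
    return tuple(chunks)

# e.g. 10 times, 3 bands, 2048x2048 pixels, chunked (4, 2, 1024, 1024)
print(normalize_chunks((4, 2, 1024, 1024), (10, 3, 2048, 2048)))
# -> ((4, 4, 2), (2, 1), (1024, 1024), (1024, 1024))
```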
Note that this isn't #66 (though that could be a follow-on): we're not talking about flattening/pre-mosaicing the data. We'd still load every asset as usual, it's just that the chunks of the dask array might be (4, 2, Y, X) instead of always (1, 1, Y, X).
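To make the task-count savings concrete, here's the arithmetic with some illustrative dimensions (100 times, 4 bands, a 4x4 grid of 1024-pixel spatial chunks — made-up numbers, not from any real dataset):

```python
import math

def n_chunks(shape, chunksize):
    # Number of dask chunks (and hence read tasks) for a given chunking.
    return math.prod(math.ceil(d / c) for d, c in zip(shape, chunksize))

shape = (100, 4, 4096, 4096)  # times, bands, y, x (illustrative)
per_asset = n_chunks(shape, (1, 1, 1024, 1024))  # current behavior
grouped = n_chunks(shape, (4, 2, 1024, 1024))    # proposed
print(per_asset, grouped)
# -> 6400 800  (an 8x reduction in tasks)
```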
When a chunk contains multiple assets, should they be loaded serially, or in parallel? We could create our own internal threadpool, since most of the IO is not CPU-bound. However, because we have to duplicate the GDAL Dataset and file-descriptor per-thread, that might be expensive on memory. I suppose the runtime of T threads reading N assets is the same as T threads reading N / C assets, where each read takes C times longer. So probably in serial. Sure would be nice to just have an aiocogeo Reader for this 😁
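A minimal sketch of the serial option: fill one (time, band, y, x) chunk by reading its assets one after another on the calling thread, so no per-thread GDAL Dataset duplication is needed. The `readers` mapping and stub read callables are hypothetical stand-ins for stackstac's real per-asset COG reads:

```python
import numpy as np

def read_chunk_serial(readers, out_shape, dtype="float64"):
    """Fill one (time, band, y, x) chunk by reading its assets serially.

    `readers` is a hypothetical {(t, b): callable} mapping; each callable
    returns a (y, x) array for that asset. Reading in serial avoids
    duplicating GDAL Datasets/file descriptors across threads.
    """
    out = np.full(out_shape, np.nan, dtype=dtype)
    for (t, b), read in readers.items():
        out[t, b] = read()  # one asset at a time, no internal threadpool
    return out

# Stub readers standing in for per-asset COG reads
readers = {(t, b): (lambda v=t * 10 + b: np.full((2, 2), v))
           for t in range(4) for b in range(2)}
chunk = read_chunk_serial(readers, (4, 2, 2, 2))
print(chunk.shape, chunk[3, 1, 0, 0])
# -> (4, 2, 2, 2) 31.0
```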
This should be done/considered as a part of #105.