Spatial partitioning, sorting and shuffling #84
Comments
Can you add a comment to pydata/xarray#9546 describing the workload and what API might be useful, please?
Yes, sure. I don't have a precise idea yet, but I will once I have thought more about it.
Doesn't this just mean computing Hilbert distance (we can use vanilla Geopandas if needed or vendor that code) and using That is also how
Yeah, probably best to reuse dask-geopandas here; we'd need some new dask array API to shuffle by unknown values.
To be honest I didn't look closely into either dask_geopandas' I have looked into that a bit more now but probably not enough yet.
@dcherian -- What do you mean exactly by "unknown values"? What we want to shuffle here are "distance" values (encoded as integer indices) along a 1-dimensional space filling curve of a fixed resolution ( Would it be possible to use something like
@martinfleis -- Hmm from what I see I guess we might as well reuse Computing Hilbert distance is more embarrassingly parallel, if we can do it with vanilla Geopandas we should do it!
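For readers unfamiliar with the idea discussed above, here is a minimal pure-Python sketch of what "computing Hilbert distance" and partitioning by it could look like for 2D points. It is not the Geopandas or dask-geopandas implementation; the function names (`hilbert_distance`, `quantize`, `spatial_partitions`) and the `level` and `npartitions` parameters are hypothetical and chosen only for illustration. It uses the textbook coordinate-to-distance conversion on a `2**level` by `2**level` grid.

```python
def hilbert_distance(x, y, level):
    """Distance of integer grid cell (x, y) along a Hilbert curve.

    The grid has 2**level cells per side (hypothetical helper).
    """
    n = 1 << level
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def quantize(value, lo, hi, level):
    """Map a float coordinate in [lo, hi] to an integer cell index."""
    n = 1 << level
    if hi == lo:
        return 0
    cell = int((value - lo) / (hi - lo) * n)
    return min(max(cell, 0), n - 1)


def spatial_partitions(points, npartitions, level=16):
    """Sort (x, y) points by Hilbert distance and split them into chunks."""
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def key(p):
        return hilbert_distance(
            quantize(p[0], xmin, xmax, level),
            quantize(p[1], ymin, ymax, level),
            level,
        )

    ordered = sorted(points, key=key)
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


points = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.1), (9.9, 9.9)]
parts = spatial_partitions(points, npartitions=2)
```

Computing the key for each point is independent of all other points, which is why this step parallelizes easily; only the sort (or shuffle) needs to see all distance values.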
If the distance values are in-memory then pydata/xarray#9320 should work for you! `ds.groupby_bins(...).shuffle_to_chunks()`
Great! I'll give it a try when I find time.
When dealing with large sets of geometries it would be nice if we could partition (chunk) the geometry coordinate and the GeometryIndex based on spatial locality (thus requiring spatial sorting or shuffling), as explained for geo-dataframes in dask-geopandas' spatial partitioning user guide.
This would require a good amount of work both here and upstream, though:
- `xarray.IndexVariable` to allow dask arrays or other lazy arrays: More flexible index variables pydata/xarray#8124 (although we can work around this by using regular `xarray.Variable` objects for now)
- `GroupBy.shuffle_to_chunks()`: pydata/xarray#9320