Improve query performance with dask #8

Open
luukvdmeer opened this issue Nov 30, 2021 · 0 comments
Labels
feature 🎁 New feature or request help wanted 🆘 Extra attention is needed optimization 🚀 Because fast isn't fast enough

Comments

@luukvdmeer
Member

Description

Although this is primarily a research project, it is worth looking into possibilities for query performance optimization. Dask is probably the easiest option to implement, since it integrates well with xarray and ODC. We should explore how dask can help us improve query performance. Specific aspects to look into are:

  • Lazy loading. Especially important when you apply spatial or temporal filters after referencing a semantic concept, factbase resource or result definition. Currently, data are loaded at the reference, while it would be better to load them only after the filter is applied.
  • Chunking. Should improve performance, e.g. for reductions on large arrays, which can easily be computed in parallel. However, I think chunking decreases performance when the data are small. In that case we need a decision mechanism that determines whether data should be chunked and, if so, into how many chunks. This could also be something the user simply defines.
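
To make the lazy loading point concrete, here is a minimal sketch using xarray backed by a dask array. The data cube is a hypothetical stand-in built from random values, not output from this project or from ODC; the names `cube`, `subset` and `reflectance` are made up for illustration. With ODC, the comparable step would presumably be passing `dask_chunks` to `Datacube.load`, but that is not shown here.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in for a referenced data layer: 365 daily 200x200 rasters.
# da.random.random only builds a task graph; no values are computed yet.
values = da.random.random((365, 200, 200), chunks=(30, 100, 100))
cube = xr.DataArray(
    values,
    dims=("time", "y", "x"),
    coords={"time": np.arange("2021-01-01", "2022-01-01", dtype="datetime64[D]")},
    name="reflectance",
)

# Temporal and spatial filters only extend the task graph; nothing is loaded.
subset = cube.sel(time=slice("2021-06-01", "2021-06-30")).isel(
    y=slice(0, 50), x=slice(0, 50)
)

# The reduction is lazy as well. compute() triggers the actual work,
# and only the chunks that overlap the filtered subset are evaluated.
result = subset.mean("time").compute()
```

The point of the sketch is the ordering: the reference (`cube`) and the filter (`subset`) stay lazy, and data are only materialized when the final result is requested.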
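
One possible shape for the chunking decision mechanism is a size-based heuristic, sketched below. Everything here is an assumption for discussion: the function name `suggest_chunks`, both byte thresholds, and the choice to split along a single axis are hypothetical and would need benchmarking against real queries.

```python
import math


def suggest_chunks(
    shape,
    itemsize,
    split_axis=0,
    small_array_bytes=64 * 2**20,    # assumed cutoff: below this, do not chunk
    target_chunk_bytes=128 * 2**20,  # assumed upper bound for one chunk
):
    """Return None (do not chunk) or a tuple of chunk sizes, one per dimension."""
    nbytes = math.prod(shape) * itemsize
    if nbytes <= small_array_bytes:
        # Small data: scheduling overhead likely outweighs any parallel gain.
        return None
    # Bytes covered by one index step along the axis we split on.
    step_bytes = nbytes // shape[split_axis]
    steps = max(1, target_chunk_bytes // step_bytes)
    chunks = list(shape)
    chunks[split_axis] = min(shape[split_axis], steps)
    return tuple(chunks)


small = suggest_chunks((10, 100, 100), itemsize=8)
large = suggest_chunks((1000, 2000, 2000), itemsize=4)
```

A user-defined override, as suggested above, could simply bypass this function. Which axis to split on depends on the reduction: for a reduction over time it is probably better to keep the time axis whole and split a spatial axis instead, which `split_axis` allows.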

Additional context

Lazy loading with Opendatacube and Dask
Parallel processing in Digital Earth Africa
Dask array reduce
Xarray and dask

@luukvdmeer luukvdmeer added feature 🎁 New feature or request help wanted 🆘 Extra attention is needed optimization 🚀 Because fast isn't fast enough labels Nov 30, 2021

2 participants