Out-source storage backends to fsspec drivers #541
Comments
"Because it is not always necessary to have e.g. plotting capabilities and install all the additional earthkit dependencies, it would be great to extract these backends into separate packages."
Regarding this point: that is the reason why earthkit is broken into components. Loading and converting data is the responsibility of earthkit-data, so you should not need to install the plotting components if you don't want plotting. We are trying to minimise the dependencies. Many of the earthkit-data dependencies are related to its data sources and its data conversions.
"The backends developed within this package seem useful also outside of earthkit."
Regarding this point: please note that most of the backends for loading from data sources already exist as lower-level packages, which bring their own clients and protocols (FDB, MARS, CDS, etc.).
Irrespective of the 2 comments above, a backend based on |
Could starting with an implementation of the ECFS backend be a good first step?
I have already developed an ECFS backend at https://github.com/observingClouds/ecmwfspec. It would be great to see more like it for FDB and MARS 😉
I guess what I am saying is that it would be great to implement an fsspec entry point in those lower-level packages (or in earthkit-data, if the lower-level packages lack some fundamental functionality). In the case of FDB, for example, the entry point could be implemented based on https://github.com/ecmwf/earthkit-data/blob/develop/src/earthkit/data/sources/fdb.py or https://github.com/ecmwf/pyfdb such that the following works:
import xarray as xr
ds = xr.open_dataset("fdb://domain=g&stream=oper&levtype=pl&levelist=300&date=20191110&time=0000&step=0&param=138&class=rd&type=an&expver=xxxx")
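As a rough illustration of the first thing such an entry point would have to do, the sketch below turns the `fdb://` URI from the example into a key/value request dictionary. It uses only the Python standard library. The helper name `parse_fdb_uri` and the URI layout (everything after `fdb://` is an `&`-separated list of `key=value` pairs) are assumptions taken from the example above, not an existing API of pyfdb, earthkit-data, or fsspec.

```python
from urllib.parse import parse_qsl


def parse_fdb_uri(uri):
    """Hypothetical helper: convert 'fdb://key=value&key=value' into a dict."""
    scheme, sep, request = uri.partition("://")
    if sep == "" or scheme != "fdb":
        raise ValueError(f"expected an fdb:// URI, got {uri!r}")
    # strict_parsing=True rejects fragments that are not of the form key=value.
    return dict(parse_qsl(request, strict_parsing=True))


# Example usage with a shortened version of the request shown above.
request = parse_fdb_uri("fdb://class=rd&stream=oper&levtype=pl&param=138&step=0")
print(request)
```

A real driver would then hand a dictionary like this to the underlying client; how that call looks depends on the client library and is left out here.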
This is already implemented with earthkit-data, reading from the FDB and then using the If you want to avoid using earthkit-data and go via the |
Is your feature request related to a problem? Please describe.
The backends developed within this package seem useful also outside of earthkit. Who doesn't like to have easier access to data and focus more on the fun parts?! 😄 Because it is not always necessary to have e.g. plotting capabilities and install all the additional earthkit dependencies, it would be great to extract these backends into separate packages. This would also further be in line with the ECMWF Software Strategy and Roadmap for 2023–2027, which earthkit's development follows and help the community to
Describe the solution you'd like
I propose to develop these backends (ECFS, FDB, MARS,...) as fsspec drivers. The benefit would be:
Further, earthkit could have a general entry point for fsspec (as e.g. xarray.open_dataset() already is) and get support for all sorts of other data sources for free, e.g. S3, zip, tar, webdav, ftp, Databricks, DVC, git, memory, cache and many, many, many more. Although the package name includes "filesystem", it also fully supports object stores and, with the memory and cache drivers, in-memory streaming.
I have already implemented a driver for ECFS that helps to abstract the system commands and allows ECFS resources to be accessed directly via a URI.
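The idea of a single entry point that picks a driver from the URI scheme can be sketched in a few lines of plain Python. This is a toy model of the dispatch mechanism only and does not use fsspec itself; `register_driver`, `open_uri`, and the in-memory store are hypothetical names made up for illustration.

```python
import io

# Hypothetical registry: URI scheme -> function that opens a path for reading.
_DRIVERS = {}


def register_driver(scheme, opener):
    """Associate a URI scheme (e.g. 'memory') with an opener function."""
    _DRIVERS[scheme] = opener


def open_uri(uri):
    """Open a URI by dispatching on its scheme; plain paths are opened locally."""
    scheme, sep, path = uri.partition("://")
    if not sep:
        return open(uri, "rb")
    try:
        opener = _DRIVERS[scheme]
    except KeyError:
        raise ValueError(f"no driver registered for scheme {scheme!r}") from None
    return opener(path)


# Toy in-memory backend standing in for a real driver (ECFS, FDB, MARS, ...).
_MEMORY = {"example.bin": b"GRIB"}
register_driver("memory", lambda path: io.BytesIO(_MEMORY[path]))

with open_uri("memory://example.bin") as f:
    data = f.read()
```

With this pattern the calling code never changes when a new backend is added; only a new driver has to be registered, which is the benefit the proposal is after.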
Some of the current issues strengthen this idea, as they would benefit from an fsspec-centered implementation:
Having also the ECMWF related protocols implemented as fsspec drivers would be amazing.
@tlmquintino as we were recently touching this topic
Describe alternatives you've considered
No response
Additional context
No response
Organisation
DMI