In some cases it would be helpful to be able to load a particular intake catalog of EUREC4A data. get_intake_catalog currently allows selecting different catalogs by their CID when they are served via IPFS. I'd like to propose a similar option for catalogs referenced via the file system or a URL. Currently, only the catalog in the master branch of the eurec4a-intake repository can be loaded.
Possibly this could be written as a new function called e.g. open_intake_catalog, or integrated as an argument to get_intake_catalog. This would add the possibility to switch to other EUREC4A intake catalogs (different fork/branch/file system) which might be under development or contain e.g. references to a local HPC file system.
get_intake_catalog could be rewritten to

    def get_intake_catalog(use_ipfs=False):
        """
        Open the intake data catalog.

        The catalog provides access to public EUREC4A datasets without the need to
        manually specify URLs to the individual datasets.
        """
        if use_ipfs:
            # use_ipfs may be True (use the latest CID) or a CID string
            if isinstance(use_ipfs, str):
                cid = use_ipfs
            else:
                cid = get_cids()['intake']['latest']
            return open_intake_catalog(f"ipfs://{cid}/catalog.yml")
        else:
            return open_intake_catalog("https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/catalog.yml")

to reduce redundancy.
Of course it would also be an option to just load different catalogs directly via intake without using this package in those cases.
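To make the fork/branch idea a bit more concrete, here is a minimal sketch of what such a helper might look like. The helper name github_catalog_url, its owner/branch/repo parameters, and the exact shape of open_intake_catalog are assumptions for illustration only and are not part of the eurec4a package; the only external call used is intake.open_catalog.

```python
def github_catalog_url(owner="eurec4a", branch="master", repo="eurec4a-intake"):
    """Build the raw GitHub URL of catalog.yml for a given fork and branch.

    Hypothetical helper; the defaults reproduce the URL currently
    hard-coded in get_intake_catalog.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/catalog.yml"


def open_intake_catalog(location=None):
    """Open an intake catalog from a URL or a local path (hypothetical sketch).

    If no location is given, fall back to the default catalog on GitHub.
    """
    import intake  # imported lazily so the URL helper works without intake

    if location is None:
        location = github_catalog_url()
    return intake.open_catalog(location)
```

With this, a development catalog could be opened via open_intake_catalog(github_catalog_url(owner="some-fork", branch="some-branch")), and a catalog on a local file system by passing its path directly.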
I have to admit that I have some reservations about this proposal. Maybe others should chime in and add more opinions. Here's a bit about how we arrived at the current state:
The initial idea of get_intake_catalog was to have no arguments at all (it should just return the "best available catalog"). It also started out as a kind of workaround to keep the hard-coded GitHub URL in a central place, with the option to update it if needed.
This got a little washed out by adding the option use_ipfs which initially was only False or True, but that's maybe still reasonable to do. Even in this case, the function does a bit of (non-trivial) work, namely to fetch the latest CID from github.
The latest update (the possibility to give an actual CID) is arguably too much: we could and maybe should just advise people to do intake.open_catalog(f"ipfs://{cid}/catalog.yml") themselves 🤷♂️. In particular, as it's possible to put in arbitrary CIDs, it's now possible to open non-eurec4a intake catalogs with eurec4a.get_intake_catalog, which probably should then be called intake.open_catalog instead... However, if you think of a CID as a version instead of as a path, it might still be somewhat reasonable 🤔 .
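As a rough sketch of that do-it-yourself route: the helper names ipfs_catalog_url and open_catalog_by_cid below are hypothetical, and opening ipfs:// URLs assumes that an IPFS-capable fsspec backend is installed alongside intake.

```python
def ipfs_catalog_url(cid):
    """Return the IPFS URL of catalog.yml for a given CID (hypothetical helper)."""
    return f"ipfs://{cid}/catalog.yml"


def open_catalog_by_cid(cid):
    """Open the catalog pinned at a specific CID directly through intake."""
    import intake  # imported lazily; only needed when a catalog is actually opened

    return intake.open_catalog(ipfs_catalog_url(cid))
```

Seen this way, the CID acts like a version pin: the same CID always refers to the same catalog content.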
Anyway, my main concern is that I don't really see an advantage of using eurec4a.open_intake_catalog over the existing intake.open_catalog, whereas I do see an advantage of using eurec4a.get_intake_catalog(True|False). CIDs are somewhere in between.
I'm also a bit concerned about referencing data which is local to an HPC system in something which is labeled an "eurec4a" intake catalog, as this obviously goes against the purpose of having a globally accessible catalog.
To move this forward: what could potential future implementations of eurec4a.open_intake_catalog look like that would be a reason to establish this method now (instead of recommending the use of intake.open_catalog)?