Efficiently parsing erddap server metadata #3
Comments
The get_info method should download only the metadata; it is something like:
info_url = e.get_info_url(dataset_id, response="csv")
info = pd.read_csv(info_url)
info.head()
However, that is quite "low level", and ideally we should allow for a "dataset-like" class with the metadata and load the data lazily afterwards. We are working on a refactor to go in this direction. With that said, I believe that libraries that build on top of erddapy should use the low-level interface. The high-level one is mostly for end users.
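As a rough illustration of what a "dataset-like" view of the metadata could look like, here is a minimal sketch that turns the info table into a nested dictionary of attributes per variable. It assumes the usual ERDDAP info CSV columns (Row Type, Variable Name, Attribute Name, Data Type, Value); the `info_to_dict` helper and the `SAMPLE_INFO` text are hypothetical and only stand in for a real `pd.read_csv(info_url)` call, they are not part of erddapy.

```python
import io

import pandas as pd

# Made-up sample in the layout of an ERDDAP info CSV response (assumption).
SAMPLE_INFO = """Row Type,Variable Name,Attribute Name,Data Type,Value
attribute,NC_GLOBAL,title,String,Example tide gauge dataset
variable,time,,double,
attribute,time,units,String,seconds since 1970-01-01T00:00:00Z
variable,latitude,,float,
attribute,latitude,units,String,degrees_north
"""


def info_to_dict(info):
    """Hypothetical helper: map each variable name to its attributes."""
    # Keep only attribute rows; "variable" rows carry no attribute value.
    attrs = info[info["Row Type"] == "attribute"]
    metadata = {}
    for name, group in attrs.groupby("Variable Name"):
        metadata[name] = dict(zip(group["Attribute Name"], group["Value"]))
    return metadata


# In real use this would be: info = pd.read_csv(info_url)
info = pd.read_csv(io.StringIO(SAMPLE_INFO))
metadata = info_to_dict(info)
```

Only the small info table is read here, so no data variables are downloaded.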
Hi @ocefpaf. I have finally come back to this issue. Thanks for the tip above, but my problem remains. I am posting an example below, using the EMODNET server:
from erddapy import ERDDAP
import pandas as pd
e = ERDDAP(
server="https://erddap.emodnet-physics.eu/erddap",
protocol="tabledap",
)
e.response = "csv"
e.dataset_id = "EMODPACE_NMDIS_PSMSL_L2A_SLEV_TG_TS"
info_url = e.get_info_url(response='csv')
info = pd.read_csv(info_url)
info['Variable Name'].unique()
info['Attribute Name'].unique()
So far so good. However, what I need is the following:
e.variables = [
"StationName",
"EP_PLATFORM_CODE",
"EP_PLATFORM_TYPE",
"EP_PLATFORM_LINK",
"StationCountry",
"longitude",
"latitude",
]
If I use df = e.to_pandas(low_memory=False), I get all times. How can I get the above info without the time dimension?
@brey is this still an issue? I just tested it, and I don't see
> df.head()
   StationName  EP_PLATFORM_CODE  EP_PLATFORM_TYPE                                   EP_PLATFORM_LINK  StationCountry  longitude (degrees_east)  latitude (degrees_north)
0       Dalian            Dalian                TG  https://www.emodnet-physics.eu/map/spi.aspx?id...              CN                    121.68                     38.87
1       Kanmen            Kanmen                TG  https://www.emodnet-physics.eu/map/spi.aspx?id...              CN                    121.28                     28.08
2       Nansha            Nansha                TG  https://www.emodnet-physics.eu/map/spi.aspx?id...              CN                    112.88                      9.55
3        Xisha             Xisha                TG  https://www.emodnet-physics.eu/map/spi.aspx?id...              CN                    112.30                     16.80
4        Zhapo             Zhapo                TG  https://www.emodnet-physics.eu/map/spi.aspx?id...              CN                    111.81                     21.58
Try df.loc[df.EP_PLATFORM_CODE == 'Xisha']. You get one entry per timestamp for each station.
All the rows are identical, are they not? Then maybe,
might be enough? Or maybe even: df.groupby(df.EP_PLATFORM_CODE).first()
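For reference, here is a small sketch of client-side de-duplication along these lines. It assumes the station columns repeat identically for every timestamp; the frame below is made-up sample data, and `drop_duplicates` is shown as one common pandas option next to the `groupby(...).first()` call mentioned above. Note that both still require the full download first.

```python
import pandas as pd

# Made-up frame mimicking one row per station per timestamp.
df = pd.DataFrame(
    {
        "EP_PLATFORM_CODE": ["Dalian", "Dalian", "Xisha", "Xisha", "Xisha"],
        "StationCountry": ["CN", "CN", "CN", "CN", "CN"],
        "longitude": [121.68, 121.68, 112.30, 112.30, 112.30],
        "latitude": [38.87, 38.87, 16.80, 16.80, 16.80],
    }
)

# Option 1: drop rows that are exact repeats across all columns.
stations = df.drop_duplicates().reset_index(drop=True)

# Option 2: keep the first row seen for each platform code.
stations_by_code = df.groupby("EP_PLATFORM_CODE").first()
```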
I know, but that means that if another server has a longer time range, the amount of data you download will be quite large.
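One way to avoid the large download might be to let the server do the de-duplication. ERDDAP's tabledap protocol accepts a `&distinct()` filter in the query, which asks the server to return only unique rows for the requested variables. The sketch below builds such a URL by hand; the `distinct_url` helper is hypothetical (not an erddapy function) and nothing is fetched here. Recent erddapy versions also appear to accept a `distinct` argument in `get_download_url`, but please check that against your installed version.

```python
def distinct_url(server, dataset_id, variables, response="csv"):
    """Hypothetical helper: tabledap URL requesting only unique rows."""
    # Requested variables are comma separated; "&distinct()" is the
    # tabledap server-side filter that removes duplicate rows.
    query = ",".join(variables) + "&distinct()"
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}?{query}"


url = distinct_url(
    "https://erddap.emodnet-physics.eu/erddap",
    "EMODPACE_NMDIS_PSMSL_L2A_SLEV_TG_TS",
    ["StationName", "EP_PLATFORM_CODE", "longitude", "latitude"],
)
# The result could then be read with pd.read_csv(url, skiprows=[1]),
# where skiprows drops the units row that ERDDAP adds to .csv responses.
```

Since time is not among the requested variables, the server should return one row per station rather than one per timestamp.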
When using erddapy to retrieve the metadata, the full set of data is parsed, including data variables. This results in a long wait depending on the volume of data. There has to be a way to simplify/expedite this.