
Efficiently parsing erddap server metadata #3

Open

brey opened this issue May 5, 2022 · 8 comments

@brey
Contributor

brey commented May 5, 2022

When using erddapy to retrieve the metadata, the full set of data is parsed, including data variables. This results in a long wait depending on the volume of data.

There has to be a way to simplify/expedite this.

@ocefpaf

ocefpaf commented Jul 7, 2022

When using erddapy to retrieve the metadata, the full set of data is parsed, including data variables.

The get_info_url method should download only the metadata; it is something like this:

import pandas as pd

# `e` is an ERDDAP instance; only the metadata table is downloaded here,
# not the data rows.
info_url = e.get_info_url(dataset_id, response="csv")

info = pd.read_csv(info_url)
info.head()
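For example, once you have that table you can slice it without touching the data at all. The column names below assume ERDDAP's standard info response ("Row Type", "Variable Name", "Attribute Name", "Value"):

# Global (dataset-level) attributes are listed under the special name "NC_GLOBAL".
global_attrs = info[info["Variable Name"] == "NC_GLOBAL"]

# Variables exposed by the dataset, taken straight from the metadata.
variables = info.loc[info["Row Type"] == "variable", "Variable Name"].tolist()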

However, that is quite "low level"; ideally we should have a "dataset-like" class that holds the metadata and loads the data lazily afterwards. We are working on a refactor to go in this direction.

With that said, I believe that libraries building on top of erddapy should use the low-level interface; the high-level one is mostly for end users.
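A rough sketch of the kind of "dataset-like" wrapper mentioned above, with hypothetical names (this is not the current erddapy API):

import pandas as pd
from erddapy import ERDDAP


class LazyDataset:
    """Hold a dataset's metadata up front; download data only on request."""

    def __init__(self, server, dataset_id, protocol="tabledap"):
        self._e = ERDDAP(server=server, protocol=protocol)
        self._e.dataset_id = dataset_id
        # Only the metadata table is fetched at construction time.
        self.metadata = pd.read_csv(self._e.get_info_url(response="csv"))

    def to_pandas(self, **kwargs):
        # The actual data is downloaded lazily, only when explicitly asked for.
        return self._e.to_pandas(**kwargs)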

@brey
Contributor Author

brey commented Sep 20, 2023

Hi @ocefpaf. I have finally come back to this issue. Thanks for the tip above, but my problem remains. Using get_info_url I get all the variables/attributes, which is fine. But then I would like to retrieve a subset of them, and I can't see how to avoid the time dimension.

Here is an example, using the EMODnet server:

from erddapy import ERDDAP
import pandas as pd

e = ERDDAP(
  server="https://erddap.emodnet-physics.eu/erddap",
  protocol="tabledap",
)
e.response = "csv"
e.dataset_id = "EMODPACE_NMDIS_PSMSL_L2A_SLEV_TG_TS"


info_url = e.get_info_url(response='csv')
info = pd.read_csv(info_url)

info['Variable Name'].unique()

info['Attribute Name'].unique()

So far so good. However, what I need is the following:

e.variables = [
    "StationName",
    "EP_PLATFORM_CODE",
    "EP_PLATFORM_TYPE",
    "EP_PLATFORM_LINK",
    "StationCountry",
    "longitude",
    "latitude",
]

If I use

df = e.to_pandas(low_memory=False)

I get all times. How can I get the above info without the time dimension?

@pmav99
Member

pmav99 commented Oct 5, 2023

@brey is this still an issue? I just tested it, and I don't see time in the returned results:

> df.head()
  StationName EP_PLATFORM_CODE EP_PLATFORM_TYPE                                   EP_PLATFORM_LINK StationCountry  longitude (degrees_east)  latitude (degrees_north)
0      Dalian           Dalian               TG  https://www.emodnet-physics.eu/map/spi.aspx?id...             CN                    121.68                     38.87
1      Kanmen           Kanmen               TG  https://www.emodnet-physics.eu/map/spi.aspx?id...             CN                    121.28                     28.08
2      Nansha           Nansha               TG  https://www.emodnet-physics.eu/map/spi.aspx?id...             CN                    112.88                      9.55
3       Xisha            Xisha               TG  https://www.emodnet-physics.eu/map/spi.aspx?id...             CN                    112.30                     16.80
4       Zhapo            Zhapo               TG  https://www.emodnet-physics.eu/map/spi.aspx?id...             CN                    111.81                     21.58

@brey
Contributor Author

brey commented Oct 5, 2023

Try

df.loc[df.EP_PLATFORM_CODE=='Xisha']

You get one entry per timestamp for each station.

@pmav99
Member

pmav99 commented Oct 5, 2023

All the rows are identical, are they not? Then maybe,

df.loc[df.EP_PLATFORM_CODE=='Xisha'].iloc[0]

might be enough?

Or maybe even:

df.groupby(df.EP_PLATFORM_CODE).first()

@brey
Contributor Author

brey commented Oct 5, 2023

I know, but that means that if another server has a longer time range, the amount of data you'll download will be quite large.

@ocefpaf

ocefpaf commented Jan 9, 2024

Sorry, this one flew under the radar, but I just found it. Maybe

df = e.to_pandas(distinct=True)

can help you there. That would return only the unique values, filtered on the server side first. The result should be similar to calling the pandas unique method after downloading.
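As far as I understand, distinct=True maps to ERDDAP's server-side &distinct() filter, so combined with the variables set earlier the whole request would look roughly like this:

e.variables = [
    "StationName",
    "EP_PLATFORM_CODE",
    "EP_PLATFORM_TYPE",
    "EP_PLATFORM_LINK",
    "StationCountry",
    "longitude",
    "latitude",
]

# Only the unique rows for these variables are computed on the server and
# transferred; the per-timestamp duplicates never leave ERDDAP.
df = e.to_pandas(distinct=True)
df.head()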


@pmav99
Member

pmav99 commented Jan 9, 2024
