Handle ESTAT ?compressed=… query parameter #183

Open
nicolas-graves opened this issue Jun 13, 2024 · 4 comments
Labels
  • data-source: Issues related to specific web services/data source(s)
  • enh: Enhancements & new features
  • help welcome: Issues that depend on contributions from new developers

Comments

@nicolas-graves

According to https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query, the Eurostat API returns gzipped data when the compressed=true parameter is given.
I've tested this and it seems to work well (gzipped data is already recognized):

 sdmx/rest/v21.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sdmx/rest/v21.py b/sdmx/rest/v21.py
index faa0fef..fe483a1 100644
--- a/sdmx/rest/v21.py
+++ b/sdmx/rest/v21.py
@@ -36,6 +36,7 @@ PARAM: Dict[str, common.Parameter] = {
     "start_period": QueryParameter("start_period"),
     "end_period": QueryParameter("end_period"),
     "explicit_measure": QueryParameter("explicit_measure", {True, False}),
+    "compressed": QueryParameter("compressed"),
 }
 
 
@@ -59,7 +60,8 @@ class URL(common.URL):
         self.handle_path_params(self.rt + "/{flow}/{key}/{provider}")
         self.handle_query_params(
             "start_period end_period updated_after first_n_observations "
-            "last_n_observations dimension_at_observation detail_d include_history"
+            "last_n_observations dimension_at_observation detail_d include_history "
+            "compressed"
         )
 
     def handle_metadata(self):
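
For illustration, the idea that gzipped data can be recognized without any extra hint can be sketched as below. This is a minimal, self-contained sketch and not sdmx1's actual reader code; the helper name `maybe_gunzip` and the sample payload are hypothetical. It relies only on the fact that every gzip stream starts with the two magic bytes 0x1f 0x8b.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(body: bytes) -> bytes:
    """Return `body` decompressed if it looks like gzip, otherwise unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Hypothetical payload standing in for an SDMX-ML response body
xml = b"<message:GenericData/>"
compressed_body = gzip.compress(xml)
plain_body = xml

# Both bodies come out as the same uncompressed bytes
print(maybe_gunzip(compressed_body) == maybe_gunzip(plain_body))
```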

@nicolas-graves
Author

compressed is a boolean, so maybe the right way is to declare it the way explicit_measure is declared in the line above, with {True, False} as its allowed values.
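
To make the difference concrete, here is a rough sketch of what restricting a parameter to {True, False} buys. The `QueryParameter` class below is a stand-in written for this example only; it borrows the name and call shape from the diff above but is not the sdmx1 class, and its `check` method is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Set


@dataclass
class QueryParameter:
    """Stand-in for illustration only; not the sdmx1 class of the same name."""

    name: str
    values: Set[Any] = field(default_factory=set)  # empty set: accept anything

    def check(self, value: Any) -> Any:
        """Return `value`, or raise ValueError if it is not an allowed value."""
        if self.values and value not in self.values:
            raise ValueError(f"{self.name}={value!r} is not an allowed value")
        return value


unrestricted = QueryParameter("compressed")
restricted = QueryParameter("compressed", {True, False})

unrestricted.check("yes")  # accepted, because no allowed values are declared
restricted.check(True)  # accepted, because True is an allowed value
```

With the restricted form, a call such as `restricted.check("yes")` raises `ValueError` instead of silently producing a query string the server may not understand.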

@khaeru
Owner

khaeru commented Jun 17, 2024

Hi @nicolas-graves, thanks for the suggestion here. Sorry for the slight delay in responding; I was travelling.

The key point to note is that ?compressed=true is a Eurostat extension that is not part of the SDMX-REST standard.

  • You can see the full spec for the query format at https://github.com/sdmx-twg/sdmx-rest/blob/master/api/sdmx-rest.yaml (SDMX-REST v2.1.0 corresponding to SDMX 3.0.0) or https://github.com/sdmx-twg/sdmx-rest/blob/v1.5.0/v2_1/ws/rest/src/sdmx-rest.yaml (SDMX-REST v1.5.0 corresponding to SDMX 2.1).
  • As you can see there, "compress" is only specified as part of the Accept-Encoding HTTP header on a request.
  • The Eurostat docs you've linked (like the docs for many providers, honestly) are not very clear about the difference between "these things are all exactly the same as the SDMX standards" and "here is some special limitation/behaviour/extra feature of our particular service". This makes it hard (IMO unnecessarily so) for users or developers like us to tell the two apart.
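
The two ways of asking for compression mentioned above can be put side by side. The following is only a sketch: the base URL and dataflow ID are hypothetical placeholders, and no request is actually sent. It uses the standard library to show that the standard route changes an HTTP header while the Eurostat extension changes the URL itself.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL and dataflow ID, used only for illustration
BASE = "https://example.org/sdmx/2.1"
PATH = "/data/SOME_FLOW"

# Standard route: request compression through an HTTP header; URL unchanged
standard = Request(BASE + PATH, headers={"Accept-Encoding": "gzip"})

# Eurostat extension: request compression through a query parameter
extension = Request(BASE + PATH + "?" + urlencode({"compressed": "true"}))

print(standard.full_url, standard.header_items())
print(extension.full_url, extension.header_items())
```

Because the extension lives in the URL, a client that builds URLs strictly from the standard has no slot for it, which is the reason it needs source-specific handling.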

So, all that said:

  • It is possible and in-scope to accommodate this special ESTAT feature in sdmx1.
  • However, the generic sdmx.rest.v21.URL class is not the place to do it. This class is supposed to conform to the standard exactly, and the standard does not have such a query parameter. If we added it here, then sdmx1 would happily add ?compressed=true to the query URL for any REST source, but most of them would treat it as invalid and might return errors.
  • The proper place for this to be handled is in sdmx.source.estat, i.e. the Source subclass specific to ESTAT.
  • I am not sure if the existing internals make it easy to do this, but if not it would be great to extend them to cover this case. Then the .estat.Source class could declare certain additional query parameters, or subclass .v21.URL, or something lightweight to express what it will accept.

@nicolas-graves
Author

Understood, thanks. Is such a thing already done in another source or should I rather try and implement it myself?

@khaeru
Owner

khaeru commented Jun 18, 2024

I believe the existing source classes (see https://github.com/khaeru/sdmx/tree/main/sdmx/source) are limited to modifying only the values of existing/official keywords, like ?references=. So there is nothing that could simply be copied and modified.

To briefly sketch the changes:

sdmx/sdmx/client.py

Lines 225 to 228 in 473f3af

# Identify the URL class; handle the `kw`
# TODO specify API version for sources that support multiple API versions at the
# same URL
url = self.source.get_url_class()(**kw)

  • Here, Client._request_from_args() calls the Source.get_url_class() method on the current Source instance.
  • I believe all classes currently use the same, default implementation:

def get_url_class(self) -> Type["sdmx.rest.common.URL"]:
    """Return a class for constructing URLs for this Source.

    - If :attr:`.versions` includes *only* SDMX 3.0.0, return :class:`.v30.URL`.
    - If :attr:`.versions` includes SDMX 2.1, return :class:`.v21.URL`.
    - Raise an exception for other :attr:`.versions` that are not supported.
    """
    if {Version["3.0.0"]} == self.versions:
        import sdmx.rest.v30

        return sdmx.rest.v30.URL
    elif Version["2.1"] in self.versions:
        import sdmx.rest.v21

        return sdmx.rest.v21.URL
    else:  # pragma: no cover
        raise NotImplementedError(f"Query against {self.versions}")

  • At the moment, this is limited to returning either .v21.URL or .v30.URL.
  • We could implement estat.Source.get_url_class() and instead return a different class.
  • In particular that would be a subclass of .v21.URL, with one added query parameter per your original comment above.

Thus when querying ESTAT, this source-specific subclass of URL would be returned to _request_from_args(), which would then pass **kw to it, including compressed = True. That URL subclass would happily handle the query parameter and assemble the URL string.
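
The shape of that change can be sketched with stand-in classes. Everything below is an assumption made for illustration: `BaseURL`, `EstatURL`, `Source`, `EstatSource`, the `QUERY_PARAMS` attribute, and the example URL are hypothetical and do not reproduce sdmx1's internals. Only the idea is taken from the discussion above: a source-specific `get_url_class()` returns a URL subclass that accepts one extra query parameter.

```python
from typing import Type


class BaseURL:
    """Stand-in for the generic, standard-conforming URL class."""

    # Query parameters this class accepts (deliberately a short list here)
    QUERY_PARAMS = frozenset({"start_period", "end_period"})

    def __init__(self, base: str, **kw):
        unknown = set(kw) - self.QUERY_PARAMS
        if unknown:
            raise ValueError(f"Unsupported query parameters: {sorted(unknown)}")
        self.base, self.query = base, kw

    def join(self) -> str:
        """Assemble the URL string, writing booleans as 'true'/'false'."""
        parts = []
        for key, value in sorted(self.query.items()):
            text = str(value).lower() if isinstance(value, bool) else str(value)
            parts.append(f"{key}={text}")
        return f"{self.base}?{'&'.join(parts)}" if parts else self.base


class EstatURL(BaseURL):
    """Source-specific subclass that additionally accepts `compressed`."""

    QUERY_PARAMS = BaseURL.QUERY_PARAMS | {"compressed"}


class Source:
    """Stand-in for a generic source: uses the generic URL class."""

    def get_url_class(self) -> Type[BaseURL]:
        return BaseURL


class EstatSource(Source):
    """Stand-in for the ESTAT source: overrides the URL class."""

    def get_url_class(self) -> Type[BaseURL]:
        return EstatURL


# The caller does not need to know which URL class it gets back
url = EstatSource().get_url_class()("https://example.org/data/FLOW", compressed=True)
print(url.join())  # https://example.org/data/FLOW?compressed=true
```

In this sketch, the same keyword passed through the generic `Source` raises `ValueError`, so other providers never receive the non-standard parameter.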

You are welcome to try this if you feel up to it and have the time. But if not, I will come back to it after finishing some other changes I started for #180.

@khaeru changed the title from "Handle eurostat compressed keyword." to "Handle ESTAT ?compressed=… query parameter" on Jun 18, 2024
@khaeru added the labels enh (Enhancements & new features), data-source (Issues related to specific web services/data source(s)), and help welcome (Issues that depend on contributions from new developers) on Jun 18, 2024