The performance of the maap-py library for CMR granule searches of moderate size appears to be rather poor.
Here we compare the performance of maap-py vs. direct HTTP requests for CMR granule searches of varying sizes. We compare absolute timings across a range of request sizes, as well as how the time taken grows with the size of the request.
Using maap-py with the MAAP CMR host is nearly twice as fast as using it with the NASA CMR host, which is perhaps surprising given that the NASA CMR system is much larger.
The time taken by maap-py grows roughly linearly (perhaps slightly worse) with the size of the request.
maap-py is far slower than direct HTTP requests.
With direct HTTP requests, the difference in performance between the MAAP CMR host and the NASA CMR host appears negligible (unlike the difference between the two hosts when using maap-py).
NOTE: One obvious difference between the maap-py requests and the direct HTTP requests is that maap-py uses the ECHO-10 XML format and performs XML parsing, whereas the direct HTTP requests use JSON (UMM). JSON parsing is likely much faster than XML parsing, which might account for a significant portion of the performance difference.
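One way to get a rough feel for the parsing cost mentioned in the note is a micro-benchmark like the sketch below. It uses small synthetic records rather than real ECHO-10 or UMM payloads (which are much richer), and the names `parse_json` and `parse_xml` are purely illustrative, so any numbers it prints only hint at the relative cost and say nothing definitive about maap-py itself.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Synthetic stand-ins for granule records (an assumption for illustration;
# real ECHO-10 and UMM payloads carry far more fields per granule).
N = 1000
records = [{"GranuleUR": f"granule-{i}", "SizeMB": i * 0.5} for i in range(N)]

json_text = json.dumps({"items": records})
xml_text = "<results>" + "".join(
    f"<Granule><GranuleUR>{r['GranuleUR']}</GranuleUR>"
    f"<SizeMB>{r['SizeMB']}</SizeMB></Granule>"
    for r in records
) + "</results>"


def parse_json():
    """Parse the JSON payload and return the number of records found."""
    return len(json.loads(json_text)["items"])


def parse_xml():
    """Parse the XML payload and return the number of records found."""
    return len(ET.fromstring(xml_text).findall("Granule"))


# Time 50 parses of each payload; absolute values depend on the machine.
json_seconds = timeit.timeit(parse_json, number=50)
xml_seconds = timeit.timeit(parse_xml, number=50)
print(f"JSON: {json_seconds:.4f} s, XML: {xml_seconds:.4f} s for 50 parses")
```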
I discovered why using maap-py to find granules is (by default) slower than making direct HTTP requests, at least within the ADE: the page size defaults to only 20, as configured in the maap.cfg file in the ADE. No matter how large limit is in the examples above, maap-py fetches results 20 at a time.
However, in order to override this page size, the setting in maap.cfg must be modified, which would affect all users within the ADE. Alternatively, the default maap.cfg file could be copied to the current directory and updated there, but this is problematic due to #29.
Further, since the page size is configured in maap.cfg, the page size cannot be specified on a per-request basis. All requests, regardless of the limit specified for a request, use the same page size.
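To illustrate why the page size matters so much, here is a minimal sketch of the arithmetic, assuming each page costs one HTTP round trip. The limits are hypothetical, `request_count` is an illustrative helper (not part of maap-py), and 2000 is used for comparison because it is the largest page_size that CMR accepts.

```python
import math


def request_count(limit, page_size):
    """Number of page requests needed to fetch `limit` results."""
    return math.ceil(limit / page_size)


# Hypothetical limits, compared at the ADE default page size (20)
# and at the CMR maximum page size (2000).
for limit in (20, 200, 2000):
    small_pages = request_count(limit, 20)
    large_pages = request_count(limit, 2000)
    print(f"limit={limit}: {small_pages} request(s) at page size 20, "
          f"{large_pages} request(s) at page size 2000")
```

For a limit of 2000, that is 100 round trips at a page size of 20 versus a single round trip at a page size of 2000. A request count that grows in proportion to limit would also be consistent with the roughly linear growth in maap-py timings noted in the issue.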
CMR Granule Search Performance Comparison
Define Searching Functions
So that we can capture timings using a simple function-timing decorator, we define functions for comparison:
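A minimal sketch of such a decorator, together with a direct HTTP search function, might look like the following. These are not the functions used for the timings reported in this issue: the names `timed`, `granule_search_url`, and `find_granules_http` are hypothetical, the example sticks to the standard library, and it assumes the CMR granule search endpoint with the `umm_json` extension and a single page of results (so limit must not exceed the CMR maximum page size of 2000).

```python
import functools
import json
import time
import urllib.parse
import urllib.request


def timed(func):
    """Print the wall-clock duration of each call, then return its result."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f} s")

    return wrapper


def granule_search_url(cmr_host, limit, **params):
    """Build a CMR granule search URL that requests UMM JSON results."""
    # The whole request is fetched as one page, so page_size equals limit.
    query = urllib.parse.urlencode({**params, "page_size": limit})
    return f"https://{cmr_host}/search/granules.umm_json?{query}"


@timed
def find_granules_http(cmr_host, limit, **params):
    """Fetch a single page of up to `limit` granules directly over HTTP."""
    url = granule_search_url(cmr_host, limit, **params)
    with urllib.request.urlopen(url) as response:
        # UMM JSON search responses list the matching granules under "items".
        return json.load(response)["items"]
```

A call would then look something like `find_granules_http("cmr.earthdata.nasa.gov", 1000, short_name=...)`, where the search parameters identify the collection of interest.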
Find GEDI L4A Granules
We'll use the GEDI L4A collection for our granule searches.
MAAP Using MAAP OPS CMR Host
MAAP Using NASA OPS CMR Host
Direct HTTP Using MAAP OPS CMR Host
Direct HTTP Using NASA OPS CMR Host