Byte range selection performance #40
adair-kovac
started this conversation in
Ideas
-
Hi @adair-kovac, thanks for sharing your thoughts on this. Yes, this is an aspect of Herbie that could be improved. The byte range request is implemented with curl because it was easy, and I haven't thought about it much since then. One thing that hasn't worked is making multiple byte range requests in a single curl command (I think this is a limitation of the servers where the data are hosted, not of curl). That is why curl is executed once for each GRIB message that is subset. It would be interesting to see whether performance is better with boto3 or requests.
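A rough sketch of how the one-request-per-message cost might be reduced without curl: read the start bytes from the index file, select the matching messages, and merge neighbouring messages into a single byte range so they can share one request. This is not Herbie's actual implementation. It assumes a wgrib2-style index (`message:start_byte:date:variable:level:forecast:`), the byte offsets in `SAMPLE_IDX` are invented for illustration, the function names are hypothetical, and it uses the standard library's `urllib` only to stay dependency-free (the same `Range` header should work with requests).

```python
import re
import urllib.request

# Invented sample in wgrib2 inventory style; the byte offsets are made up.
SAMPLE_IDX = """\
1:0:d=2021010100:REFC:entire atmosphere:anl:
2:375378:d=2021010100:RETOP:cloud top:anl:
3:517891:d=2021010100:TMP:2 m above ground:anl:
4:900000:d=2021010100:DPT:2 m above ground:anl:
5:1200000:d=2021010100:UGRD:10 m above ground:anl:
"""


def parse_index(idx_text):
    """Return a list of (start_byte, description) tuples, one per GRIB message."""
    entries = []
    for line in idx_text.strip().splitlines():
        # Split off the message number and start byte; keep the rest as text.
        _, start, description = line.split(":", 2)
        entries.append((int(start), description))
    return entries


def select_ranges(idx_text, pattern):
    """Return merged (start, end) byte ranges for messages matching a regex.

    `end` is inclusive; None means "read to the end of the file".
    """
    entries = parse_index(idx_text)
    ranges = []
    for i, (start, description) in enumerate(entries):
        if not re.search(pattern, description):
            continue
        # A message ends one byte before the next message starts.
        end = entries[i + 1][0] - 1 if i + 1 < len(entries) else None
        previous_end = ranges[-1][1] if ranges else None
        if previous_end is not None and previous_end + 1 == start:
            # Contiguous with the previous selection: extend that range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def fetch_range(url, start, end=None):
    """Download one byte range with a single HTTP Range request."""
    value = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    request = urllib.request.Request(url, headers={"Range": value})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the sample index, `select_ranges(SAMPLE_IDX, r":(TMP|DPT):2 m")` yields one merged range covering both 2 m fields, so two messages cost one request instead of two. Non-adjacent messages would still need separate requests.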
-
Hi @blaylockbk, I meant to at least write some benchmarks to verify and quantify this, but it's been 2 months and I haven't done that, so I'll just report the impression I got about the performance:
The byte range selection for the HRRR should be significantly faster than downloading the whole GRIB2 file, and I believe it is if you use the boto3 library. But from a place with decent network speed (not my home wifi, but the CHPC or an AWS EC2 node), it's actually faster to just download the whole GRIB2 file than to select a certain field using Herbie. My first guess is that this is due to curl overhead; my second guess is that it could be due to whatever process is used to index into the GRIB file.
I see from the code comments that you've thought about different ways of implementing the byte range selection, and I think it would be a good enhancement to Herbie if that were reliably faster than downloading the whole file.
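For the benchmarks mentioned above, a minimal timing harness might look like the sketch below. It only measures wall-clock time for whatever callables it is given; the names `best_time` and `compare` are hypothetical, and the download functions to compare (whole file vs. byte range subset) would have to be supplied by whoever runs it.

```python
import time


def best_time(func, repeats=3):
    """Call func `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(candidates, repeats=3):
    """Time each named callable and return (name, seconds) pairs, fastest first."""
    results = {name: best_time(func, repeats) for name, func in candidates.items()}
    return sorted(results.items(), key=lambda item: item[1])


# Usage idea (both callables are placeholders you would write yourself):
# compare({"whole file": download_whole_file, "byte range": download_subset})
```

Taking the best of several runs reduces noise from network hiccups, though for downloads it is worth watching out for caching on repeated requests.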