Feature request: Use satellite data stored at AWS #35
Comments
Thanks! Two things. If you go to https://github.com/oceanhackweek/ohw20-tutorials you can run the example yourself with the Binder link at the bottom.
@cgentemann I'm rereading that
As far as I can tell, the original netCDF file has no internal chunking, so each netCDF file is a single chunk of 1x5424x5424.
Ah, OK, so that matches the zarr dataset. I was curious whether the chunk size was playing a role in the timing at all. Looks like it is mostly just data access. Thanks.
Yes, I'm actually hoping someone might jump in here with an explanation. We deliberately left the chunking unchanged to keep the comparison as close to apples-to-apples as possible. The decrease in initial access time makes sense because all the metadata is now consolidated. I'm not sure I understand the decrease in the analysis time; maybe it has something to do with zarr concurrent reads? Also, I've generalized the read routine to read all the GOES AWS data (not just SST). I'll post a link in a day or two. No power here right now.
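To make the concurrent-reads guess above concrete, here is a minimal sketch that uses only the standard library. It does not touch zarr, netCDF, or S3: `fetch_chunk`, the 0.05 s latency, and the count of 8 chunks are all made-up stand-ins, so it only shows why overlapping latency-bound requests could shorten wall time. It does not confirm that this is what happened in the benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed per-request latency, purely illustrative
N_CHUNKS = 8      # assumed number of chunks, e.g. one per time step


def fetch_chunk(index):
    """Hypothetical stand-in for one object-store request; sleeps instead of using the network."""
    time.sleep(LATENCY_S)
    return index


def read_sequential(n):
    # One request at a time: total time is roughly n * LATENCY_S.
    return [fetch_chunk(i) for i in range(n)]


def read_concurrent(n, workers=8):
    # Requests overlap in threads: total time approaches a single LATENCY_S.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_chunk, range(n)))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(read_sequential, N_CHUNKS)
con_result, con_time = timed(read_concurrent, N_CHUNKS)
print(f"sequential: {seq_time:.2f} s, concurrent: {con_time:.2f} s")
```

Whether the real speed-up comes from this effect would need profiling of the actual reads.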
Very interesting test! What is the chunking of the Could you also time how long it takes to run the
The chunk size is the same as the netCDF (1x5424x5424).
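For a sense of scale, the uncompressed size of one 1x5424x5424 chunk can be worked out directly. The thread does not state the on-disk data type, so the two item sizes below (2 bytes for a packed int16, 4 bytes for float32) are assumptions. The compressed size on S3 will be smaller.

```python
import math

CHUNK_SHAPE = (1, 5424, 5424)  # chunk shape quoted in the thread


def chunk_nbytes(shape, itemsize):
    """Uncompressed size in bytes of one chunk with the given element size."""
    return math.prod(shape) * itemsize


# Item sizes are assumptions; the thread does not state the dtype.
for name, itemsize in [("int16", 2), ("float32", 4)]:
    nbytes = chunk_nbytes(CHUNK_SHAPE, itemsize)
    print(f"{name}: {nbytes} bytes ({nbytes / 2**20:.1f} MiB)")
```

Either way, every read of a single pixel's time series has to pull a whole full-disk chunk per time step, which is consistent with the timing being dominated by data access.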
I created an end-to-end example here: It could be improved upon and added to the repo to supplement other Himawari examples: The gist could be updated by making a directory, downloading the data, saving the figure, then deleting the downloaded data. The next thing to test would be 'streaming' the data to avoid having to download it locally. In addition, one thing I would be interested in, which could slot onto the end of this example, is how to save a true color image of the full disk under an e-mailable size limit (< 20 MB), e.g. there was chat in the Slack about using tiled=True when saving as a GeoTIFF (https://pytroll.slack.com/archives/C0LNH7LMB/p1599313293263100)
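The make-a-directory, download, save-the-figure, delete-the-data sequence suggested above could be wrapped roughly like this. It is only a sketch: `render_with_scratch_dir`, `download`, and `render` are hypothetical names, and the two callables stand in for whatever the gist actually does to fetch the files and draw the image.

```python
import tempfile
from pathlib import Path


def render_with_scratch_dir(download, render, out_path):
    """Download into a scratch directory, render a figure, then clean up.

    download(scratch_dir) -> list of downloaded file paths   (hypothetical callable)
    render(files, out_path) -> writes the figure to out_path (hypothetical callable)
    """
    out_path = Path(out_path)
    # TemporaryDirectory removes the directory and everything in it on exit,
    # even if download() or render() raises.
    with tempfile.TemporaryDirectory() as scratch:
        files = download(Path(scratch))
        render(files, out_path)
    return out_path
```

The figure has to be written outside the scratch directory, otherwise it is deleted along with the downloaded data.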
@raybellwaves Very nice. A couple things:
I'm not saying we can't incorporate your usage directly, but given the rest of your suggestions it might be nice to include something like this where the files don't have to be downloaded to disk.
@cgentemann has an example of how to access GOES data via AWS
https://github.com/oceanhackweek/ohw20-tutorials/blob/master/10-satellite-data-access/Access_cloud_SST_data_examples.ipynb
This is also related to pytroll/satpy#1287
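As a rough illustration of what accessing GOES data on AWS involves, the public bucket organises files as product/year/day-of-year/hour. The sketch below builds such a prefix from a timestamp. The bucket name `noaa-goes16` and product `ABI-L2-SSTF` are assumed defaults rather than something stated in this issue, and `goes_prefix` is a hypothetical helper, not part of satpy or the linked notebook.

```python
from datetime import datetime


def goes_prefix(when, bucket="noaa-goes16", product="ABI-L2-SSTF"):
    """Build a <bucket>/<product>/<year>/<day-of-year>/<hour>/ prefix.

    bucket and product are assumed defaults; adjust for other satellites or products.
    """
    return f"{bucket}/{product}/{when:%Y}/{when:%j}/{when:%H}/"


print(goes_prefix(datetime(2020, 7, 28, 0)))
# noaa-goes16/ABI-L2-SSTF/2020/210/00/
```

A prefix like this could then be listed anonymously, for example with `s3fs.S3FileSystem(anon=True).ls(...)`, and the resulting objects opened without writing them to local disk.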