Improve performance of DataModel reading for single (or simple) metadata access #284
Comments
I agree having this function would be very useful. What dangers do you foresee?
Thanks for taking a look at this issue. Most of the dangers come from the fact that it bypasses all of the stdatamodels and asdf machinery. This means:
I think for the
If one were concerned with validation, the option could be to do it the current way (though I would not make that the option used in operations, since the file should already have been validated and no sneaky, invalid updates should have been made). But I agree that an easy-to-use version of 4) should be provided. Mining the files' metadata is extremely useful, particularly outside of operations, and should be as efficient as possible.
It would be great if the efficient metadata access code could also load multiple FITS keywords.
There are use cases (like the jwst `resample` step) where loading a single keyword from many `DataModel`-containing FITS files may be useful. As the number of files might be very large, and opening every model might exceed reasonable amounts of RAM, it will be important to have a performant way to do these simple keyword accesses.

Using `meta.wcsinfo.s_region` (contained in the ASDF extension) as an example, there are a few ways this keyword can be read:

1. `stdatamodels.jwst.datamodels.open`
2. `stdatamodels.asdf_in_fits`
3. `astropy.io.fits` and `asdf.open`
4. `astropy.io.fits` and `asdf.util.load_yaml`

Of the 4 options above, 1-3 are similar in performance (using both an `ImageModel` and a larger `IFUImageModel` as test files), with performance limited primarily by `asdf.open` (more on that below). Option 4 is much faster in both cases. See the table below for performance (run with `cProfile`, so slightly slower than real).

The table shows it's ~10x faster to use `load_yaml` as this skips:

Although all of the above is public API, it seems worthwhile to investigate wrapping 4 as a helper function in stdatamodels (with sufficient documentation about how this is dangerous).
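As a rough illustration of what such a wrapper around option 4 might look like, here is a minimal sketch. The names `read_meta_keywords` and `get_nested` are hypothetical and not part of stdatamodels. It assumes the serialized ASDF tree lives in an HDU named `ASDF` whose raw bytes can be handed to `asdf.util.load_yaml`, and it accepts several dotted keys so more than one keyword can be read per file. Like option 4 itself, it does no validation and never builds a `DataModel`.

```python
import io


def get_nested(tree, dotted_key, default=None):
    """Walk nested dicts using a dotted path such as 'meta.wcsinfo.s_region'."""
    node = tree
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def read_meta_keywords(fits_path, dotted_keys):
    """Hypothetical helper: read metadata values without calling asdf.open.

    Skips validation and all DataModel machinery, so the values come back
    as plain Python objects exactly as stored in the YAML.
    """
    # Imported lazily so get_nested stays usable without these packages.
    import asdf.util
    from astropy.io import fits

    with fits.open(fits_path) as hdul:
        # Assumption: the 'ASDF' extension holds the serialized ASDF file,
        # and its raw bytes start with the YAML tree.
        asdf_bytes = io.BytesIO(hdul["ASDF"].data.tobytes())

    # Only the YAML portion is parsed; binary blocks are never read.
    tree = asdf.util.load_yaml(asdf_bytes)
    return {key: get_nested(tree, key) for key in dotted_keys}
```

Usage would be something like `read_meta_keywords(path, ["meta.wcsinfo.s_region", "meta.filename"])`, called once per file. Keys that are not present come back as `None` rather than raising, which seemed more convenient when scanning many files, but that is a design choice rather than a requirement.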
Below is a snakeviz-generated graph of the call to `dm.open` for the `IFUImageModel` data file, showing the bulk of the time spent in `asdf.open`: