Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common resample code to stcal #279

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

mcara
Copy link
Member

@mcara mcara commented Aug 8, 2024

This PR add the common resample code used by both JWST and Roman pipelines to stcal. Also, for the first time, this PR adopts the new drizzle API from spacetelescope/drizzle#134 for the resample code used in the pipelines.

This work is related to https://jira.stsci.edu/browse/AL-835

At this moment this is a very rough draft for illustration purpose. It should run with default arguments (except input_models and output file name can be specified; everything else is not guaranteed to work). There are no unit/regression tests and documentation may not match the code.

Checklist

  • added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
  • updated relevant tests
  • updated relevant documentation
  • updated relevant milestone(s)
  • added relevant label(s)

@mcara mcara requested a review from a team as a code owner August 8, 2024 13:37
@mcara mcara marked this pull request as draft August 8, 2024 13:37
@mcara mcara self-assigned this Aug 8, 2024
Copy link

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 92.53996% with 42 lines in your changes missing coverage. Please review.

Project coverage is 33.42%. Comparing base (60bd3b8) to head (55294b0).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/stcal/resample/resample.py 92.48% 42 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #279       +/-   ##
===========================================
- Coverage   86.21%   33.42%   -52.80%     
===========================================
  Files          47       39        -8     
  Lines        8812     8818        +6     
===========================================
- Hits         7597     2947     -4650     
- Misses       1215     5871     +4656     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@abc.abstractmethod
def run(self):
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using ellipses, would it be better to raise and exception with a message saying "method not implemented"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by marking it as an abstractmethod this is done automatically if you try to call it directly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file necessary? It is imported by only one other file. It is not a set of common functions shared among many files. It's two short functions.

@braingram
Copy link
Collaborator

Would you rebase this and add a drizzle dependency with the branch for spacetelescope/drizzle#134 so we can see the CI run?

@@ -0,0 +1,36 @@
import numpy as np
from stdatamodels.dqflags import interpret_bit_flags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from stdatamodels.dqflags import interpret_bit_flags
from stdatamodels.dqflags import interpret_bit_flags

Is there a non-stdatamodels way to handle this. We can't make stdatamodels a dependency of this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, there is. I'll fix it (I forgot to switch to astropy although stdatamodels.dqflags.interpret_bit_flags() for unknown to me reasons does things slightly different. I think that function should be removed altogether from stdatamodels.dqflags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I did it in resample.py but not in utils.


self.final_post_processing()

self._output_model.write(self._output_filename, overwrite=True)
Copy link
Collaborator

@braingram braingram Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for roman since write does not exist for roman_datamodels.DataModel and overwrite is not a valid keyword argument for roman_datamodels.DataModel.save.

I think it's best if we leave all file-IO up to the pipeline and not include it here in stcal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was an omission. I'll fix this.

Copy link
Collaborator

@emolter emolter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my main question is still the extent to which we assume the structure and attributes of datamodels, and now also of ModelLibrary

"""Raised when the output is too large for in-memory instantiation"""


def output_wcs_from_input_wcs(input_wcs_list, pixel_scale_ratio=1.0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might put this somewhere other than resample.py, e.g. alignment/utils or alignment/resample_utils. I would vote (eventually) for having a separate submodule for WCS-related utility functions, but that's probably beyond the PR scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. It was temporary here. This will be fixed once we have a function that computes output WCS from s_region of input data models and other parameters.


# loop over only science exposures in the ModelLibrary
# sci_indices = self._input_models.ind_asn_type("science")
with self._input_models:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be in the same infinite loop as before here, about whether or not to make certain datamodels dependencies of stcal, either implicitly or explicitly. It seems to me that with this and other code lines, ResampleBase can only be used if the input models are assumed to be ModelLibrary instances. Are we okay with this? If so, should we just make this explicit in some way by specifying the input model type inside the ResampleBase abstract base class, and make stpipe a dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be in the same infinite loop as before here, about whether or not to make certain datamodels dependencies of stcal, either implicitly or explicitly. It seems to me that with this and other code lines, ResampleBase can only be used if the input models are assumed to be ModelLibrary instances.

True. It is difficult to maximize the common code ported to stcal with the purpose of reducing maintenance burden in the pipelines given the rigid usage imposed by the ModelLibrary (by contrast, ModelContainer was behaving like a standard list). I can try to see how to bury ModelLibrary into the "IO" class, but it's not going to be pretty.

Are we okay with this? If so, should we just make this explicit in some way by specifying the input model type inside the ResampleBase abstract base class, and make stpipe a dependency?

I don't know what are the plans for these packages but I do not think they are intended to be general purpose algorithms like those in astropy (or even drizzle) so I do not see an issue with using structures shared by all pipelines.


try:
if self.get_model_attr_value(model, "exptype").upper() != "SCIENCE":
self._input_models.shelve(model, modify=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually when there's a try...except clause inside a ModelLibrary context where the try could fail while models are borrowed from the library, it's necessary to put the shelve into a finally block so it executes regardless of the path the try...except takes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, if an exception is triggered then there is some other processing after the try-except block and then the model is closed. So, I think, the code is fine: this was intentional.

src/stcal/resample/resample.py Outdated Show resolved Hide resolved
"""
resample_suffix = 'i2d'
resample_file_ext = '.fits'
n_arrays_per_output = 2 # #flt-point arrays in the output (data, weight, var, err, etc.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this hard-coded? Will it be different for co-add and single resamples? Don't we have multiple types of variance that need to be in memory simultaneously in the current version of resample, i.e., doesn't this underestimate the memory usage when it's used in check_memory_requirements?

def build_driz_weight(self, model, weight_type=None, good_bits=None):
"""Create a weight map for use by drizzle
"""
data = self.get_model_array(model, "data")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again here certain attributes of the model are assumed to exist, and IMO this should be made explicit either by having a model base class or by adding stcal dependencies (although we've been trying hard to avoid the latter)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again here certain attributes of the model are assumed to exist, and IMO this should be made explicit either by having a model base class or ...

Seems like this is not going to happen.

... by adding stcal dependencies (although we've been trying hard to avoid the latter)

I don't think this is necessary as this PR illustrates how to avoid imports.

Yes, it is true thst we assume some attributes to exist. I think for now we are safe but if you would like to harden the code, I could add try-except blocks around these data retrieval statements.


@abc.abstractmethod
def run(self):
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by marking it as an abstractmethod this is done automatically if you try to call it directly


Notes
-----
This routine performs the following operations::
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks identical to ResampleCoAdd. Some doc updates should be made to clarify how they are different

Copy link
Collaborator

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments so far.

return tmeasure, True


# FIXME: temporarily copied here to avoid this import:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps jwst should import it from stcal in future PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I think that would be the right place.


# get the available memory
available_memory = (
psutil.virtual_memory().available + psutil.swap_memory().total
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should swap be included in this? If it is what I think it is, relying on it will slow things down.

Copy link
Collaborator

@emolter emolter Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ported over from current jwst main branch, but I don't think it does anything useful at present. See spacetelescope/jwst#8775

I wouldn't mind if it were removed as part of this PR, but the JIRA ticket is still assigned to David Law which means it hasn't been officially approved for work by the pipeline team yet

@mcara mcara force-pushed the resample-common-code branch 2 times, most recently from bcead21 to 43493b1 Compare October 11, 2024 07:08
@mcara mcara force-pushed the resample-common-code branch from 43493b1 to 2da57ba Compare October 25, 2024 13:59
@mcara mcara mentioned this pull request Nov 27, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants