Update resample for new reference file type #8242

Closed

braingram
Collaborator

@braingram braingram commented Jan 26, 2024

This PR:

  • removes the code that handles the outdated and unused drizpars resample reference file
  • adds support for a possible new reference file format pars-resamplestep

drizpars removal

The current resample code will only allow the reference files to define the following:

  • pixfrac
  • kernel
  • fillval
  • weight_type

However, if any of those values is either defined by the user or has a default value, the reference file data will not be used. All of these values currently have defaults, so the reference file data is never used:

pixfrac = float(default=1.0) # change back to None when drizpar reference files are updated
kernel = string(default='square') # change back to None when drizpar reference files are updated
fillval = string(default='INDEF' ) # change back to None when drizpar reference files are updated
weight_type = option('ivm', 'exptime', None, default='ivm') # change back to None when drizpar ref update
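The effect described above can be illustrated with a minimal sketch. The function name `fill_from_reference` and the dictionaries are hypothetical; they only model the "use the reference value when the step value is None" rule and are not the actual jwst code.

```python
def fill_from_reference(step_values, ref_values):
    """Return step values, taking the reference value only where the step value is None."""
    return {
        name: ref_values.get(name) if value is None else value
        for name, value in step_values.items()
    }


# Hypothetical reference file row (values chosen only for illustration).
ref_row = {"pixfrac": 0.8, "kernel": "square", "fillval": "INDEF", "weight_type": "exptime"}

# With the current spec every parameter has a non-None default,
# so nothing is ever taken from the reference file.
current_defaults = {"pixfrac": 1.0, "kernel": "square", "fillval": "INDEF", "weight_type": "ivm"}
with_defaults = fill_from_reference(current_defaults, ref_row)

# With None defaults, the reference values are used instead.
unset = dict.fromkeys(current_defaults)
with_none = fill_from_reference(unset, ref_row)
```

In this sketch `with_defaults` is identical to `current_defaults`, which is why the reference file data has no effect while the defaults remain in the spec.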

pars-resample

To add support for pars-resample this PR adds a new entry to the ResampleStep spec:

ref_pars_table = list(default=list()) # parameter 'table' read from pars-resample

and sets the above mentioned parameters to None to allow them to be set from the pars-resample file.

As this new file is a pars- file, it is read at the creation of the step and sets attributes on the step instance (in this case the ref_pars_table attribute). When the step computes kwargs to pass to the actual resample code, the 'table' in this attribute is used (in a way similar to what was done with the drizpars table) to set only values that aren't defined by the user. The code is hopefully commented thoroughly enough to explain the details.
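As a rough sketch of the table lookup, assuming the table is a list of rows keyed by a `numimages` column: the helper `select_row` is hypothetical, and the rule it uses (the row with the largest `numimages` not exceeding the number of inputs) is an assumption made for illustration, not a description of the code in this PR.

```python
def select_row(table, num_images):
    """Pick the row with the largest 'numimages' that does not exceed num_images.

    Returns None when no row applies. This selection rule is an assumption
    made for the sketch.
    """
    candidates = [row for row in table if row["numimages"] <= num_images]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["numimages"])


# Hypothetical contents of the ref_pars_table attribute.
ref_pars_table = [
    {"numimages": 1, "pixfrac": 1.0},
    {"numimages": 2, "pixfrac": 1.0},
    {"numimages": 4, "pixfrac": 0.8},
]

row_for_three = select_row(ref_pars_table, 3)  # falls back to the numimages=2 row
row_for_six = select_row(ref_pars_table, 6)    # uses the numimages=4 row
```

The selected row would then supply values only for parameters that are still None on the step.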

Documentation (and the spec comments) will need to be updated before this PR is opened for review.

Resolves JP-nnnn

Closes #

This PR addresses ...

Checklist for maintainers

  • added entry in CHANGES.rst within the relevant release section
  • updated or added relevant tests
  • updated relevant documentation
  • added relevant milestone
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below.
    How to run regression tests on a PR
  • Make sure the JIRA ticket is resolved properly

@braingram braingram changed the title Resample ref update Update resample for new reference file type Jan 26, 2024

codecov bot commented Jan 26, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (bddb39c) 75.18% compared to head (30dc004) 75.17%.

Files Patch % Lines
jwst/resample/resample_step.py 92.30% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8242      +/-   ##
==========================================
- Coverage   75.18%   75.17%   -0.01%     
==========================================
  Files         470      470              
  Lines       38547    38485      -62     
==========================================
- Hits        28980    28931      -49     
+ Misses       9567     9554      -13     
Flag Coverage Δ *Carryforward flag
nightly 77.35% <ø> (-0.03%) ⬇️ Carriedforward from bddb39c

*This pull request uses carry forward flags. Click here to find out more.


Collaborator

@stscieisenhamer stscieisenhamer left a comment

Code-wise this is OK. I'm still having an issue conceptually and will ask this only rhetorically: This seems to be a significant complication to remove a reference file to just then implement a reference-within-a-reference file. If stakeholders are happy with the answer to that, then 👍

Collaborator

@jdavies-st jdavies-st left a comment

A general comment here - there should be no need of special handling of a pars- file for resample. It is all handled by the default code in stpipe, just like every other step with a pars- file. The reftype is automatic as well, as soon as the reference file is delivered to CRDS. One only needs to tear out all the old drizpars reffile junk in the code. I.e. _compute_resample_kwargs() should be unnecessary.

The logic based on number of input images in rows in the old drizpars fits binary table only really varied pixfrac:

$ showtable jwst_nircam_drizpars_0001.fits
numimages filter pixfrac kernel fillval wht_type stepsize
--------- ------ ------- ------ ------- -------- --------
        1    ANY     1.0 square   INDEF  exptime       10
        2    ANY     1.0 square   INDEF  exptime       10
        4    ANY     0.8 square   INDEF  exptime       10

but this was never tested or optimized and is from 2006 when resample was first implemented. These are all placeholders. Anything other than pixfrac=1.0 in fact is not actually supported mathematically. It's the equivalent of unsharp mask in Photoshop™. stepsize is not used, exptime weighting instead of ivm ruins the uncertainties in output error arrays and the errors in the source catalogs, and INDEF is meaningless.

How about we make proper, tested input parameter sets as reference files that are not tables and do not depend on number of input images?

I'll add that if there's new pars- reffile to be delivered, this is an excellent time to get rid of all references to values of "INDEF" in this codebase as well, which have no meaning in Python or the C++ code in drizzle. See #2219 and #7664.

@tapastro
Contributor

tapastro commented Feb 5, 2024

... exptime weighting instead of ivm ruins the uncertainties in output error arrays and the errors in the source catalogs, ...

#8258
🙈

Do we have a write-up somewhere (I thought I'd seen it detailed in a github issue discussion from 3+ years ago) explaining this to users?

@braingram
Collaborator Author

braingram commented Feb 5, 2024

Thanks all for looking at this.

@hbushouse many of the comments here might be relevant to JP-2682

The ticket mentions that keeping the ability to select parameters based on the number of input files "could be useful" for NIRISS and is awaiting feedback about NIRCam.

Is there currently a way to provide the number of input files when selecting a pars file from crds? I have not been able to find one and chose (with this example PR) to reproduce what was done for the old drizpars reference file (which stores a table where a row of parameters is selected based on the number of images). Because of this choice, the pars file data still needs special handling (as the number of input files is not provided when stpipe calls get_reference_file for the pars file).

One alternative to the approach in this PR would be to update stpipe to include the number of input images in the parameters provided to crds during pars selection. I believe the information is available just prior to the call to get_reference_file (when strun is called with an association, the association is opened to read the first model to use for get_crds_parameters). I don't know what corner cases would need to be handled with this approach (running a step with a model input, container input, etc.). At a minimum, a standard for NIMAGES would need to be defined, as this approach would impact all jwst and romancal steps.

Collaborator

@hbushouse hbushouse left a comment

So the more I think about this, the less likely it seems that we can accomplish the use of multiple settings for a given param via a pars ref file. Like I said in another comment, stpipe is set up to automatically search for and load a pars ref file for each step, and when it reads the pars ref file it's just expecting one value for each defined step parameter and returns whatever values it finds in the Step class instance. No idea how we could ever get it to return a list of values for one or more params, which could then be searched through and selected by the step itself.

kernel = string(default='square') # change back to None when drizpar reference files are updated
fillval = string(default='INDEF' ) # change back to None when drizpar reference files are updated
weight_type = option('ivm', 'exptime', None, default='ivm') # change back to None when drizpar ref update
pixfrac = float(default=None)
Collaborator

Is it really necessary to set all of these defaults to None in order to have the values from the new pars-resamplestep ref file take effect? I know we have to leave the default values set when the drizpars ref files are in use, because of the odd logic that was in the get_drizpars function. But the normal order of precedence used in stpipe and jwst should allow values from the pars-resamplestep ref files to override defaults set here in the step spec block. We have plenty of active examples of this. For example, several prototype steps have skip=True set in the step spec blocks, so that when run on their own they get skipped. But then there are pars ref files in place that override that value and allow the step to execute. So I think those comments about setting all the defaults back to None only applied to the case where we continue to use the drizpars ref files. We've got to set some sensible defaults for some of these, like pixfrac=1.0 and kernel='square', so that when/if the step is run in the absence of pars-resamplestep ref files, it'll do something sensible (the whole purpose of defaults).
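The order of precedence described here can be modelled with a small sketch. This is a simplified illustration, not stpipe's implementation: the three dictionaries and their contents are hypothetical, and the assumption is that user-supplied values win over pars ref file values, which in turn win over spec defaults.

```python
from collections import ChainMap

# Hypothetical values for each layer.
spec_defaults = {"pixfrac": 1.0, "kernel": "square", "skip": True}
pars_ref_file = {"skip": False}          # e.g. a pars file enabling a prototype step
user_supplied = {"kernel": "gaussian"}   # e.g. given on the command line

# ChainMap looks keys up left to right, so earlier mappings take precedence.
effective = dict(ChainMap(user_supplied, pars_ref_file, spec_defaults))
```

Under this model the defaults never need to be None for the pars file to take effect; they are only the last fallback.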

Collaborator Author

These were set to None as a way to tell that the parameters were "unset". The _compute_resample_kwargs function overwrote these parameters only if they were None (that way, if a user provided a value, neither the pars- file nor the default table in _compute_resample_kwargs would overwrite the user-provided option).

@@ -58,8 +55,10 @@ class ResampleStep(Step):
blendheaders = boolean(default=True)
allowed_memory = float(default=None) # Fraction of memory to use for the combined image.
in_memory = boolean(default=True)
ref_pars_table = list(default=list()) # parameter 'table' read from pars-resample
Collaborator

Don't need any of this, as @jdavies-st already mentioned. stpipe automatically goes out and finds and loads an appropriate pars ref file, if one exists, and returns the step argument values from them. Or was this necessary in order to allow for having a list/table of different possible values for the params, based on the number of input images?

Collaborator Author

Your last sentence is spot on. This was needed because the number of input images is not provided to crds during the pars- file selection.

@hbushouse
Collaborator

Until we're able to sort out how multiple values could be implemented within a pars ref file, I think the thing to do in the meantime is to stick with the use of drizpars, get the instrument teams to deliver updated drizpars with appropriate settings for their instrument modes, and remove the default values that are currently hardwired in the resample_step code so that the drizpars values get used instead.

@braingram braingram closed this Feb 15, 2024
@hbushouse
Collaborator

@braingram While it doesn't seem feasible to store and retrieve multiple parameter values from a parameter reference file, I'm wondering if there's a way that we could go the slightly less convenient route of having multiple pars ref files in CRDS, with NUM_IMAGES being one of the selection criteria? Many pars ref files currently use selection criteria that are based on meta data in the datamodel being processed (same as regular reference files). So the problem to be worked around in this case is the fact that the number of images being processed is not (currently) a meta attribute in the input datamodel (or ModelContainer) being passed into the resample/resample_spec steps.

Given that the primary argument to the stpipe Spec class get_reference_file method is the input datamodel, would it be possible to somehow (at least temporarily) add the number of images into the meta data of that datamodel before it is used in the call that retrieves the parameter reference file? This is especially tricky due to the fact that the call to find and load a step's parameter reference file takes place at the time the step instance is created, before you even make it to the point of control being passed to the process method within the step module itself. If it were a regular reference file, where the call to get_reference_file is made somewhere within the step's process method, we could easily add that extra bit of meta data to the datamodel before making the call to get_reference_file. But in the case of a parameter ref file, that call is made here (https://github.com/spacetelescope/stpipe/blob/main/src/stpipe/step.py#L865) within the get_config_from_reference method that's called during Step creation and startup.

Any brilliant ideas? Or do we just brute force it by letting the Step instance make the call as it does now, but we ignore the results and call it again explicitly from within the resample_step process method after having added the number of images as a meta attribute?

@jdavies-st
Collaborator

I would argue that having the output resample scale (arcsec per pixel) change or pixel shrinking on the input images before drizzling in standard DMS processing based on number of input images is not what you want.

First, you don't know the dither pattern based just on the number of images, and you don't know the science use case. Different dither patterns will sample the pixel phase differently. The number of overlapping images will be different than the number of images in the modelcontainer (NIRCam long vs short), and in the end, anything other than pixfrac = 1.0 (standard geometric overlap distribution of the input flux to the output flux) or pixfrac = 0 (uncorrelated noise in output) is essentially doing Photoshop unsharp mask on the images. Yes, you can do it, but it is not scientifically supported in terms of its effect on the PSF, correlated uncertainties, etc in the output image. It makes strong assumptions about the distribution of flux within the pixel which is simply not correct unless you know something about the shape (gradient of flux) in the underlying source on sky.

Currently the above drizpars reffile shrinks pixels (pixfrac < 1), but there is no change to the output scale - complete garbage, and not what anyone would do for real science. If anything, one does not shrink pixels (pixfrac=1) but does rescale the pixel size of the output to match to some other band, say to match NIRCam short and long on the same pixel scale.

The whole reason people played with pixfrac to begin with is because HST pixels severely undersampled the PSF, and for stars, one knows something about the PSF. This is not the case for NIRCam and extragalactic fields. And while we can argue about its validity in fiddling with pixfrac for JWST NIRISS and NIRSpec, which are undersampled, I don't think anyone has shown the effects of such assumptions on the chunky JWST PSF. In the end, best not to go there and instead forward-model whatever it is you're trying to do.

The standard pipeline should not assume particular science scenes (stars, galaxies, exoplanets) for general processing, but should produce the products that are best understood and best calibrated. For this, no input pixel shrinking and no pixel scaling in the output make the most sense, as we currently do.

Scientists are always free to disagree and reprocess their data with custom pipeline parameters, as they all already do.

@braingram
Collaborator Author

Any brilliant ideas? Or do we just brute force it by letting the Step instance make the call as it does now, but we ignore the results and call it again explicitly from within the resample_step process method after having added the number of images as a meta attribute?

If the goal is to use NUM_IMAGES for pars- file selection I think there are benefits to allowing the "usual" routines to do the selection (so the log messages match the behavior, etc). I think this would mean patching NUM_IMAGES into the metadata when the ModelContainer is opened to call get_crds_parameters on the first model here:
https://github.com/spacetelescope/stpipe/blob/aea1617c797c173b8f9b994a5007bc00d060a897/src/stpipe/step.py#L843-L848
It seems pretty reasonable to add NUM_IMAGES to crds_parameters if the opened model is a container (a quick search of jwst and romancal returns no results for this string so it looks safe to use without conflicting with other metadata).
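A minimal sketch of that idea, under stated assumptions: `crds_parameters_with_num_images` and `FakeModel` are hypothetical names, a plain list or tuple stands in for a ModelContainer, and the parameter key shown is illustrative. It is not a patch against stpipe.

```python
def crds_parameters_with_num_images(model):
    """Build CRDS selection parameters, adding NUM_IMAGES for container-like inputs.

    A list or tuple of models is treated as a stand-in for a ModelContainer;
    anything else is treated as a single model.
    """
    if isinstance(model, (list, tuple)):
        # Parameters come from the first model, plus the number of members.
        params = dict(model[0].get_crds_parameters())
        params["NUM_IMAGES"] = len(model)
        return params
    return dict(model.get_crds_parameters())


class FakeModel:
    """Hypothetical stand-in for a datamodel; not part of jwst or stpipe."""

    def __init__(self, filter_name):
        self.filter_name = filter_name

    def get_crds_parameters(self):
        return {"meta.instrument.filter": self.filter_name}


single = crds_parameters_with_num_images(FakeModel("F200W"))
container = crds_parameters_with_num_images([FakeModel("F200W")] * 4)
```

Only container inputs gain the extra key in this sketch, so selection for single-model inputs would be unchanged.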

@hbushouse
Collaborator

@braingram @jdavies-st Thanks for all of your comments. Given the still semi-difficult and off-nominal procedure that would be needed in order to select pars ref files based on a non-meta attribute like num_images, plus all of the arguments as to why it's a bad idea to be blindly using different values of pixfrac in the pipeline anyway, I'm inclined to leave all of this as a theoretical exercise for now and make the recommendation to the JP coord team that we 1) go ahead and do away with drizpars ref files in favor of a pars-resamplestep ref file, and 2) only use a single version of a pars ref file in automated pipeline processing, which will NOT be selected based on something like num_images. If there are real needs for some NIRISS users to have different values of something like pixfrac or output pixel scale, then they can accomplish that on their own in off-line reprocessing.
