Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API re-design: simplify interface, remove I/O, make instrument agnostic #134

Merged
merged 30 commits into from
Oct 23, 2024

Conversation

mcara
Copy link
Member

@mcara mcara commented Jan 19, 2024

Closes #144
Closes #133
Closes #130
Closes #125

WARNING: This first commit is the first draft and it is definitely not ready for merging and unit tests have not been updated. Also, documentation is incomplete.

This PR quite radically changes the API of this package with intention to make this package independent of how data are stored (FITS files, asdf, etc.) and the instrument used (JWST, HST, etc.) and also to reduce the number of functions and duplication and the number of conflicting parameters (or parameters that are changing meaning in different functions).

If some items on the TODO list (in the code) are implemented such as support for weight map of the input image to be None, then these changes would result in performance increase for EXPTIME, EXPSQ weighting schemes (as opposite to IVM). I would argue that we go even further with this in not having two parameters for input weights: weight_map and wht_scale. I think these should combine into a single weight which can be either an array or a scalar. IMO, having two parameters is an unnecessary complication.

BUG FIX: exposure time was undefined when in_units were not cps. That is, drizzle was producing random outputs when input image units were, let's say 'counts'.

BUG FIX: tdriz signature was out of date.

It is also worth mentioning this issue that probably should be handled in this PR too: spacetelescope/drizzlepac#1722

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

Attention: Patch coverage is 98.16514% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.27%. Comparing base (0104065) to head (f78639f).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
drizzle/util.py 0.00% 3 Missing ⚠️
drizzle/utils.py 98.14% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #134       +/-   ##
===========================================
+ Coverage   75.29%   97.27%   +21.98%     
===========================================
  Files           7        3        -4     
  Lines         344      220      -124     
===========================================
- Hits          259      214       -45     
+ Misses         85        6       -79     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mcara
Copy link
Member Author

mcara commented Jan 23, 2024

Example usage:

# wcs1 - WCS of the input image usually with distortions (to be resampled)
# wcs2 - WCS of the output image without distortions

import numpy as np

# simulate some data and a pixel map:
data = np.ones((240, 570))
y, x = np.indices((240, 570), dtype=np.float64)
pixmap = np.dstack([x, y])
# or call:
# from drizzle.utils import calc_pixmap
# pixmap = calc_pixmap(wcs1, wcs2)

# actual drizzle work
from drizzle.resample import Drizzle
d = Drizzle(out_shape=(240, 570))
d.add_image(data, exptime=1234, pixmap=pixmap)

# access outputs:
d.out_sci
d.out_ctx
d.out_wht

@mcara mcara force-pushed the redesign-api branch 2 times, most recently from b050a6e to 7e40598 Compare January 23, 2024 19:06
@perrygreenfield
Copy link

Could there be an option to fold the pixmap computation inside the the call to add_image? This would be useful in cases where the pixmap is very large and would allow the method to do it for subarrays in cases of very large images (and complex GWCS models).

@mcara
Copy link
Member Author

mcara commented Jan 29, 2024

Could there be an option to fold the pixmap computation inside the the call to add_image? This would be useful in cases where the pixmap is very large and would allow the method to do it for subarrays in cases of very large images (and complex GWCS models).

This is one possible approach to deal with large images. There also could be a different function (i.e., add_block()) that could deal with blocks of input images or caller could provide a callback, etc. I believe we should add parameters as we introduce the feature that needs those parameters. So, we could possibly add this later when we will support subarrays if this will be deemed to be the best approach.

The beauty of using pixmap is that it is completely independent on how WCS are computed and also keep in mind that pixmap size is the same as that of the input image and not of the output image and so maybe it is not that critical to be able to split input images into chunks.

out_sci : 2d array of float32, optional
A 2D numpy array containing the output image produced by
drizzling. On the first call it should be set to zero.
Subsequent calls it will hold the intermediate results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would resampling on a predefined image work, i.e. there is a resampled image from a previous run and we want to drizzle another image on it. Does this reseting to zero exclude the use case?

Copy link
Member Author

@mcara mcara Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is old documentation that applies to the "traditional" workflow in which we start with an image that is all 0s or NaNs and keep adding inputs as needed. Initial out_sci does not have to be all 0s: it can be the previous "end result" upon which we can resample more inputs. In other words, we can restart the process from were we stopped.

Copy link

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks very good, but I have some comments.

drizzle/utils.py Outdated
# from https://trs.jpl.nasa.gov/handle/2014/40409
if refpix is None:
if wcs.bounding_box is None:
refpix = np.ones(wcs.pixel_n_dim)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this line. In the absence of refpix or a bounding box, wouldn't the center of the pixel array shape be the usual default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any pixel could work. But you are right: more conventional is to choose the center. I will update this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more conventional is to choose the center

That's of course if one knows what the bounding box is.

return pixmap


def _estimate_pixel_scale(wcs, refpix):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a comment that says that the pixel scale assumes approximately equal scales for each axis? There could be odd detectors that this isn't true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I mentioned, this PR is at "conceptual" stage at this moment. Just to get an idea that proposed API changes look OK. More comments will be added. Documentation was only partially updated. For some parameters it is wrong.

from . import cdrizzle


SUPPORTED_DRIZZLE_KERNELS = [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the deletion of most of these kernels being held for a later PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since no one approved on that JIRA issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tophat was deprecated. I'd say remove it.

"'out_shape' cannot be None when all output arrays are "
"None."
)
out_shape = shapes[0]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused either about what sets allow or the code. I didn't think sets could be indexed. While I'm mentioning that, are the various "out" arrays permitted to have different shapes? If so, see comment below. Ah, apparently not. Maybe a comment is warranted when the set is created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistake. In all other places I used pop. Will be fixed in the next commit.

def blot_image(data, pixmap, pix_ratio, exptime, output_pixel_shape,
interp='poly5', sinscl=1.0):
"""
Resample input image using interpolation to an output grid.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this confusing in that the term input seems ambiguous. Is that the input of the original image, or the input of this function. I would start saying that this is intended to do the inverse resampling that drizzle does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are comments from the existing code. I did not modify them at this moment. I only moved the blot_image out of the Drizzle class as it does not belong to drizzle logically and does not need to be in a class.

I do not know who came up with the term blot but it is definitely not me.

this is intended to do the inverse resampling that drizzle does.

I disagree. I think there is nothing "inverse" in here. Resampling is just resampling no matter the direction. Drizzle mixes values from several input pixels into one output pixel. The term "inverse" would suggest un-mingling those values to get the input (to drizzle) image, like Lucy-Richardson deconvolution.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think that an inverse mapping is tightly associated with the term "blot" (I'm not sure why it is called that either). Perhaps a round tripping is the right term, where it is understood that it is not a pure inverse. So I think my comment stands to make the issue clearer since those using blot are almost always using the inverse mapping, aren't they? If you got rid of blot and forced people to use drizzling directly, then this wouldn't be an issue. But there is a history to this term.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resampling algorithms (interpolating, like "blot", or drizzle) are all "one way" in the sense that you cannot apply it twice with "inverse mapping" (i.e., inverted pixmap) and expect things to "round-trip". Round-tripping will work with "blot" but only if image features can be perfectly (locally) modeled by linear/poly3/poly5 functions. With these kinds of images, in principle, we could use blot to resample from one frame to another and back. Reverse transformation in this case would be performed by exactly the same "blot" algorithm (and code). The only difference is in pixmap. Pixmap is always from the original (image's) coordinate system to the output (destination) coordinate system.

In any case,

I would start saying that this is intended to do the inverse resampling that drizzle does.

IMO is fundamentally incorrect: one cannot do "inverse resampling" of drizzled images using interpolation (possibly except for point kernel in the absence of distortions, etc.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inverse resampling" suggests inverse operation to drizzle which is not what blot is doing. "blot" can be used both for forward (meaning resampling a distorted image onto an undistorted frame) transform and backward transform (from the distortion-corrected frame to the distorted frame). We usually use it for the "backward" transformation but it is not the same as "inverse": the final image is not the initial image used as input to drizzle.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are talking at cross-purposes. What I mean is that blot is usually taken as mapping in the opposite direction to the original sampling. It is not an inverse of the original resampling, for sure. The point I was making was that there is ambiguity as to what input means, i.e., the original input image used for the original resampling, or the product in the destination pixel coordinate system to be mapped back to the original pixel coordinate system. In other words, with blot one could be confused as to which array is being referred to. It would be nice to clarify that in this case the input is the image generated from the original resampling (or whatever words describe that best). Maybe the current wording is obvious to everyone else, but it wasn't for me. Perhaps we should take a vote :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the docs can be improved here.

data : 2D array
Input numpy array of the source image in units of 'cps'.

pixmap : 3D array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same array as used for drizzle or one that maps the other direction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is "inverse" map: it maps from the wcs_from to wcs_to.

pixmap : 3D array
The world coordinate system to resample on.

output_pixel_shape : tuple of int

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar ambiguity here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to what?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Perry

Copy link

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the comment on one of docstrings, this looks good. I don't want to hold that up due to that comment, which can always be addressed (or not) later.

CTX_PLANE_BITS = 32


class Drizzle():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class Drizzle():
class Drizzle:

if len(out_ctx.shape) == 3:
shapes.add(out_ctx.shape[1:])
else:
shapes.add(out_ctx.shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the above if .... else use out_ctx.shape[-2:]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

else:
expscale = exptime

self.exptime += exptime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It is an output for the user.

drizzle/utils.py Outdated
# it afterwards because wcs co-ordinates are one based, while pixel co-ordinates
# are zero based, The result is the final values in pixmap give a tranformation
# between the pixel co-ordinates in the first image to pixel co-ordinates in the
# co-ordinate system of the second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the above comment be deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course it will be deleted. At this stage this PR is about API only. However, since it looks like you and @perrygreenfield approve this API redesign, I will now work on cleaning up comments and docstrings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's time to remove this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about this. will be fixed

drizzle/utils.py Outdated
if wcs_from.pixel_shape is None:
raise ValueError('The "from" WCS must have pixel_shape property set.')
y, x = np.indices(wcs_from.pixel_shape, dtype=np.float64)
x, y = wcs_to.invert(*wcs_from(x, y))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's better to use the common API here instead of using gwcs.WCS methods.

@mcara mcara marked this pull request as ready for review March 4, 2024 16:02
@mcara mcara requested a review from a team as a code owner March 4, 2024 16:02
@mcara
Copy link
Member Author

mcara commented Mar 4, 2024

Section "The Drizzle Library" in the README.rst file has been removed because code examples are no longer valid/up-to-date with API changes. Updated examples should be provided either as a separate commit in this PR or a separate PR.

Copy link

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor question that shouldn't affect approval.

@@ -110,7 +110,7 @@ def centroid_statistics(title, fname, image1, image2, amp, size):
distances = centroid_distances(image1, image2, amp, size)
indexes = (0, int(len(distances) / 2), len(distances) - 1)
fd = open(fname, 'w')
fd.write("*** %s ***\n" % title)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this and following cases is it necessary to add the type to the variable substitution when the variable is already that type? (Or is there a style commandment to always add type to these?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not necessary but doing so is closest to the original code and probably is not needed, especially for the s format (f is different though).

@mcara mcara force-pushed the redesign-api branch 5 times, most recently from 416232c to 781c317 Compare March 7, 2024 08:58

.. warning::
The "tophat" kernel was deprecated and it will be removed in a
future release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say just remove tophat in this PR.

drizzle/utils.py Outdated
# it afterwards because wcs co-ordinates are one based, while pixel co-ordinates
# are zero based, The result is the final values in pixmap give a tranformation
# between the pixel co-ordinates in the first image to pixel co-ordinates in the
# co-ordinate system of the second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's time to remove this comment?

drizzle/utils.py Outdated
_estimate_pixel_scale(wcs_from, refpix_from))
return pixmap, pscale_ratio

return pixmap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns a different number of outputs based on the parameters that are passed. How is downstream code to deal with this? What is the benefit of adding this option here?

from . import cdrizzle


SUPPORTED_DRIZZLE_KERNELS = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tophat was deprecated. I'd say remove it.

A class for managing resampling and co-adding of multiple images onto a
common output grid. The main method of this class is :py:meth:`add_image`.
The main functionality of this class is to resample and co-add multiple
images onto one output image. In the simplest terms, it redistributes flux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere here it needs to mention that the resampling is performed using the drizzle algorithm.

out_img=out_img,
out_wht=out_wht,
out_ctx=out_ctx,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can create a Drizzle object with default values of parameters. In this case the _alloc_output_arrays function does not work as expected. It does not generate arrays and does not raise an error.


if out_img is not None:
out_img = np.asarray(out_img, dtype=np.float32)
shapes.add(out_img.shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with not handling the case of out_img is None is that one can generate a default object which does not have an attribute Drizzle._out_img .

def _alloc_output_arrays(self, out_shape, max_ctx_id, out_img, out_wht,
out_ctx):
if out_shape is None:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps take the check outside this function.

def blot_image(data, pixmap, pix_ratio, exptime, output_pixel_shape,
interp='poly5', sinscl=1.0):
"""
Resample input image using interpolation to an output grid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the docs can be improved here.

pixmap : 3D array
The world coordinate system to resample on.

output_pixel_shape : tuple of int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Perry

@mcara mcara marked this pull request as draft October 15, 2024 22:32
@mcara mcara marked this pull request as ready for review October 15, 2024 23:21
@mcara
Copy link
Member Author

mcara commented Oct 18, 2024

Regression tests using this branch and jwst pipeline main are running here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1801/

These tests were failing because util.py was removed (the only call used by pipelines was to is_blank()). I restored it in 9fece19 and re-run regression test here:
https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1802/

@mcara
Copy link
Member Author

mcara commented Oct 18, 2024

Regression tests on romancal are running here: https://github.com/spacetelescope/RegressionTests/actions/runs/11410576103

@mcara mcara force-pushed the redesign-api branch 2 times, most recently from 3c799cf to a003f84 Compare October 23, 2024 00:31
@mcara
Copy link
Member Author

mcara commented Oct 23, 2024

@mcara
Copy link
Member Author

mcara commented Oct 23, 2024

Regression tests agains released version of Roman pipeline:

With released drizzle: https://github.com/spacetelescope/RegressionTests/actions/runs/11483667508
With this PR: https://github.com/spacetelescope/RegressionTests/actions/runs/11482465914

Both tests seem to have the same 28 failures, mostly in dark current tests and related to missing data files (probably regression tests have evolved meanwhile).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing __all__ in modules Missing imports in __init__.py
3 participants