Commissioning fixes and refactoring for re-extraction using the frontend UI. #30
Conversation
@@ -20,12 +20,6 @@
    'line_source': 'Hg',
    'line_notes': ''
},
I removed some lines in the wavelength solution that were blended. It's fine to include blended lines when we are doing the simultaneous fit of all the lines, but the initial centroiding per line to get the initial wavelength solution parameters can have catastrophic failures. I would like to add a category of lines in the future that doesn't centroid initially but does include the line in the full solution fit.
Is there a way to recognize a catastrophic failure reliably without human intervention?
Not so far. I would like to wait on that as a future improvement as we get some experience with how the data quality looks.
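To make the proposed future category concrete, here is a hypothetical sketch (the flag name and the choice of which Hg line to mark as blended are illustrative, not part of the current line list): lines flagged this way would be skipped during the initial per-line centroiding but still included in the simultaneous solution fit.

```python
# Hypothetical line-list entries illustrating the proposed category: a flag
# (name invented here) that excludes a blended line from the initial
# per-line centroiding but keeps it in the full simultaneous solution fit.
arc_lines = [
    {'wavelength': 4358.33, 'line_source': 'Hg', 'line_notes': '',
     'use_for_initial_centroid': True},
    {'wavelength': 5769.60, 'line_source': 'Hg', 'line_notes': 'blended',
     'use_for_initial_centroid': False},
]

# Only unblended lines seed the initial wavelength solution...
centroid_lines = [line for line in arc_lines if line['use_for_initial_centroid']]
# ...but every line still participates in the global fit.
full_fit_lines = arc_lines
```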
@@ -0,0 +1,106 @@
import numpy as np
I rewrote the background subtraction stage entirely (and moved it to the background.py file). I have separated this logic from the extraction logic to make it more modular to replace and test. I tried a wide variety of bsplines and 2D fits here without success. The scipy bsplines either had significant issues with the number of points we are fitting in the whole 2D frame or could not capture the variation near the sky line edges (the key reason to use 2D fits from Kelson 2003). I was originally using 2D polynomials, but to get the order high enough to capture the variation at the sky line edges, I was introducing significant ringing in the fits (to the point of oscillating between positive and negative values in the data). I am now doing something closer to what IRAF did: interpolating the background regions onto the wavelength bin centers, fitting a 1D polynomial, and interpolating back onto the original wavelengths to subtract per pixel. This way, only the background model is interpolated, never the pixel values themselves.
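A rough sketch of the interpolate-fit-interpolate scheme just described, reduced to a single spatial row and ignoring the 2D order geometry; the function and argument names are illustrative and are not the actual background.py API.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def subtract_background_row(pixel_wavelengths, pixel_values, in_background_region,
                            bin_centers, poly_degree=3):
    # Interpolate only the background-region pixels onto the wavelength bin centers.
    bg_wavelengths = pixel_wavelengths[in_background_region]
    bg_values = pixel_values[in_background_region]
    order = np.argsort(bg_wavelengths)
    binned_background = np.interp(bin_centers, bg_wavelengths[order], bg_values[order])
    # Fit a low-order 1D polynomial to the binned background model.
    coeffs = P.polyfit(bin_centers, binned_background, poly_degree)
    # Evaluate the model back at each pixel's own wavelength and subtract,
    # so only the background model is ever interpolated, never the data.
    return pixel_values - P.polyval(pixel_wavelengths, coeffs)
```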
This would be good to capture somewhere in the documentation.
Maybe a docstring?
Someone is going to look at this in a few years, and should appreciate the pain of your journey and hesitate to repeat it.
@@ -0,0 +1,9 @@
from banzai.stages import Stage
I broke the data binning into its own stage to make testing easier and more modular.
logger = get_logger()


def profile_gauss_fixed_width(params, x, sigma):
I refactored the profile fitting and background fitting into their own files to make the code easier to find in the future.
return [Table({'center': (edges[right_cut] + edges[left_cut]) / 2.0,
               'width': edges[right_cut] - edges[left_cut]})
        for edges, (right_cut, left_cut) in zip(bin_edges, cuts)]


def set_extraction_region(image):
I added a helper function to convert the extraction windows into the set of pixels included in the extraction region, to make that logic more testable.
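A minimal sketch of what such a helper might look like (the function name and the ±5-pixel default are illustrative assumptions, not the actual set_extraction_region implementation):

```python
import numpy as np


def in_extraction_window(y_pixels, profile_center, half_width=5):
    # Flag pixels within +/- half_width of the profile center; e.g.
    # half_width=5 yields an 11-pixel-tall extraction window per column.
    return np.abs(np.asarray(y_pixels) - profile_center) <= half_width
```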
wavelength_bin_width = data_to_sum['wavelength_bin_width'][0]
order_id = data_to_sum['order'][0]
# This should be equivalent to Horne 1986 optimal extraction
flux = data_to_sum['data'] - data_to_sum['background']
flux *= data_to_sum['weights']
flux *= data_to_sum['uncertainty'] ** -2
flux = np.sum(flux)
flux_normalization = np.sum(data_to_sum['weights']**2 * data_to_sum['uncertainty']**-2)
flux = np.sum(flux[data_to_sum['extraction_window']])
We now only extract over certain pixels rather than the whole slit. This keeps us from including pixels that contain only noise.
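For reference, a minimal standalone sketch of Horne (1986)-style optimal extraction restricted to an extraction window; the argument names mirror the binned-data columns above, but this is not the pipeline's actual extraction function.

```python
import numpy as np


def optimal_extract(data, background, weights, uncertainty, window):
    # Inverse-variance weighted, profile-weighted sum of background-subtracted flux.
    flux = (data - background) * weights * uncertainty ** -2
    normalization = weights ** 2 * uncertainty ** -2
    # Only sum pixels inside the extraction window so pixels that contain
    # nothing but noise do not dilute the signal-to-noise.
    return np.sum(flux[window]) / np.sum(normalization[window])
```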
super().save_processing_metadata(context)
if 'REDUCER' not in self.meta:
    self.meta['REDUCER'] = 'BANZAI'


@property
def profile(self):
    return self['PROFILE'].data


@profile.setter
def profile(self, value):
I have changed how we load previous profile fits. Most of this logic existed already, but was spread across multiple files. I centralized it here.
@@ -97,13 +98,16 @@ class FringeCorrector(Stage):
    def do_stage(self, image):
        # Only divide the fringe out where the divisor is > 0.1 so we don't amplify
        # artifacts due to the edge of the slit
The fringing stage takes a long time so I added some logs to tell the user it is still alive.
How long is a long time?
Maybe a minute?
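For illustration, a generic sketch of the keep-alive logging idea (the actual stage uses banzai's logger and its own fitting loop; the chunking here is purely illustrative):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def process_with_progress(chunks):
    for i, chunk in enumerate(chunks, start=1):
        time.sleep(0.01)  # stand-in for the expensive per-chunk fringe fit
        if i % 10 == 0 or i == len(chunks):
            logger.info('Fringe correction still running: %d of %d chunks done', i, len(chunks))


process_with_progress(list(range(50)))
```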
@@ -350,7 +350,7 @@ def optimize_match_filter(initial_guess, data, error, weights_function, x, weigh
    args=(data, error, weights_function,
          weights_jacobian_function,
          weights_hessian_function, x, *args),
    method='Powell', bounds=bounds)
I found a rather nasty bug in scipy: using the Powell method with bounds on the parameters can give you the wrong answer even with a good initial guess. What's worse, the success flag on the fit reports success even though it definitely isn't. The fitter I've chosen is the default in scipy.optimize.minimize.
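A self-contained sketch of the change (toy objective and data, purely illustrative): drop method='Powell' and let scipy.optimize.minimize pick its default bounded minimizer, which is typically L-BFGS-B when only bounds are supplied.

```python
import numpy as np
from scipy.optimize import minimize


def chi2(params, x, data, error):
    # Toy Gaussian-amplitude model standing in for the match-filter metric.
    amplitude, center = params
    model = amplitude * np.exp(-0.5 * (x - center) ** 2)
    return np.sum(((data - model) / error) ** 2)


rng = np.random.default_rng(42)
x = np.linspace(-5.0, 5.0, 101)
error = np.full_like(x, 0.05)
data = 2.0 * np.exp(-0.5 * x ** 2) + rng.normal(0.0, 0.05, x.size)
bounds = [(0.0, 10.0), (-2.0, 2.0)]
guess = [1.5, 0.5]

# Old call: Powell plus bounds, which in the pipeline's fits could land on a
# wrong answer while still reporting result.success == True.
powell_result = minimize(chi2, guess, args=(x, data, error), method='Powell', bounds=bounds)

# New call: omit method= so scipy falls back to its default bounded minimizer.
default_result = minimize(chi2, guess, args=(x, data, error), bounds=bounds)
```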
@@ -404,7 +404,9 @@ class OrderSolver(Stage):
    ORDER_HEIGHT = 93
    CENTER_CUT_WIDTH = 31
    POLYNOMIAL_ORDER = 3
    ORDER_REGIONS = [(0, 1700), (630, 1975)]

    ORDER_REGIONS = {'ogg': [(0, 1550), (500, 1835)],
I have added site-specific order regions as a starting guess since the order placement on the chip is different at each site. Previously this manifested in weird ways when you started dealing with data off one of the slits for one of the sites, because the defaults had been set for the other.
This feels like a danger point if we update the instruments at all.
We'll need to make sure updating these values doesn't get missed.
This could also be runtime config in the helm chart to make that process easier?
You certainly aren't wrong that this is a stumbling point. We have only changed cameras in FLOYDS once in the last decade (literally), so I'm hesitant to spend the effort now to come up with a general fix for this problem. I'll put it on the longer-term nice-to-have features list.
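For reference, a minimal sketch of the site-keyed lookup being introduced here; the 'coj' entry's values and the fallback behaviour are assumptions for illustration, not the actual OrderSolver code.

```python
ORDER_REGIONS = {'ogg': [(0, 1550), (500, 1835)],
                 'coj': [(0, 1700), (630, 1975)]}  # 'coj' values hypothetical


def initial_order_regions(site):
    # Use the per-site starting guess; fall back to 'ogg' rather than silently
    # using regions tuned for a different camera placement (fallback choice is
    # illustrative, not the pipeline's behaviour).
    return ORDER_REGIONS.get(site, ORDER_REGIONS['ogg'])
```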
@@ -0,0 +1,134 @@
import numpy as np
Much of this is refactoring. I've tried to make it more explicit when we are fitting sigma and when we are fitting the FWHM, as we had not been consistent, which led to issues. I also increased the fitting window for fitting the profile width, since we explicitly need constraints on the background: there is a degeneracy between the width and the total flux if you don't estimate the background level.
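The Gaussian FWHM/sigma relationship being standardized on is the usual one; a small sketch matching the intent of the fwhm_to_sigma / sigma_to_fwhm helpers referenced in the tests below (the exact pipeline implementations may differ):

```python
import numpy as np

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma ~ 2.3548 * sigma
FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))


def fwhm_to_sigma(fwhm):
    return fwhm / FWHM_PER_SIGMA


def sigma_to_fwhm(sigma):
    return sigma * FWHM_PER_SIGMA
```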
'banzai_floyds.extract.Extractor',
'banzai_floyds.trim.Trimmer',
# 'banzai_floyds.trim.Trimmer',
I was previously chopping off the ends of spectra. I think that was masking issues, so I've removed that for now.
@@ -5,15 +5,18 @@
'banzai.trim.Trimmer',
'banzai.gain.GainNormalizer',
'banzai.uncertainty.PoissonInitializer',
'banzai.cosmic.CosmicRayDetector',
I've nominally added cosmic ray detection but we don't do much with that information yet.
@@ -0,0 +1,77 @@
{
These are the order-center polynomials from a manual data reduction of our test data set, to be used as comparison values in the e2e tests.
This file contains the pixel-by-pixel results of the manual wavelength fits from the manual reduction Jupyter notebook, to be used as comparison/expected values in the e2e tests.
@@ -0,0 +1,96 @@
from banzai_floyds.background import fit_background, set_background_region, BackgroundFitter
Mostly a refactor. We now test against all pixels in the order except the outer edge (2 pixels wide). Due to the interpolation limitations, these edge pixels can be bad, but since they are away from the extraction region, we don't really mind.
from collections import namedtuple
import numpy as np
This test checks that we have the expected number of pixels in each of our bins after binning. It has been quite difficult to define tests to ensure that our binning is behaving correctly for the full dataset.
@@ -107,6 +113,25 @@ def test_that_order_mask_exists(self):
    # Note there are only two orders in floyds
    assert np.max(hdu['ORDERS'].data) == 2
|
Our original e2e tests only checked that the output files were created, but nothing about the actual reduction quality. Here we add tests that we identified the centers of the orders correctly and that our wavelength solution matches a solution I made by hand in the manual reduction Jupyter notebook.
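A minimal sketch of the kind of quality check added (the file name, JSON keys, and pixel tolerance are illustrative assumptions, not the actual test code):

```python
import json

import numpy as np


def check_order_centers(fitted_centers, manual_results_path, order_id, tolerance=2.0):
    # Compare the pipeline's fitted order centers against the hand-fit values
    # stored in the manual-reduction JSON file.
    with open(manual_results_path) as manual_file:
        manual = json.load(manual_file)
    expected = np.array(manual[f'order_{order_id}_centers'])
    # Every fitted center should land within the tolerance of the manual fit.
    assert np.all(np.abs(np.asarray(fitted_centers) - expected) < tolerance)
```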
from banzai_floyds.utils.fitting_utils import fwhm_to_sigma
from astropy.table import Table
from numpy.polynomial.legendre import Legendre
Refactoring tests out into more aptly named files.
# The extraction should be +- 5 pixels high so there should be 11 pixels in the extraction region
for order in [1, 2]:
    in_order = fake_data.binned_data['order'] == order
    assert np.sum(fake_data.binned_data['extraction_window'][in_order]) == 11 * nx


def test_extraction():
We've removed a lot of the dependencies here when testing extraction. This should be an equivalent, albeit cleaner, approach to extraction.
@@ -63,7 +63,7 @@ def test_create_super_fringe():


def test_correct_fringe():
    np.random.seed(291523)
Silly numerical things near the tolerance limits of the test.
@@ -0,0 +1,29 @@
from banzai_floyds.profile import fit_profile_centers, fit_profile_sigma
Mostly just a refactor to move things here.
@@ -45,24 +45,24 @@ def test_linear_wavelength_solution():
    np.random.seed(890154)
    min_wavelength = 3200
    dispersion = 2.5
    line_width = 3
    line_sigma = 3
Being more consistent between fwhm and sigma.
fit_list = refine_peak_centers(input_spectrum, 0.01 * np.ones_like(input_spectrum),
                               recovered_peaks, sigma_to_fwhm(line_width))
                               recovered_peaks, sigma_to_fwhm(line_sigma))

# Need to figure out how to handle blurred lines and overlapping peaks.
for fit in fit_list:
    assert np.min(abs(test_lines - fit)) < 0.2
Previously our unit test didn't include noise and worked fine. Once we got noisy data, we discovered a bug, so this tests against that regression.
input_fringe_shift = fringe_offset

order1 = Legendre((135.4, 81.8, 45.2, -11.4), domain=(0, 1700))
order2 = Legendre((410, 17, 63, -12), domain=(475, 1975))
order2 = Legendre((380, 17, 63, -12), domain=(475, 1975))
I discovered the top order was falling off the image. Funny things happen when the order region does not have the same number of y pixels for each x.
@@ -254,6 +248,23 @@ def generate_fake_extracted_frame(do_telluric=False, do_sensitivity=True):
    return frame


def load_manual_region(region_filename, site_id, order_id, shape, order_height):
This is a utility function for loading the results of the manual order center fits in the e2e tests.
return binned_data.group_by(('order', 'wavelength_bin'))


def combine_wavelength_bins(wavelength_bins):
This is only a refactor, but I'm pretty sure this logic is wrong. It is not used yet and will be the subject of a future PR.
results['fluxrawerr'].append(uncertainty)
results['wavelength'].append(wavelength_bin)
results['binwidth'].append(wavelength_bin_width)
results['order'].append(order_id)
return Table(results)


def combine_wavelegnth_bins(wavelength_bins):
Refactor to a different file. See below.
return profile_data


def load_profile_fits(hdu):
This is a utility function that loads a profile into our internal format from the saved version in a FITS output file.
We have all of our tests, including util tests, in the main banzai_floyds/tests, so we don't need this folder/package.
This is the notebook to reproduce my manual reduction of the test data set.
This file is used to run the banzai-floyds pipeline to characterize the quality of our reductions.
# We assert that only edge pixels can vary by 5 sigma due to edge effects
for order in [1, 2]:
    order_region = get_order_2d_region(fake_frame.orders.data == order)
    residuals = fake_frame.background[order_region][-2:2, -2:2] - fake_frame.input_sky[order_region][-2:2, -2:2]
Previously, I was trying to require all pixels to have residuals in the background fit below some threshold. Given the large dynamic range in signal to noise, that is not the correct metric. Instead, if we are fitting down to the noise, residuals/uncertainty should follow a unit Gaussian distribution. So we now check that only about 1% of pixels are outliers larger than 3 sigma (consistent with a Gaussian) and that there are no wild outliers beyond 5 sigma (skipping the edges, which can have weird artifacts).
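A minimal sketch of that statistical check (illustrative, not the actual test code): if the background model fits down to the noise, residual/uncertainty should behave like a unit Gaussian.

```python
import numpy as np


def check_background_residuals(background_model, true_sky, uncertainty):
    normalized = (background_model - true_sky) / uncertainty
    # Trim the 2-pixel outer edge where interpolation artifacts are expected.
    normalized = normalized[2:-2, 2:-2]
    # For a unit Gaussian ~0.3% of samples exceed 3 sigma, so 1% is a loose cap.
    assert np.mean(np.abs(normalized) > 3.0) < 0.01
    # And no wild outliers beyond 5 sigma.
    assert np.all(np.abs(normalized) < 5.0)
```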
I'm only about halfway through, but might not finish before tomorrow.
lgtm
This PR includes a host of fixes and refactors to work with the frontend and to repair issues discovered during commissioning. The PR has many pieces, so I will comment on sections of the code inline to explain the logic behind them. Note that the bulk of the added lines is a new file of manual reduction results in JSON form.