Commissioning fixes and refactoring for re-extraction using the frontend UI. #30

Merged: 12 commits, Nov 5, 2024

Conversation

cmccully (Contributor)

This PR includes a host of fixes and refactors to work with the front end and to repair issues discovered during commissioning. The PR has many pieces, so I will comment on sections of the code inline to explain the logic behind them. Note that the bulk of the added lines is a new file of manual reduction results in JSON form.

@@ -20,12 +20,6 @@
'line_source': 'Hg',
'line_notes': ''
},
cmccully (Contributor Author)

I removed some lines in the wavelength solution that were blended. It's fine to include blended lines when we are doing the simultaneous fit of all the lines, but the initial centroiding per line to get the initial wavelength solution parameters can have catastrophic failures. I would like to add a category of lines in the future that doesn't centroid initially but does include the line in the full solution fit.
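As an illustration of what that future category could look like, here is a hypothetical sketch: the 'wavelength' and 'use_for_initial_centroid' keys are invented for this example (only 'line_source' and 'line_notes' appear in the diff above).

# Hypothetical sketch: blended lines keep this flag False so they skip the
# per-line initial centroiding but still enter the simultaneous solution fit.
lines = [
    {'wavelength': 4358.33, 'line_source': 'Hg', 'line_notes': '',
     'use_for_initial_centroid': True},
    {'wavelength': 5460.74, 'line_source': 'Hg', 'line_notes': 'blended',
     'use_for_initial_centroid': False},
]
initial_centroid_lines = [line for line in lines if line['use_for_initial_centroid']]
full_fit_lines = lines  # every line still participates in the full solution fit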

Contributor

Is there a way to recognize a catastrophic failure reliably without human intervention?

cmccully (Contributor Author)

Not so far. I would like to defer that as a future improvement while we get some experience with how the data quality looks.

@@ -0,0 +1,106 @@
import numpy as np
cmccully (Contributor Author)

I rewrote the background subtraction stage entirely (and moved it to the background.py file). I have separated this logic from the extraction logic to make it more modular to replace and test. I tried a wide variety of b-splines and 2D fits here without success. The scipy b-splines either had significant issues with the number of points we are fitting in the whole 2D frame or could not capture the variation near sky-line edges (the key reason to use 2D fits from Kelson 2003). I was originally using 2D polynomials, but to get the order high enough to capture the variation at the sky-line edges, I was introducing significant ringing in the fits (to the point of oscillating between positive and negative values in the data). I am now doing something closer to what IRAF did: interpolating the background regions onto the wavelength bin centers, fitting a 1D polynomial, and interpolating back onto the original wavelengths to subtract per pixel. In this way, only the background model is interpolated, not the pixel values themselves.
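A minimal sketch of that scheme for a single slit position; the names here are illustrative, not the actual banzai-floyds API.

import numpy as np
from numpy.polynomial.legendre import Legendre

def background_model(wavelengths, flux, in_background, bin_centers, poly_order=3):
    # Interpolate the background-region pixels onto the wavelength-bin centers,
    # fit a low-order 1D polynomial to that binned background, then evaluate the
    # model back at each pixel's own wavelength. Only the background model is
    # interpolated; the pixel values themselves are never resampled.
    idx = np.argsort(wavelengths[in_background])
    bkg_wave = wavelengths[in_background][idx]
    bkg_flux = flux[in_background][idx]
    binned_background = np.interp(bin_centers, bkg_wave, bkg_flux)
    model = Legendre.fit(bin_centers, binned_background, deg=poly_order)
    return model(wavelengths)  # subtract this from the data pixel-by-pixel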

Contributor

This would be good to capture somewhere in the documentation.
Maybe a docstring?
Someone is going to look at this in a few years, and should appreciate the pain of your journey and hesitate to repeat it.

@@ -0,0 +1,9 @@
from banzai.stages import Stage
cmccully (Contributor Author)

I broke the data binning into its own stage to make testing easier and more modular.



logger = get_logger()


def profile_gauss_fixed_width(params, x, sigma):
cmccully (Contributor Author)

I refactored the profile fitting and background fitting into their own files to make the code easier to find in the future.

return [Table({'center': (edges[right_cut] + edges[left_cut]) / 2.0,
'width': edges[right_cut] - edges[left_cut]})
for edges, (right_cut, left_cut) in zip(bin_edges, cuts)]
def set_extraction_region(image):
cmccully (Contributor Author)

I added a helper function that turns the extraction windows into the set of pixels included in the extraction region, to make that logic more testable.
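For illustration, the window-to-pixel logic amounts to something like the following sketch (the names are assumed, not the exact helper in this PR).

import numpy as np

def in_extraction_window(y, trace_center, half_height):
    # Mark pixels whose slit position is within +/- half_height of the trace
    # center as part of the extraction region (a half_height of 5 gives the
    # 11-pixel-high window checked in the tests below).
    return np.abs(y - trace_center) <= half_height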

wavelength_bin_width = data_to_sum['wavelength_bin_width'][0]
order_id = data_to_sum['order'][0]
# This should be equivalent to Horne 1986 optimal extraction
flux = data_to_sum['data'] - data_to_sum['background']
flux *= data_to_sum['weights']
flux *= data_to_sum['uncertainty'] ** -2
flux = np.sum(flux)
flux_normalization = np.sum(data_to_sum['weights']**2 * data_to_sum['uncertainty']**-2)
flux = np.sum(flux[data_to_sum['extraction_window']])
cmccully (Contributor Author)

We now only extract over certain pixels rather than the whole slit. This keeps us from adding pixels that contain only noise.
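Schematically, the windowed optimal extraction reduces to the following sketch, which mirrors the binned-data columns above but is not the exact implementation.

import numpy as np

def optimal_extract(data, background, weights, uncertainty, window):
    # Horne (1986)-style inverse-variance weighted sum, restricted to the
    # boolean extraction window so noise-only pixels are excluded.
    numerator = (data - background) * weights * uncertainty ** -2
    normalization = weights ** 2 * uncertainty ** -2
    flux = np.sum(numerator[window]) / np.sum(normalization[window])
    flux_error = np.sum(normalization[window]) ** -0.5
    return flux, flux_error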

super().save_processing_metadata(context)
if 'REDUCER' not in self.meta:
self.meta['REDUCER'] = 'BANZAI'

@property
def profile(self):
return self['PROFILE'].data

@profile.setter
def profile(self, value):
cmccully (Contributor Author)

I have changed how we load previous profile fits. Most of this logic existed already, but was spread across multiple files. I centralized it here.

@@ -97,13 +98,16 @@ class FringeCorrector(Stage):
def do_stage(self, image):
# Only divide the fringe out where the divisor is > 0.1 so we don't amplify
# artifacts due to the edge of the slit
cmccully (Contributor Author)

The fringing stage takes a long time so I added some logs to tell the user it is still alive.

Contributor

How long is a long time?

cmccully (Contributor Author)

Maybe a minute?

@@ -350,7 +350,7 @@ def optimize_match_filter(initial_guess, data, error, weights_function, x, weigh
args=(data, error, weights_function,
weights_jacobian_function,
weights_hessian_function, x, *args),
method='Powell', bounds=bounds)
cmccully (Contributor Author)

I found a rather nasty bug in scipy: using the Powell method with bounds on the parameters can give you the wrong answer even with a good initial guess. What's worse, the success flag on the fit reports success even though the fit definitely isn't successful. The fitter I've chosen is the default in scipy.optimize.minimize.
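For reference, dropping method='Powell' and letting scipy.optimize.minimize pick its default (L-BFGS-B when bounds are supplied and there are no constraints) looks like this toy example; the real call passes the match-filter objective, its derivatives, and the data arrays as args.

from scipy.optimize import minimize

# Toy bounded minimization with the default method chosen by scipy.
result = minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                  x0=[0.5, -1.5],
                  bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(result.x, result.success)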

@@ -404,7 +404,9 @@ class OrderSolver(Stage):
ORDER_HEIGHT = 93
CENTER_CUT_WIDTH = 31
POLYNOMIAL_ORDER = 3
ORDER_REGIONS = [(0, 1700), (630, 1975)]

ORDER_REGIONS = {'ogg': [(0, 1550), (500, 1835)],
cmccully (Contributor Author)

I have added site-specific order regions as a starting guess, since the order placement on the chip is different at each site. Previously this manifested in weird ways when you started dealing with data off one of the slits at one of the sites, because the defaults had been set for the other.
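As a sketch of how the site-keyed defaults would be consumed; the 'coj' values (copied from the old site-agnostic defaults) and the image.site attribute are assumptions for illustration, not values from the diff.

# Illustrative only: look up the starting-guess order regions by site code.
ORDER_REGIONS = {'ogg': [(0, 1550), (500, 1835)],
                 'coj': [(0, 1700), (630, 1975)]}

def initial_order_regions(image):
    return ORDER_REGIONS[image.site.lower()]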

Contributor

This feels like a danger point if we update the instruments at all.
We'll need to make sure updating these values doesn't get missed.

Contributor

This could also be runtime config in the helm chart to make that process easier?

cmccully (Contributor Author)

You certainly aren't wrong that this is a stumbling point. We have only changed cameras in FLOYDS once in the last decade (literally), so I'm hesitant to spend the effort now to come up with a general fix for this problem. I'll put it on the longer-term nice-to-have features list.

@@ -0,0 +1,134 @@
import numpy as np
cmccully (Contributor Author)

Much of this is refactoring. I've tried to make it more explicit when we are fitting sigma and when we are fitting the FWHM, as we had not been consistent, which led to issues. I also increased the fitting window for fitting the profile width because we explicitly need constraints on the background: there is a degeneracy between the width and the total flux if you don't estimate the background level.
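For reference, the Gaussian relation that the sigma_to_fwhm / fwhm_to_sigma helpers in banzai_floyds.utils.fitting_utils presumably encode (a sketch, not the in-repo code):

import numpy as np

# FWHM = 2 * sqrt(2 * ln 2) * sigma ~= 2.3548 * sigma for a Gaussian profile.
def sigma_to_fwhm(sigma):
    return sigma * 2.0 * np.sqrt(2.0 * np.log(2.0))

def fwhm_to_sigma(fwhm):
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))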

'banzai_floyds.extract.Extractor',
'banzai_floyds.trim.Trimmer',
# 'banzai_floyds.trim.Trimmer',
cmccully (Contributor Author)

I was previously chopping off the ends of spectra. I think that was masking issues, so I've removed that for now.

@@ -5,15 +5,18 @@
'banzai.trim.Trimmer',
'banzai.gain.GainNormalizer',
'banzai.uncertainty.PoissonInitializer',
'banzai.cosmic.CosmicRayDetector',
cmccully (Contributor Author)

I've nominally added cosmic ray detection but we don't do much with that information yet.

@@ -0,0 +1,77 @@
{
cmccully (Contributor Author)

These are the order-center polynomials resulting from a manual data reduction of our test data set, to be used as comparison values in the e2e tests.

cmccully (Contributor Author)

This file contains the pixel-by-pixel results of the manual wavelength fits from the manual reduction Jupyter notebook, to be used in the e2e tests as comparison/expected values.

@@ -0,0 +1,96 @@
from banzai_floyds.background import fit_background, set_background_region, BackgroundFitter
cmccully (Contributor Author)

Mostly a refactor. We now test against all pixels in the order except the outer edge (2 pixels wide). Due to the interpolation limitations, these edge pixels can be bad, but since they are away from the extraction region, we don't really mind.

from collections import namedtuple
import numpy as np


cmccully (Contributor Author)

This test checks that we have the expected number of pixels in each of our bins after binning. It has been quite difficult to define tests to ensure that our binning is behaving correctly for the full dataset.

@@ -107,6 +113,25 @@ def test_that_order_mask_exists(self):
# Note there are only two orders in floyds
assert np.max(hdu['ORDERS'].data) == 2

cmccully (Contributor Author)

Our original e2e tests only checked that the output files were created, but nothing about the actual reduction quality. Here we add tests that we identified the centers of the orders correctly and that our wavelength solution matches a solution I made by hand in the manual reduction Jupyter notebook.

from banzai_floyds.utils.fitting_utils import fwhm_to_sigma
from astropy.table import Table
from numpy.polynomial.legendre import Legendre


cmccully (Contributor Author)

Refactoring out tests to more aptly named files.

# The extraction should be +- 5 pixels high so there should be 11 pixels in the extraction region
for order in [1, 2]:
in_order = fake_data.binned_data['order'] == order
assert np.sum(fake_data.binned_data['extraction_window'][in_order]) == 11 * nx


def test_extraction():
cmccully (Contributor Author)

We've removed a lot of the dependencies here when testing extraction. This should be an equivalent, albeit cleaner, approach to extraction.

@@ -63,7 +63,7 @@ def test_create_super_fringe():


def test_correct_fringe():
np.random.seed(291523)
cmccully (Contributor Author)

Silly numerical things near the tolerance limits of the test.

@@ -0,0 +1,29 @@
from banzai_floyds.profile import fit_profile_centers, fit_profile_sigma
cmccully (Contributor Author)

Mostly just a refactor to move things here.

@@ -45,24 +45,24 @@ def test_linear_wavelength_solution():
np.random.seed(890154)
min_wavelength = 3200
dispersion = 2.5
line_width = 3
line_sigma = 3
cmccully (Contributor Author)

Being more consistent between fwhm and sigma.


fit_list = refine_peak_centers(input_spectrum, 0.01 * np.ones_like(input_spectrum),
recovered_peaks, sigma_to_fwhm(line_width))
recovered_peaks, sigma_to_fwhm(line_sigma))

# Need to figure out how to handle blurred lines and overlapping peaks.
for fit in fit_list:
assert np.min(abs(test_lines - fit)) < 0.2


cmccully (Contributor Author)

Previously our unit test didn't include noise and worked fine. Once we got noisy data, we discovered a bug, so this test guards against that regression.

input_fringe_shift = fringe_offset

order1 = Legendre((135.4, 81.8, 45.2, -11.4), domain=(0, 1700))
order2 = Legendre((410, 17, 63, -12), domain=(475, 1975))
order2 = Legendre((380, 17, 63, -12), domain=(475, 1975))
cmccully (Contributor Author)

I discovered the top order was falling off the image. Funny things happen when the order region does not have the same number of y pixels for each x.

@@ -254,6 +248,23 @@ def generate_fake_extracted_frame(do_telluric=False, do_sensitivity=True):
return frame


def load_manual_region(region_filename, site_id, order_id, shape, order_height):
cmccully (Contributor Author)

This is a utility function for loading the results of the manual order center fits in the e2e tests.

return binned_data.group_by(('order', 'wavelength_bin'))


def combine_wavelength_bins(wavelength_bins):
cmccully (Contributor Author)

This is only a refactor, but I'm pretty sure this logic is wrong. It is not used yet and will be the subject of a future PR.

results['fluxrawerr'].append(uncertainty)
results['wavelength'].append(wavelength_bin)
results['binwidth'].append(wavelength_bin_width)
results['order'].append(order_id)
return Table(results)


def combine_wavelegnth_bins(wavelength_bins):
cmccully (Contributor Author)

Refactor to a different file. See below.

return profile_data


def load_profile_fits(hdu):
cmccully (Contributor Author)

This is a utility function that loads a profile into our internal format from the version saved in the output FITS file.

cmccully (Contributor Author)

We have all of our tests, including the util tests, in the main banzai_floyds/tests directory, so we don't need this folder/package.

cmccully (Contributor Author)

This is the notebook to reproduce my manual reduction of the test data set.

cmccully (Contributor Author)

This file is used to run the banzai-floyds pipeline to characterize the quality of our reductions.

# We assert that only edge pixels can vary by 5 sigma due to edge effects
for order in [1, 2]:
order_region = get_order_2d_region(fake_frame.orders.data == order)
residuals = fake_frame.background[order_region][-2:2, -2:2] - fake_frame.input_sky[order_region][-2:2, -2:2]
cmccully (Contributor Author)

Previously, I was trying to require the residuals of the background fit at every pixel to be below some threshold. Given the large dynamic range in signal to noise, that is not the correct metric. Instead, if we are fitting down to the noise, the residuals divided by the uncertainty should follow a unit Gaussian distribution. So we now check that at most 1% of pixels are outliers larger than 3 sigma (similar to a Gaussian) and that there are no wild outliers beyond 5 sigma (skipping the edges, which can have weird artifacts).
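A minimal sketch of that statistical check (illustrative, not the exact test code):

import numpy as np

def check_background_residuals(background, input_sky, uncertainty):
    # If the fit is limited only by noise, (model - truth) / sigma should look
    # like a unit Gaussian: at most ~1% of pixels beyond 3 sigma and none beyond
    # 5 sigma (edge pixels are excluded before calling this).
    scaled = (background - input_sky) / uncertainty
    assert np.mean(np.abs(scaled) > 3.0) < 0.01
    assert np.all(np.abs(scaled) < 5.0)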

@jchate6 (Contributor) left a comment

I'm only about halfway through, but might not finish before tomorrow.

@sfoale left a comment

lgtm

@cmccully merged commit e7843e3 into main on Nov 5, 2024.
2 of 6 checks passed