Set up tox-conda/pytest #1

Merged: 15 commits, Nov 22, 2024
Conversation

@ajjackson (Collaborator) commented Nov 7, 2024

  • Set build-backend so isolated pip install works
  • Add pychop as testing dependency
  • Configure tox-conda
  • Create __init__.py files under tests/ so that all tests are found by pytest

We need to use conda to install Mantid and check results against Abins functions. Tox-conda is stuck on tox v3; there is a branch to support v4, but it's not clear if or when this will be finished. We might also be able to drop the conda component of these tests once we are satisfied with the validation against Mantid: it can then be replaced with tests against reference values.

Perhaps we don't need the test dependencies in pyproject.toml at all, now that this is all defined in tox.ini (see the sketch below). But they could be useful for running the tests without tox if desired.
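
For reference, a minimal sketch of the kind of tox-conda configuration this describes; the environment names, channels and dependency list here are assumptions for illustration, not the repository's actual tox.ini:

    # Hypothetical tox.ini sketch (env names/channels/deps are assumptions)
    [tox]
    envlist = py310
    requires = tox-conda  # ties us to tox v3 until tox-conda gains v4 support

    [testenv]
    # tox-conda extensions: pull Mantid (which provides Abins) from conda
    conda_channels =
        mantid
        conda-forge
    conda_deps =
        mantid
    # ordinary pip-installed test dependencies
    deps =
        pytest
        pychop
    commands =
        pytest {posargs}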

@ajjackson (Collaborator, Author):

This is a decent start; the tests seem to be running correctly. Lots are failing, but we expected that.

- Use reservoir sampling to efficiently grab a subset of Fermi chopper
  frequency combinations (see the sketch below)
- Simplify error matching with the match= argument
- Get ids on the fly with a function rather than precomputing them
- Defer string ids to a function run by the test
- Replace explicit list-extending with itertools.chain

The sweep of incident energies is cut down by some more (seeded)
random sampling.
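
As an illustration of the sampling and id-generation approach (a minimal sketch, not the PR's actual code; reservoir_sample, ENERGIES, FREQUENCIES and ef_id are placeholder names):

    import itertools
    import random

    import pytest

    def reservoir_sample(iterable, k, seed=42):
        """Algorithm R: draw k items uniformly from an iterable of unknown length."""
        rng = random.Random(seed)  # seeded so the test selection is reproducible
        reservoir = []
        for i, item in enumerate(iterable):
            if i < k:
                reservoir.append(item)
            else:
                j = rng.randint(0, i)  # inclusive bounds
                if j < k:
                    reservoir[j] = item
        return reservoir

    ENERGIES = [10, 50, 100, 500]      # placeholder incident energies (meV)
    FREQUENCIES = [50, 150, 300, 450]  # placeholder chopper frequencies (Hz)

    EF_MATRIX = reservoir_sample(itertools.product(ENERGIES, FREQUENCIES), k=6)

    def ef_id(matrix):
        """Build the string id on the fly instead of precomputing a list."""
        energy, frequency = matrix
        return f"ei={energy},f={frequency}"

    @pytest.mark.parametrize("matrix", EF_MATRIX, ids=ef_id)
    def test_sampled_combination(matrix):
        energy, frequency = matrix
        assert energy > 0 and frequency > 0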

At the moment we have some cases with invalid chopper frequencies;
make sure the error is checked robustly when handling these.

Ideally we should have smarter test-case generation and check the
errors in a separate test.

These aren't currently being checked in CI because tox takes care of
it, so they can easily become stale. There must be a better way.
@ajjackson marked this pull request as ready for review, November 8, 2024 15:30
@ajjackson (Collaborator, Author):

There's more work to do on test coverage / legibility / efficiency, but this at least gets some CI up and running without too much complexity or runtime. It would be good to fix the failing tests next!

@RastislavTuranyi (Collaborator):

The MAPS and MARI integration tests against AbINS that are failing in CI are all passing on my local machine.

The tests seem to be failing due to NoTransmissionError, but that case is expected (it should go into the except block). The interesting thing is that AbINS is not raising a LinAlgError for the invalid combinations of energy and frequency, which causes our strategy to fail. Maybe the CI is using a different version of Mantid? Have there been any relevant changes to AbINS that alter the behaviour the tests expect?

    try:
        # Reset cached polynomial fits so calculate_sigma recomputes from scratch
        abins._polyfits = {}
        expected = abins.calculate_sigma(frequencies * MEV_TO_WAVENUMBER) * WAVENUMBER_TO_MEV
    except LinAlgError:
        # AbINS could not fit these settings; our implementation should refuse
        # them too, by raising NoTransmissionError
        with pytest.raises(NoTransmissionError):
            rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                          e_init=energy, chopper_frequency=chopper_frequency)
    else:
        # Valid settings: compare our resolution function against AbINS
        rf = rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                           e_init=energy, chopper_frequency=chopper_frequency)
        actual = rf(frequencies)

        assert_allclose(actual, expected, rtol=1e-5)

The failing Lagrange tests are due to the issue in AbINS where the cut-off values have a unit-conversion bug, resulting in incorrect low-energy values.

@ajjackson (Collaborator, Author):

Presumably that means no LinAlgError is raised first. LinAlgError comes from numpy, so perhaps it depends on the numpy/scipy version?

I get the same errors when running with tox on a Rocky 8 VM, so there isn't anything weird going on with the GitHub runner.

I have tried Mantid versions 6.9, 6.10 and 6.11; 6.9 segfaults (probably numpy trouble) and the two more recent versions give the same MAPS/MARI errors.

@ajjackson (Collaborator, Author) commented Nov 11, 2024

In Mantid, if I try to call calculate_sigma() at invalid energy/chopper settings I get NaN back; it doesn't raise an error at this point. How do you get the LinAlgError to appear?

(It probably should; that would give better user feedback.)

[Screenshot from 2024-11-11 10:10 showing calculate_sigma() returning NaN]

@RastislavTuranyi (Collaborator) commented Nov 11, 2024

I get this in tests without the except block (i.e. np.polyfit raising a LinAlgError):

tests\integration_tests\test_pychop.py:57 (test_against_abins[MAPS_A-ei=50,f=450])
matrix = (50, 450)
abins_rf_2d = (<abins.instruments.pychop.PyChopInstrument object at 0x000001E292EC78E0>, Instrument(name='MAPS', version='MAPS', mod...'tjit': 0.0}, 'S': {'pslit': 0.002899, 'radius': 0.049, 'rho': 1.3, 'tjit': 0.0}}}}}, default_model='PyChop_fit'), 'A')

    @pytest.mark.parametrize('matrix', EF_MATRIX, ids=EF_IDS)
    def test_against_abins(matrix, abins_rf_2d,):
        abins, rf_2d, setting = abins_rf_2d
>       _test_against_abins(abins, rf_2d, setting, matrix)

integration_tests\test_pychop.py:61: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
integration_tests\test_pychop.py:73: in _test_against_abins
    expected = abins.calculate_sigma(frequencies * MEV_TO_WAVENUMBER) * WAVENUMBER_TO_MEV
..\..\..\..\AppData\Local\miniconda3\envs\rfd\Library\scripts\abins\instruments\pychop.py:53: in calculate_sigma
    self._polyfit_resolution()
..\..\..\..\AppData\Local\miniconda3\envs\rfd\Library\scripts\abins\instruments\pychop.py:66: in _polyfit_resolution
    fit = np.polyfit(frequencies_invcm, resolution_sigma * MILLI_EV_TO_WAVENUMBER, order)
..\..\..\..\AppData\Local\miniconda3\envs\rfd\lib\site-packages\numpy\lib\polynomial.py:669: in polyfit
    c, resids, rank, s = lstsq(lhs, rhs, rcond)
..\..\..\..\AppData\Local\miniconda3\envs\rfd\lib\site-packages\numpy\linalg\linalg.py:2326: in lstsq
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

err = 'invalid value', flag = 8

    def _raise_linalgerror_lstsq(err, flag):
>       raise LinAlgError("SVD did not converge in Linear Least Squares")
E       numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares

..\..\..\..\AppData\Local\miniconda3\envs\rfd\lib\site-packages\numpy\linalg\linalg.py:124: LinAlgError

We are seeing some interesting inconsistencies in test results between
Rastislav's local Windows installation and my Linux/tox setup.
@RastislavTuranyi (Collaborator):

It could indeed be a Windows issue (e.g. this numpy problem caused by a bug in Windows, which should have been resolved, but something similar could be happening again?).

In either case, we could add a check for NaNs to the else branch:

    try:
        abins._polyfits = {}
        expected = abins.calculate_sigma(frequencies * MEV_TO_WAVENUMBER) * WAVENUMBER_TO_MEV
    except LinAlgError:
        with pytest.raises(NoTransmissionError):
            rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                          e_init=energy, chopper_frequency=chopper_frequency)
    else:
        # AbINS signalled the invalid settings with NaN instead of LinAlgError;
        # we should still raise NoTransmissionError in that case
        if np.any(np.isnan(expected)):
            with pytest.raises(NoTransmissionError):
                rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                              e_init=energy, chopper_frequency=chopper_frequency)
            return

        rf = rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                           e_init=energy, chopper_frequency=chopper_frequency)
        actual = rf(frequencies)

        assert_allclose(actual, expected, rtol=1e-5)

We know some tests are failing regardless, but want to see what
happens on Windows!
@ajjackson (Collaborator, Author):

Whether it is a "bug" or not, on our side it is probably safest not to assume that NaN values will always be treated the same way across math libraries and operating systems.

@ajjackson (Collaborator, Author):

Yup, on Windows there are only the Lagrange failures, so it is taking different paths depending on platform/libraries 😭

When testing on Linux, Abins returns NaN rather than raising LinAlgError
when the chopper settings are out of range. Our test should detect
both cases and ensure the appropriate behaviour in each situation
(i.e. raising NoTransmissionError).
@RastislavTuranyi (Collaborator):

PyChop issues a warning (warnings.warn) when it returns a NaN due to no transmission; we could take advantage of that:

    abins._polyfits = {}

    with warnings.catch_warnings():
        # Promote PyChop's no-transmission warning to an error so it can be
        # caught like the LinAlgError case (note the raw string for the regex)
        warnings.filterwarnings('error', message=r'PyChop: tchop\(\): No transmission.*')
        try:
            expected = abins.calculate_sigma(frequencies * MEV_TO_WAVENUMBER) * WAVENUMBER_TO_MEV
        except Warning:
            with pytest.raises(NoTransmissionError):
                rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                              e_init=energy, chopper_frequency=chopper_frequency)
        else:
            rf = rf_2d.get_resolution_function('PyChop_fit', chopper_package=setting,
                                               e_init=energy, chopper_frequency=chopper_frequency)
            actual = rf(frequencies)

            assert_allclose(actual, expected, rtol=1e-5)

@ajjackson (Collaborator, Author):

The NaN check is working; I will give the warning approach a try and see which is nicer.

This is a bit cleaner than having separate checks for different
OS/library situations.
There is a unit-conversion bug in the low-frequency limit, so this
doesn't kick in at the right place in Abins. When that is fixed we can
de-skip the test and verify it here.
@ajjackson (Collaborator, Author) commented Nov 11, 2024

Warning-based logic looks a bit cleaner to me and seems to work across platforms 👍

I have set the low-frequency Abins-Lagrange comparison to skip, as we know that one is suspect. Higher frequencies are OK, but I needed to slightly loosen the assert_allclose tolerance.

I'd still like to see some of these parameter sweeps collapsed into single test cases, but this seems useful enough to merge now?
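
(For reference, the skip-and-loosen pattern is roughly as follows; the test names, reason string and tolerance values are placeholders, not the actual code:)

    import pytest
    from numpy.testing import assert_allclose

    # Placeholder test: skip the known-suspect comparison until the Abins
    # unit-conversion bug is fixed upstream.
    @pytest.mark.skip(reason="Abins low-frequency cut-off has a unit-conversion bug")
    def test_lagrange_low_frequency():
        ...

    def test_lagrange_high_frequency():
        # Slightly loosened tolerance for the higher-frequency comparison
        assert_allclose([1.00001, 2.00002], [1.0, 2.0], rtol=1e-4)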

@RastislavTuranyi merged commit 4bf9324 into main, Nov 22, 2024
2 checks passed