Make tests robust and independent #543

glatterf42 · 2024-09-11T11:01:55Z

This is a continuation from #521 that puts the branch in the upstream repo rather than my fork to enable windows test runners to access the GAMS_LICENSE secret. Please see the original PR for a detailed description, to-dos, etc.

PR checklist

Continuous integration checks all ✅
Add or expand tests; coverage checks both ✅
~~[ ] Add, expand, or update documentation.~~ Just CI tests.
~~[ ] Update release notes.~~ Just CI tests.

* Split up test_close to receive only clean stdout

* Adapt expected values to new scenario names

codecov · 2024-09-11T11:25:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.8%. Comparing base (90cfe46) to head (749709e).
Report is 44 commits behind head on main.

Additional details and impacted files

@@          Coverage Diff          @@
##            main    #543   +/-   ##
=====================================
  Coverage   98.8%   98.8%           
=====================================
  Files         44      44           
  Lines       4813    4819    +6     
=====================================
+ Hits        4756    4765    +9     
+ Misses        57      54    -3

Files with missing lines	Coverage Δ
ixmp/backend/jdbc.py	`97.4% <100.0%> (+<0.1%)`	⬆️
ixmp/core/platform.py	`98.9% <ø> (ø)`
ixmp/testing/data.py	`97.3% <100.0%> (+<0.1%)`	⬆️
ixmp/testing/jupyter.py	`100.0% <ø> (+6.0%)`	⬆️
ixmp/tests/backend/test_base.py	`98.9% <100.0%> (ø)`
ixmp/tests/backend/test_jdbc.py	`100.0% <100.0%> (ø)`
ixmp/tests/core/test_scenario.py	`100.0% <100.0%> (ø)`
ixmp/tests/core/test_timeseries.py	`100.0% <100.0%> (ø)`
ixmp/tests/report/test_operator.py	`100.0% <100.0%> (ø)`
ixmp/tests/report/test_reporter.py	`100.0% <100.0%> (ø)`
... and 6 more

... and 1 file with indirect coverage changes

glatterf42 · 2024-09-11T11:28:02Z

I haven't seen test_export_timeseries_data being flaky in a while, but I also had not included anything here that could fix its flakiness. The error seems to arise from the fact that export_timeseries_data() to a tmp_path and reading it back in sometimes produces an empty df (reading back in via pd.read_csv()).
I'm not entirely sure why this is happening. Pytest's docs mention that "[c]oncurrent invocations of the same test function are supported by configuring the base temporary directory to be unique for each concurrent run." But the details below state that care is already taken when tests are distributed on a local machine using pytest-xdist. Still, I figure it can't hurt trying, so I've now included a pytest option that discards all directories created from tmp_path after each test (if that's true, note the singular). If this is indeed the source of flakiness, it might be resolved now, or if some test is still trying to read in files that have now been deleted, we will at least have more information about what's going on (assuming the error message is helpful).

glatterf42 · 2024-09-11T11:29:47Z

I'm just noticing: the windows tests consume much more time than all others. This is not specific to this PR; it was true also for recent dependabot PRs. Not sure if that is worth investigating, definitely not high in priority, I'm afraid.

glatterf42 · 2024-09-11T12:52:52Z

I'm rerunning these tests a few times now to see if any flakiness comes up. If it doesn't, I'm happy with the current state of the PR. Just to be sure: my current solution for the flakiness of the tutorial tests is to run these tests on the same worker, which might run counter to our desire to distribute the runs using pytest-xdist. However, the tutorial suite in ixmp is small enough so that this doesn't worry me. Most test on recent Python versions still finish in ~3 minutes, which I think is fine.

glatterf42 · 2024-09-11T14:25:10Z

The second rerun already revealed something interesting: On windows-py3.11, both of the first tutorial tests timed out when trying to reach the platform, but the second ones where happy to start. Unfortunately, for the R tutorials, the first test is required as the second tutorial requires the scenario from the first tutorial to be available. I have now introduced the pytest-dependency plugin to skip the second tutorial if the first one failed, but this still doesn't explain why the cells couldn't connect to the platform in the first place.

glatterf42 · 2024-09-11T14:49:49Z

pytest-dependency did not work as intended, so I'm now trying to include the helper function for creating the scenario that the first tutorial would create in the second one so that it can run independently of the first.

glatterf42 · 2024-09-11T15:29:15Z

#467 is also still open, despite there being a workaround in our code base: the tutorial tests are skipped on ubuntu because they're taking too long. Not sure if we need to keep this issue open, I'm not working on running the tutorials on ubuntu with this PR. We could still use it as a tracking issue if we want to remove the skips one day.

glatterf42 · 2024-09-11T15:39:19Z

I've run these tests a number of times now and the only tests I saw failing (but twice, unfortunately) were the first parts of the tutorial tests (i.e. the tests not called *_scenario). The second parts seemed to work, which makes me wonder: maybe there's an increased time-out risk for the first test that runs? Could it still interfere with some background task?
This also only happened on windows-py3.11, which might be coincidence.
As far as async is concerned, I didn't find any option to enable or disable it for NotebookClient. However, we could theoretically overwrite the default kernel manager, which is an AsyncKernelManager. This is the default, though, likely faster, tested, and working for all our other cases, so I'm hesitant here.
Maybe the flakiness of the first parts of the tutorial tests is for the moment best addressed with a flaky marker.

glatterf42 · 2024-09-11T16:10:38Z

test_del_ts is flaky again, too, though I'm not sure if not something else is going on there, too. The traceback includes lots of lines that seem to be logging output (or so) of something else.

glatterf42 · 2024-09-12T06:51:27Z

We discussed test_del_ts extensively in #521.
I'm now choosing to not dive deep into this or the tutorial-related failures since they may well be due to very specific technological details that are not relevant anymore once we transition to ixmp4. As far as I know, ixmp4 is currently only tests on ubuntu, but doesn't contain a single flaky test (despite having ~90% coverage). So for now, I think our best course of action is to mark these tests as flaky (rerunning them when they are failing; only on Windows and GHA) and migrating to ixmp4 as soon as we can.
I'd not take the same approach with message_ix or message-ix-models, which won't be replaced in the future as far as I can tell.

glatterf42 · 2024-09-12T08:14:37Z

I have rerun the tests now four times and did not observe any flaky behaviour, not even reruns due to flakiness. One time, the setup-graphviz action failed due to the backend server being overloaded, but there's not a lot we can do about that in this PR.
I did notice, though, that consistently this happens: XPASS ixmp/tests/backend/test_jdbc.py::test_del_ts - #463
This test got excluded for all Python versions <= 3.10 because of an incompatibility with jpype 1.4.1 and Python 1.5.0, which does not seem to persist with pandas > 2.x. I'll remove the xfail mark for this and see what happens.

khaeru

Thanks for the thorough effort here.

Some comments inline, and I pushed one commit to satisfy mypy in .testing.data. Also:

R_transport_scenario.ipynb appears to have added cell metadata that are editor-specific. I'd prefer we not commit these unless they are needed for a specific reason.

Aside from those items, good to go.

khaeru · 2024-09-12T09:02:09Z

ixmp/tests/test_tutorials.py

@@ -7,53 +7,73 @@

 from ixmp.testing import get_cell_output, run_notebook

+group_base_name = platform.system() + platform.python_version()
+
+GHA = "GITHUB_ACTIONS" in os.environ


Typically we'd define this in ixmp.testing. If it's only used here for now, fine to leave it; but if it's to be re-used, we can move it there.

ixmp/tests/test_tutorials.py

glatterf42 added 18 commits September 11, 2024 12:49

Use request.node.name for make_dantzig scenarios

2102c83

Fix typo in docstring

c625e12

Group tutorials that depend on each other

31d7a44

Give every platform a test-specific name

d68f757

Use test-specific name for scenarios

6a8bd54

Use test-specific name for platforms

c7d94ef

* Split up test_close to receive only clean stdout

Revert temporary fix from #510

38ac174

Copy model["dantzig"] properly

b955aab

* Adapt expected values to new scenario names

Avoid searching for to-be-created test-specific platforms

3b62932

Search for correct new scenario name

540d613

Increase default cell runtime on GHA

4127f57

Use more test-specific platforms

6c111d8

Make call to to-be-tested function explicit

9f6afa3

Rename expected data files for individual tests

4a67769

Call Scenario deletion function explicitly

f3615ce

Call TimeSeries deletion function explicitly

6216c02

Add type hint for correct specific backend type

12bfcc3

Remove timeseries from jindex explicitly

12464ed

glatterf42 added enh New features & functionality ci Continuous integration labels Sep 11, 2024

glatterf42 self-assigned this Sep 11, 2024

Discard tmp_path directories after each test

d469fd3

glatterf42 linked an issue Sep 11, 2024 that may be closed by this pull request

Address tests that are flaky with pytest-xdist #512

Closed

glatterf42 added 2 commits September 11, 2024 16:47

Drop workaround for jupyter/nbclient#85

bb48e78

Make second R tutorial independent with helper function

918298f

glatterf42 force-pushed the enh/ci-test-stability branch from a494ad5 to 918298f Compare September 11, 2024 14:48

glatterf42 added 2 commits September 12, 2024 08:47

Mark first tutorial tests as flaky on Windows

34f0595

Mark test_del_ts as flaky on Windows

6a50d32

Remove xfail marker for consistently xpassing test

5b30a63

glatterf42 requested a review from khaeru September 12, 2024 08:24

glatterf42 and others added 2 commits September 12, 2024 10:43

Distinguish platforms for more stability

fc76fab

Expand typing in .testing.data

61b40cb

khaeru approved these changes Sep 12, 2024

View reviewed changes

glatterf42 added 2 commits September 12, 2024 12:09

Remove editor-specific notebook metadata

97f1121

Address review comments

749709e

glatterf42 merged commit 080ad2b into main Sep 12, 2024
21 checks passed

glatterf42 deleted the enh/ci-test-stability branch September 12, 2024 10:31

This was referenced Sep 25, 2024

Make tests robust & independent #521

Closed

Make more tests robust and independent #546

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make tests robust and independent #543

Make tests robust and independent #543

glatterf42 commented Sep 11, 2024 •

edited

Loading

codecov bot commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 12, 2024 •

edited

Loading

glatterf42 commented Sep 12, 2024

khaeru left a comment

khaeru Sep 12, 2024

Make tests robust and independent #543

Make tests robust and independent #543

Conversation

glatterf42 commented Sep 11, 2024 • edited Loading

PR checklist

codecov bot commented Sep 11, 2024 • edited Loading

Codecov Report

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024 • edited Loading

glatterf42 commented Sep 11, 2024 • edited Loading

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 11, 2024

glatterf42 commented Sep 12, 2024 • edited Loading

glatterf42 commented Sep 12, 2024

khaeru left a comment

Choose a reason for hiding this comment

khaeru Sep 12, 2024

Choose a reason for hiding this comment

glatterf42 commented Sep 11, 2024 •

edited

Loading

codecov bot commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 11, 2024 •

edited

Loading

glatterf42 commented Sep 12, 2024 •

edited

Loading