duration_based_chunks algorithm produces empty groups, causing pytest to fail when collecting tests #26
Yep, there is definitely something borked with duration_based_chunks. It is producing empty groups. Flip side, it seems like a pretty useless algorithm anyway that's highly dependent on the test ordering. The least_duration algorithm seems to be much more sensible, and I think it's not possible for it to produce empty groups so long as there are at least as many tests as there are groups.
Thanks for reporting @alexforencich, can you pinpoint which version started causing this? Or can you share the repo in which this is happening?
I have not attempted to figure out which version caused this, but presumably this was introduced when the duration_based_chunks algorithm was added.
Repo in question: https://github.com/alexforencich/verilog-pcie
Commit that caused the workflow run to fail: alexforencich/verilog-pcie@ccc44d7
Workflow run: https://github.com/alexforencich/verilog-pcie/runs/2861569844?check_suite_focus=true
Thanks @alexforencich! FYI @mbkroese
@alexforencich thanks for reporting. Is it possible to run your test suite with this version:
Yes, 0.2.3 is also affected.
Did some more testing, 0.1.6 is OK, but 0.2.0 is broken.
This is a small example that reproduces the issue:

```python
from collections import namedtuple

from pytest_split import algorithms

item = namedtuple('item', 'nodeid')

durations = {}
durations['a'] = 2313.7016449670773
durations['b'] = 46.880724348986405
durations['c'] = 2196.7077018650016
durations['d'] = 379.9717799640057
durations['e'] = 1476.3481151770247
durations['f'] = 979.7326026459923
durations['g'] = 1876.5443489580794
durations['h'] = 1.3951316330058035

items = [item(x) for x in ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']]
groups = algorithms.duration_based_chunks(7, items, durations)
```
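Continuing the snippet, the empty groups can be seen directly. This is only a sketch of the check, assuming duration_based_chunks returns one list of test items per group:

```python
# Sketch only, continuing the snippet above; assumes duration_based_chunks
# returns one list of test items per group. With 8 tests split into 7
# groups, the greedy chunking can leave the trailing groups empty, which
# is what makes pytest exit with code 5 ("no tests collected") for those
# splits.
for i, group in enumerate(groups):
    print(f"group {i}: {[it.nodeid for it in group]}")

empty = [i for i, group in enumerate(groups) if not group]
print("empty groups:", empty)  # a non-empty list here reproduces the bug
```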
The problem is that the algorithm is extremely greedy. Seems to me that this case is likely to happen if there are a bunch of short tests that are processed last. One, ah, stupid fix would be to move on to the next group if there are fewer than …
Yeah, indeed, sounds like we should have some mechanism which ensures that there's at least one test per group (and if splits > len(tests) -> error).
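Something like this minimal sketch could do it, assuming groups are plain lists of collected items; the function name and the rebalancing approach are just illustrative, not the plugin's actual API:

```python
def ensure_no_empty_groups(groups, items):
    # Sketch only: if there are more requested splits than tests, empty
    # groups are unavoidable, so fail loudly instead of letting pytest
    # exit with code 5 later on.
    if len(groups) > len(items):
        raise ValueError("more splits than tests: some groups would be empty")
    # Otherwise, move one test out of the currently largest group into
    # each empty group. Since there are at least as many tests as groups,
    # the largest group always holds two or more tests at this point.
    for group in groups:
        if not group:
            donor = max(groups, key=len)
            group.append(donor.pop())
    return groups
```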
Whoops, that's unfortunate. I could probably add a test and make sure it works again sometime this week if needed 🙂
Just for the sake of my own curiosity @alexforencich, how small is your test sample size? I think I might have changed the behavior from stopping before a group exceeds the time limit to stopping when it exceeds it. My thinking was that front-loading tests a little bit makes sense 99% of the time, because at least in GitHub Actions, containers seem to spin up sequentially; i.e., group 1 starts considerably earlier than group 20 (at least for me). Then again, I guess if there are a very small number of tests this could create some weird behavior. Is that what's happening here?
The other algorithm (least_duration) is fine as it adds tests to the group with the smallest total duration. So this means that if you have N splits, the first N tests get distributed across those splits. This is also a very simple greedy algorithm, but at least it's guaranteed not to produce any empty groups so long as you have at least as many tests as you have splits. I suppose what you could do if you have fewer tests than splits is just duplicate one of the tests in all of the remaining splits so pytest doesn't crash. I think there are two parts to this: 1) fix duration_based_chunks, and 2) add some safeguard somewhere else to ensure that it's not possible to ever produce any empty groups, so that pytest doesn't crash no matter what the splitting algorithm does (unless, of course, there are no tests at all).
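For reference, the least-loaded strategy described above boils down to roughly this; a simplified sketch with made-up names, not the plugin's actual implementation:

```python
def assign_to_least_loaded(splits, items, durations):
    # Sketch of the strategy described above: each test goes to whichever
    # group currently has the smallest total duration. With positive
    # durations, the first `splits` tests land in distinct groups, so no
    # group can end up empty as long as len(items) >= splits.
    groups = [[] for _ in range(splits)]
    totals = [0.0] * splits
    for it in items:
        i = totals.index(min(totals))  # least-loaded group so far
        groups[i].append(it)
        totals[i] += durations.get(it.nodeid, 0.0)
    return groups
```

With the eight tests from the reproduction above and seven splits, this yields seven non-empty groups, which matches the observation that least_duration does not trigger the crash.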
I want to say it's something like 170 tests split into 20 groups. But the spread of run times is quite large: some of them run in a few seconds, some take 10 minutes. And that spread is probably what's screwing up the greedy algorithm that places tests into each group until it hits the average, then moves on to the next group. I think if you make a test case of, say, 10 tests of 1 second each plus 1 test of 100 seconds, and ask for 10 groups, you'll get 1 group with all of the tests and 9 empty groups.
Happy to help :)
Opened a quick PR. Please take a look when you have time 🙂
Any update on this? I seem to be encountering this issue in 0.8.0.
This is the last thing to happen AFAIK
Seems like the PR is abandoned - is there interest in reviving it?
Is using least_duration not an option for you?
The issue I have is that the tests were designed such that there are optimizations around data loading, etc., that are set up in the first test of the suite to run and then consumed by all the tests that follow. Using …
It almost sounds like you want to do something similar to what's detailed in the readme for nbval - shuffling things around to keep groups together and preserve the ordering. So perhaps even more restrictive than what the default algorithm is doing, since I think the default one will split things up.
I had missed that part of the readme. That looks like pretty much exactly what I'd like to do in the ideal case. Is there a mode for that? Or is it automatically detected?
I suspect that it is linked to using nbval specifically. But it seems like it might be a useful mode to have in general. Perhaps there is a general way to do this: instead of dealing with individual tests, handle groups of tests, with tests grouped according to some parameters, perhaps based on the file. The current behavior would just be putting each test in its own group.
Are you imagining that the user could provide a predicate for custom splitting? How would that be specified and passed in?
Not sure, just tossing out some ideas. But a simple one could be an option to keep all of the tests defined in each file together.
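A rough sketch of that option, treating each file as one indivisible unit; the function name, the nodeid parsing, and the reuse of a least-loaded assignment are assumptions for illustration, not an existing pytest-split feature:

```python
from collections import defaultdict

def split_keeping_files_together(splits, items, durations):
    # Sketch: bucket tests by the file part of their nodeid ("path::test"),
    # weight each file by the summed duration of its tests, and hand whole
    # files to the group with the smallest running total. Test order within
    # a file is preserved.
    by_file = defaultdict(list)
    for it in items:
        by_file[it.nodeid.split("::", 1)[0]].append(it)

    file_weight = {
        path: sum(durations.get(it.nodeid, 0.0) for it in file_items)
        for path, file_items in by_file.items()
    }

    groups = [[] for _ in range(splits)]
    totals = [0.0] * splits
    # Place the heaviest files first so the totals stay roughly balanced.
    for path in sorted(by_file, key=file_weight.get, reverse=True):
        i = totals.index(min(totals))
        groups[i].extend(by_file[path])
        totals[i] += file_weight[path]
    return groups
```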
That makes sense to me as a starting point. Would there be an appetite to merge that if I were to do the work?
I ran into this same issue with …
Hey @jerry-git, still hitting this issue in 0.8.0. Any updates?
Something in the most recent set of changes has broken the splitting. Previously I had been running with --splits 20 and all of the groups had at least one test. Now, at least the last two groups are empty, which causes pytest to exit with error code 5 and GitHub Actions to kill all of the tests. Not sure what the precise cause of this is at the moment.