duration_based_chunks algorithm produces empty groups, causing pytest to fail when collecting tests #26
Yep, there is definitely something borked with duration_based_chunks. It is producing empty groups. Flip side, it seems like a pretty useless algorithm anyway that's highly dependent on the test ordering. The least_duration algorithm seems to be much more sensible, and I think it's not possible for it to produce empty groups so long as there are at least as many tests as there are groups.
Thanks for reporting @alexforencich, can you pinpoint which version started causing this? Or can you share the repo in which this is happening?
I have not attempted to figure out which version caused this, but presumably this was introduced when the duration_based_chunks algorithm was added.
Repo in question: https://github.com/alexforencich/verilog-pcie
Commit that caused the workflow run to fail: alexforencich/verilog-pcie@ccc44d7
Workflow run: https://github.com/alexforencich/verilog-pcie/runs/2861569844?check_suite_focus=true
Thanks @alexforencich! FYI @mbkroese
@alexforencich thanks for reporting. Is it possible to run your test suite with this version:
Yes, 0.2.3 is also affected.
Did some more testing, 0.1.6 is OK, but 0.2.0 is broken.
This is a small example that reproduces the issue:

```python
from collections import namedtuple

from pytest_split import algorithms

item = namedtuple('item', 'nodeid')

durations = {}
durations['a'] = 2313.7016449670773
durations['b'] = 46.880724348986405
durations['c'] = 2196.7077018650016
durations['d'] = 379.9717799640057
durations['e'] = 1476.3481151770247
durations['f'] = 979.7326026459923
durations['g'] = 1876.5443489580794
durations['h'] = 1.3951316330058035

items = [item(x) for x in ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']]
groups = algorithms.duration_based_chunks(7, items, durations)
```
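Continuing the snippet, the empty groups can be seen directly. This is only a sketch of the check, assuming duration_based_chunks returns one list of test items per group:

```python
# Sketch only, continuing the snippet above; assumes duration_based_chunks
# returns one list of test items per group. With 8 tests split into 7
# groups, the greedy chunking can leave the trailing groups empty, which
# is what makes pytest exit with code 5 ("no tests collected") for those
# splits.
for i, group in enumerate(groups):
    print(f"group {i}: {[it.nodeid for it in group]}")

empty = [i for i, group in enumerate(groups) if not group]
print("empty groups:", empty)  # a non-empty list here reproduces the bug
```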
The problem is that the algorithm is extremely greedy. Seems to me that this case is likely to happen if there are a bunch of short tests that are processed last. One, ah, stupid fix would be to move on to the next group if there are fewer than …
Yeah, indeed, sounds like we should have some mechanism which ensures that there's at least one test per group (and if splits > len(tests) -> error).
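Something like this minimal sketch could do it, assuming groups are plain lists of collected items; the function name and the rebalancing approach are just illustrative, not the plugin's actual API:

```python
def ensure_no_empty_groups(groups, items):
    # Sketch only: if there are more requested splits than tests, empty
    # groups are unavoidable, so fail loudly instead of letting pytest
    # exit with code 5 later on.
    if len(groups) > len(items):
        raise ValueError("more splits than tests: some groups would be empty")
    # Otherwise, move one test out of the currently largest group into
    # each empty group. Since there are at least as many tests as groups,
    # the largest group always holds two or more tests at this point.
    for group in groups:
        if not group:
            donor = max(groups, key=len)
            group.append(donor.pop())
    return groups
```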
Whoops, that's unfortunate. I could probably add a test and make sure it works again sometime this week if needed 🙂
Just for the sake of my own curiosity @alexforencich, how small is your test sample size? I think I might have changed the behavior from stopping before a group exceeds the time limit to stopping when it exceeds it. My thinking was that front-loading tests a little bit makes sense 99% of the time, because at least in GitHub Actions, containers seem to spin up sequentially; i.e., group 1 starts considerably earlier than group 20 (at least for me). Then again, I guess if there are a very small number of tests this could create some weird behavior. Is that what's happening here?
The other algorithm (least_duration) is fine as it adds tests to the group with the smallest total duration. So this means that if you have N splits, the first N tests get distributed across those splits. This is also a very simple greedy algorithm, but at least it's guaranteed not to produce any empty groups so long as you have at least as many tests as you have splits. I suppose what you could do if you have fewer tests than splits is just duplicate one of the tests in all of the remaining splits so pytest doesn't crash. I think there are two parts to this: 1) fix duration_based_chunks, and 2) add some safeguard somewhere else to ensure that it's not possible to ever produce any empty groups, so that pytest doesn't crash no matter what the splitting algorithm does (unless, of course, there are no tests at all).
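For reference, the least-loaded strategy described above boils down to roughly this; a simplified sketch with made-up names, not the plugin's actual implementation:

```python
def assign_to_least_loaded(splits, items, durations):
    # Sketch of the strategy described above: each test goes to whichever
    # group currently has the smallest total duration. With positive
    # durations, the first `splits` tests land in distinct groups, so no
    # group can end up empty as long as len(items) >= splits.
    groups = [[] for _ in range(splits)]
    totals = [0.0] * splits
    for it in items:
        i = totals.index(min(totals))  # least-loaded group so far
        groups[i].append(it)
        totals[i] += durations.get(it.nodeid, 0.0)
    return groups
```

With the eight tests from the reproduction above and seven splits, this yields seven non-empty groups, which matches the observation that least_duration does not trigger the crash.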
I want to say it's something like 170 tests split into 20 groups. But the spread of run times is quite large: some of them run in a few seconds, some take 10 minutes. And that spread is probably what's screwing up the greedy algorithm that places tests into each group until it hits the average, then moves on to the next group. I think if you make a test case of, say, 10 tests of 1 second each plus 1 test of 100 seconds, and ask for 10 groups, you'll get 1 group with all of the tests and 9 empty groups.
Happy to help :)
Opened a quick PR. Please take a look when you have time 🙂
Any update on this? I seem to be encountering this issue in 0.8.0.
This is the last thing to happen AFAIK
Seems like the PR is abandoned - is there interest in reviving it?
Is using least_duration not an option for you?
The issue I have is that the tests were designed such that there are optimizations around data loading, etc., that are set up in the first test of the suite to run and then consumed by all the tests that follow. Using …
It almost sounds like you want to do something similar to what's detailed in the readme for nbval - shuffling things around to keep groups together and preserve the ordering. So perhaps even more restrictive than what the default algorithm is doing, since I think the default one will split things up.
I had missed that part of the readme. That looks like pretty much exactly what I'd like to do in the ideal case. Is there a mode for that? Or is it automatically detected?
I suspect that it is linked to using nbval specifically. But it seems like it might be a useful mode to have in general. Perhaps there is a general way to do this: instead of dealing with individual tests, handle groups of tests, with tests grouped according to some parameters, perhaps based on the file. The current behavior would just be putting each test in its own group.
Are you imagining that the user could provide a predicate for custom splitting? How would that be specified and passed in?
Not sure, just tossing out some ideas. But a simple one could be an option to keep all of the tests defined in each file together.
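A rough sketch of that option, treating each file as one indivisible unit; the function name, the nodeid parsing, and the reuse of a least-loaded assignment are assumptions for illustration, not an existing pytest-split feature:

```python
from collections import defaultdict

def split_keeping_files_together(splits, items, durations):
    # Sketch: bucket tests by the file part of their nodeid ("path::test"),
    # weight each file by the summed duration of its tests, and hand whole
    # files to the group with the smallest running total. Test order within
    # a file is preserved.
    by_file = defaultdict(list)
    for it in items:
        by_file[it.nodeid.split("::", 1)[0]].append(it)

    file_weight = {
        path: sum(durations.get(it.nodeid, 0.0) for it in file_items)
        for path, file_items in by_file.items()
    }

    groups = [[] for _ in range(splits)]
    totals = [0.0] * splits
    # Place the heaviest files first so the totals stay roughly balanced.
    for path in sorted(by_file, key=file_weight.get, reverse=True):
        i = totals.index(min(totals))
        groups[i].extend(by_file[path])
        totals[i] += file_weight[path]
    return groups
```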
That makes sense to me as a starting point. Would there be an appetite to merge that if I were to do the work?
I ran into this same issue with …
Hey @jerry-git, still hitting this issue in 0.8.0. Any updates?
Something in the most recent set of changes has broken the splitting. Previously I had been running with --splits 20 and all of the groups had at least one test. Now, at least the last two groups are empty, which causes pytest to exit with error code 5 and GitHub Actions to kill all of the tests. Not sure what the precise cause of this is at the moment.