Enable test duration writes on all runs #14
Conversation
Some thoughts based on the PR description (haven't looked at the code yet 🙂 ).
This is OK if it's still behind
This is problematic as basically all the projects have
Ok that sounds good 👍 I'll make that the default behavior
Yeah I should maybe have waited to let you respond in #11 before starting, but had some time yesterday so thought I'd jump into it 🙂

Personally I would prefer that the cached durations be left out of version control, since the files generated can become quite big. At work we have a medium-sized project with ~4000 tests which produces a 550 kB test-durations output file, which will become part of our PR reviews every time it changes. At the same time, there's no need to make this the default behavior if you don't want it to be - we could skip the pytest-cache stuff altogether, and I could just put the output file (
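As a side note on the pytest-cache approach discussed above, here is a minimal sketch of what storing durations through pytest's built-in cache could look like. The cache key and the attribute holding the measured durations are made-up names, not pytest-split's actual ones; the point is only that `.pytest_cache/` is normally gitignored, so nothing would show up in PR diffs.

```python
# Illustrative only: stash collected durations in pytest's cache directory
# (.pytest_cache/), which is usually excluded from version control.
def pytest_sessionfinish(session):
    config = session.config
    # "collected_durations" is a hypothetical attribute set elsewhere by the plugin.
    new_durations = getattr(config, "collected_durations", {})
    if not new_durations:
        return
    stored = config.cache.get("cache/pytest-split-durations", {})
    stored.update(new_durations)
    config.cache.set("cache/pytest-split-durations", stored)
```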
I think I have a good idea of how to proceed now @jerry-git; thanks for responding in the issue. One final thing before finalizing this - would you mind quickly taking a look at the `_split_tests` method I added? I had a little bit of trouble understanding the original split function (looking at it now it seems to make sense, I think I was just tired), so I changed it slightly. Some of the changes are new and have to be there (I think), like:

Then there's a change that's a consequence of changing how we select (and deselect) tests a little:

```python
items[:] = selected_tests
config.hook.pytest_deselected(items=deselected_tests)
```

The deselection results in pytest outputting something like

The final change is that instead of finding the index at which the runtime threshold is reached, I've now sorted the tests slowest first and then added faster tests to get as close to the threshold as possible. It might be overkill and could be reverted if you don't see any value in it. If you could review that function specifically I think I have everything I need to finish 👍
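For reference, a minimal, self-contained sketch of the select/deselect pattern referred to above in a `pytest_collection_modifyitems` hook. The option name and the `split_into_group` helper are placeholders, not the actual pytest-split implementation:

```python
import pytest


def split_into_group(items, group, splits=2):
    """Hypothetical helper: naive round-robin split, returns (selected, deselected)."""
    selected = [item for i, item in enumerate(items) if i % splits == group - 1]
    deselected = [item for item in items if item not in selected]
    return selected, deselected


@pytest.hookimpl(trylast=True)
def pytest_collection_modifyitems(config, items):
    # Placeholder option name; the real plugin exposes its own CLI flags.
    group = config.getoption("group", default=None)
    if group is None:
        return
    selected_tests, deselected_tests = split_into_group(items, group)
    # Keep only the selected items and tell pytest about the rest, which is
    # what produces the "... deselected" line in the run summary.
    items[:] = selected_tests
    if deselected_tests:
        config.hook.pytest_deselected(items=deselected_tests)
```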
Sounds legit 👍
👍
One of the key features of
Sounds good 👍 🙂
I'm more or less done I think 🙂 Just have one quick question @jerry-git: what do you think about making the pytest-split output stand out a little? These are all the output messages I've left in currently (the top one will only be output when no durations file is found).

It might not be a big deal, but right now it might be hard to differentiate between regular pytest output and plugin output. How do you feel about changing the color of the output slightly? Like this (bold white):

Or this (yellow)? 😄

The last one might be a little much - a color change might not be necessary - perhaps adding a

Might also not be necessary at all, just thought I'd ask 🙂
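For what it's worth, pytest's terminal reporter accepts markup keywords, so highlighted output wouldn't need extra dependencies. A small sketch, with an illustrative message and option name rather than pytest-split's actual output:

```python
# Sketch of highlighted plugin output via pytest's terminal reporter.
def pytest_terminal_summary(terminalreporter, config):
    group = config.getoption("group", default=None)  # placeholder option name
    if group is not None:
        terminalreporter.write_line(
            f"[pytest-split] Ran test group {group}", yellow=True, bold=True
        )
```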
Force-pushed from e7858aa to 5f4078e
A lot of back and forth here, but I think the PR is now ready for a second look when you have time @jerry-git 🙂
@sondrelg this PR will have a lot of conflicts with this one: #12. In that PR I'm basically adding the extra printouts that you're also adding in this PR. I also change the logic somewhat to use a different pytest hook. I agree the idea for the colours is nice, but I think it would be better to keep these changes in separate PRs. What do you think about commenting on my PR, so we can merge that, and afterwards apply your PR on top of it?
Looked up the durations and yeah, it looks like the start-time is recorded. These are the tests over 100s in my durations file, with the recorded time and the time it took to re-run them locally:

For some reason I have 10 (9) tests with long runtimes when I split into 8 groups 🤔 But looks like a good explanation of the unevenness!
I see two options:
Any thoughts?
Great insights with the durations! I assume @sondrelg's findings are related to running db migrations in some fixture once per sub-suite?
I don't mind adding support for reading both. I can add a comment to remember to remove support for the old format when you release v1 for example 🙂
Yep! This fixture is what takes so long, and it looks like, the way pytest is set up, the setup times of all fixtures (including those scoped for a whole session) are attributed to the first test in the test suite. I opened this issue in the pytest repo; hopefully it's possible to fix this there in time. In the meantime though, I would suggest pytest-split should do something like this, where we just remove the setup time of the first test in the test suite.

I'm testing that branch on my project at work now and it seems to have gotten rid of a lot of the variability 🎉 If you're keen to work on the split logic and variability fixes after this PR is merged, feel free to tag me in the PR and I'd be happy to test/review it for you when it's ready 🙂
This sounds like a potential solution, but how about if the first test doesn't use the heavy fixture? 🙂
I don't think a perfect solution exists. We either risk reducing the setup time of the first test too much, or too little (the current behaviour is also not correct, right?). And I guess an important point is that pytest-split cannot know which of them it is.

Since pytest-split cannot know, but we know that all session-scoped fixtures will always be attributed to the first test, I think it's pretty safe to assume that the first test will almost always be inflated. And the larger a project gets, the more pronounced the problem becomes, while under-estimation for a single test would probably matter less and less. There is a chance that some project will have a ridiculous setup time for the first test in their test suite, and changing the behaviour as suggested above would cause them problems, but I think it's about 1000x less likely than the other way around 😛

Another option though is to let the user decide - the user actually can know what plugins they use, and what their fixture setups look like, so they are in a much better position to make the decision. A possible solution might be to discard the first setup time if

I guess the question boils down to whether adding the choice is worth it, or if discarding the initial setup time of the first test is so unlikely to cause significant problems for users that it should just be default behaviour. I don't see a harm in adding an opt-out 🙂
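A rough sketch of the behaviour being proposed, with an opt-out, purely for illustration (the function signature, flag name, and input shape are made up, not pytest-split's actual code):

```python
# Illustrative only: when storing durations, optionally drop the setup time of
# the very first test, since session-scoped fixture setup is attributed to it
# and inflates its recorded duration.
def adjusted_durations(raw_durations, discard_first_setup=True):
    """raw_durations: list of (test_id, setup_time, call_time) tuples in run order."""
    durations = {}
    for index, (test_id, setup_time, call_time) in enumerate(raw_durations):
        if discard_first_setup and index == 0:
            setup_time = 0.0  # assume it is dominated by session fixture setup
        durations[test_id] = setup_time + call_time
    return durations
```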
These are the outputs after the last commit. Running without a test durations file:

Since we now deselect tests, we get this nice
Force-pushed from 73015f4 to cb87064
PR should be ready to review when you're ready @jerry-git 👏
This is not true, consider e.g.

```python
import time

import pytest


@pytest.fixture(scope="session")
def expensive_session_fixture():
    time.sleep(10)


def test_1():
    ...


def test_2(expensive_session_fixture):
    ...
```

Yup, we could have some flag for enabling / disabling the "skip measuring the first setup". Considering your use case (and the whole thing in general), I think a more valuable flag could be something like
Would you only apply the long-duration threshold for the

I would treat every test identically 🙂 A configurable threshold would also provide a workaround against misc bugs with other libraries, see e.g. #1
```python
# This code provides backwards compatibility after we switched
# from saving durations in a list-of-lists to a dict format
# Remove this when bumping to v1
```
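Presumably the shim under review does something along these lines; a sketch based only on the comment above, not the exact code:

```python
# Older versions stored durations as a list of [test_id, duration] pairs;
# newer versions use a {test_id: duration} dict. Accept both when loading.
def load_durations(raw):
    if isinstance(raw, list):
        return {test_id: duration for test_id, duration in raw}
    return raw
```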
💪
Is there a test case for this?
I'll add one 👍
Should probably add a 100% coverage requirement as well at some point 🤔
Do it tomorrow 😂
Added the old format to test in 794fedf
I could help you write some tests if you want, and if you'd like I could also help implement Poetry for package management and dependency stuff 🙂
Looks like their website is down - probably for the same reason GitHub was down earlier.
Poetry would be welcome indeed 👍
💪
FYI, available in 0.2.0!
This looks really good 👍
This PR:

- Enables `--store-durations` even when splitting. This seems useful for maintaining a "warm cache" of durations. If test fixtures change, or test environments are upgraded, test durations can now continuously update to reflect those changes.
- Handles running `pytest-split` with no durations file present. In that case we now just split tests evenly. This is a small detail, but one that will make implementing continuous updates in a CI environment a little easier, I think.

There's also a slight restructuring wrt. the pytest plugin structure, which is not a big deal if you prefer to revert - just let me know. I just thought it might make more sense to register cache/split functionality based on certain criteria rather than run it every time and return early when not relevant. At least that was what I was trying to do 🙂 See the sketch below for the general idea.
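A sketch of the "register only when relevant" idea; the plugin classes and option names are illustrative, not the actual pytest-split code:

```python
class DurationsWriter:
    """Hypothetical plugin object that records and stores test durations."""
    def __init__(self, config):
        self.config = config


class TestGroupSelector:
    """Hypothetical plugin object that selects the tests for one group."""
    def __init__(self, config):
        self.config = config


def pytest_configure(config):
    # Register each piece of functionality only when its option is in use,
    # instead of running every hook unconditionally and returning early.
    if config.getoption("store_durations", default=False):
        config.pluginmanager.register(DurationsWriter(config), "pytest-split-writer")
    if config.getoption("group", default=None) is not None:
        config.pluginmanager.register(TestGroupSelector(config), "pytest-split-selector")
```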