Support updating test durations when running for a single group #11

Closed · mbkroese opened this issue May 5, 2021 · 11 comments

mbkroese commented May 5, 2021

It would be useful to add support for calculating test durations while using the --splits and --group parameters. To clarify: calculating durations is currently possible, but only when running the full test suite, not just the tests for a particular group. With per-group support, users could calculate the test durations for each group and then combine the results.

Happy to send a merge request your way if you think this makes sense!

jerry-git (Owner)

Interesting and a good idea 👍 How would it actually work? Each pytest --splits ... process could generate a file containing the durations for the tests run in that group, but what kind of magic would combine those, or would that be a manual process?

mbkroese (Author) commented May 5, 2021

I'm afraid that would indeed be a manual process for now.
Each group would create its own test_durations file, and in a later step these files would be merged into one.

With the current format for storing execution times (JSON) it's a bit hard to combine the results, since JSON files are not easy to concatenate. However, by changing the format to CSV, concatenation becomes trivial:

cat *.csv > test_durations.csv

What do you think?

jerry-git (Owner)

I guess we could provide some simple command which would combine the reports; for example, coverage has a combine command. Then it could do its magic using Python and the user would not need to know how it's done. And JSON files would do just fine.
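For illustration, such a combine step could be roughly as simple as the Python sketch below. The per-group file names and the flat test-id → seconds mapping are assumptions made for the example, not the plugin's actual on-disk format.

import glob
import json

# Merge hypothetical per-group duration files (e.g. .test_durations_group1.json)
# into a single mapping and write the result out as one file.
combined = {}
for path in sorted(glob.glob(".test_durations_group*.json")):
    with open(path) as f:
        combined.update(json.load(f))

with open(".test_durations", "w") as f:
    json.dump(combined, f, indent=2)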

mbkroese (Author) commented May 6, 2021

Yes, I'm aware of coverage combine.
I think the difference is that coverage is its own application, whereas pytest-splits is a plugin, and I'm not sure if the user would expect a pytest plugin to install an additional command line tool. This command line tool would also have only one command: pytest-splits combine.

In my opinion, appending files is slightly preferable because the cat command I shared above is simple and uses standard tools available on almost all machines.

I'll let you decide :)

jerry-git (Owner)

pytest-splits is a plugin, and I'm not sure if the user would expect a pytest plugin to install an additional command line tool.

I don't think it's a biggie. For example, pytest-cov depends on coverage, and thus installing pytest-cov makes the coverage CLI command available.

I just don't like that the user would need to know how the "internals" of pytest-split work, which is why I'd prefer a custom command 🙂 And the duration files for individual groups could go, for example, to a .pytest_split or .test_durations directory.

sondrelg (Contributor)

I think just writing new results to the existing file makes the most sense. This would let users decide how they combine and/or manage their test-duration file.

Here's what I would like it to look like:

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Load test durations file from cache
      - uses: actions/cache@v2
        with:
          path: .test_durations
          key: 0  # doesn't matter
          restore-keys: 0  # doesn't matter
      
      # Run tests
      - run: pytest --splits 8 --group ${{ matrix.group }} --store-durations
    
      # If .test_durations was found in the cache, then the plugin would write to the existing file
      # If the cache was empty, it would create a new file with data from the tests run
      # In both cases the cache will store the file for next time

There are a few things to consider though:

  1. I think the current file format might not make the most sense. I guess ordering the tests by duration is important for the core split logic, but I think it might make more sense to store the results as a dict. That would make writing new entries and/or updating existing entries much easier, wouldn't it? (Rough sketches of points 1-3 follow this list.)

  2. I think it might make sense to allow the user to have no test-durations file at all. I know you assume a mean runtime for tests that have no recorded duration - maybe you could extend the logic to assume an equal (the number doesn't matter) runtime when no file exists, so that the first split just splits by the number of tests. It wouldn't be perfect, but it would save users from ever needing to worry about the test duration mapping. The splitting would then just improve automatically with each run.

  3. You don't need to write to .pytest_split or .test_durations. You might be well aware of this, but just in case you're not: pytest provides a way to write to the pytest cache via the config object. Writing to .test_durations makes sense if you want users to commit the timing data to version control, but ideally I would like to just cache my whole .pytest_cache between runs and let it update itself.
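To illustrate points 1 and 3, a hook could keep the durations as a dict keyed by test id and persist it through pytest's built-in cache. config.cache.get/set and the pytest_sessionfinish hook are real pytest APIs; the cache key, the dict layout, and the helper that collects this run's timings are assumptions made for the sketch.

def pytest_sessionfinish(session):
    config = session.config
    # Hypothetical cache key; the {nodeid: seconds} layout is illustrative.
    durations = config.cache.get("pytest_split/test_durations", {})
    durations.update(collect_durations_from_this_run(session))  # hypothetical helper
    config.cache.set("pytest_split/test_durations", durations)

And point 2 (splitting purely by test count when no durations exist) could fall back to something like this sketch, which just spreads the collected tests evenly across the groups:

def split_evenly(test_ids, splits, group):
    # With no recorded durations, treat every test as equally expensive and
    # hand group `group` (1-based) every `splits`-th test.
    return [t for i, t in enumerate(test_ids) if i % splits == group - 1]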

If we wanted to do all three, you could also remove the --store-durations flag and make writing to the cache the default behavior. That would leave new users only needing to add this to their workflow to reap all the benefits it brings:

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/cache@v2
        with:
          path: .pytest_cache
          key: 0  # doesn't matter
          restore-keys: 0  # doesn't matter
      # pytest-split is enabled if `splits` and `group` are found,
      # and writes to `.pytest_cache/v/cache/pytest_splits` automatically
      - run: pytest --splits 8 --group ${{ matrix.group }}

And then each run would get a little bit more accurate 🙂

Sorry for the block of text; hope some of it made sense. Looking forward to hearing your thoughts - I've almost certainly forgotten to consider some things 👍

sondrelg (Contributor)

Forgot to say: I'd be happy to try to implement this if you want help, @jerry-git, once we agree on what the implementation should look like 🙂

mbkroese (Author)

I agree with these suggestions. The test identifiers are considered unique anyway, so using a dictionary to store the times makes sense. Writing to the pytest cache, and simply splitting into equal chunks when no cache is available yet, seems like the right way to use existing functionality.

I'm also happy to contribute further to this.

jerry-git (Owner) commented May 31, 2021

Great discussion here!

Regarding @sondrelg's points in the comment above:

  1. A JSON file with JSON objects is OK for me as long as the current functionality keeps working. We need to bump the major version when this is done, as it's backwards incompatible (unless the implementation supports both the old and the new format; a backwards-compatible loader could look roughly like the sketch after this list).
  2. I agree, it would make sense to allow "dummy splitting".
  3. Let's not remove support for storing the durations in VCS, and let's keep .test_durations as the default. However, support for storing the duration data somewhere else would be nice, i.e. some configurability.
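For point 1, supporting both formats at read time could look roughly like the sketch below (an assumption-laden illustration: the old file is taken to be a JSON list of [test_id, seconds] pairs and the new one a JSON object mapping test id to seconds):

import json

def load_durations(path):
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):
        # Old format: [["tests/test_foo.py::test_a", 1.23], ...]
        return {test_id: seconds for test_id, seconds in data}
    # New format: {"tests/test_foo.py::test_a": 1.23, ...}
    return data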

If we wanted to do all three, you could also remove the --store-durations

I'd only store the data when --store-durations is provided. Otherwise people would easily start committing the .test_durations file unintentionally.
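A rough sketch of what gating on an opt-in flag could look like in a pytest plugin (parser.getgroup, addoption and config.getoption are real pytest APIs; the write step is a hypothetical helper, and this is not the plugin's actual code):

def pytest_addoption(parser):
    group = parser.getgroup("split")
    group.addoption(
        "--store-durations",
        action="store_true",
        help="Store test durations after the run.",
    )

def pytest_sessionfinish(session):
    if session.config.getoption("store_durations"):
        write_durations(session)  # hypothetical helper that persists the timings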

Just to note: if the test durations location were configurable, one could still point it at .pytest_cache if that fits their use case better 🙂

sondrelg (Contributor)

OK, so to summarize:

  • Leave --store-durations as a required flag for triggering any duration storage
  • Write durations data to .test_durations by default, unless the existing --durations-path argument is passed
  • Change from a list of lists to storing the duration data as a dictionary

I agree we probably don't need to involve the .pytest_cache - I can just put the .test_durations file in my own .gitignore and get the same result 🙂

Otherwise people would easily start committing the .test_durations file unintentionally.

Yeah I agree 👍

jerry-git (Owner)

This was done in #14
