calculate mutation score only of changed lines of code in a file from pull requests #3077

Open
shivambagadia23 opened this issue Oct 22, 2024 · 3 comments
Labels
🚀 Feature request New feature or request

Comments

@shivambagadia23

Is your feature request related to a problem? Please describe.
I want to use Stryker in the pipeline so that, when someone raises a new pull request, Stryker calculates the mutation score of only the changed lines of code and not the entire file. With this, I want to enforce a restriction: if the mutation score of the changed lines in the PR is below, say, 80%, the build should fail. I tried the baseline and since features, but neither calculates the mutation score based only on the changed lines; both calculate it for the entire changed file, which doesn't work in my case.

Describe the solution you'd like
I would like a simple option, passed in stryker-config or on the command line, that takes only the changed lines of code and returns the mutation score for those lines only.

Describe alternatives you've considered
I have considered using the -m option and giving the line span manually, as -m filename.cs{line1..line2}, but even this does not work for me. Neither do the since or baseline features, as they calculate the mutation score for the entire changed file.

Additional context
Add any other context or screenshots about the feature request here.

@shivambagadia23 shivambagadia23 added the 🚀 Feature request New feature or request label Oct 22, 2024
@dupdob
Member

dupdob commented Oct 23, 2024

Hi
I have been thinking about this for the past few days and this raises several questions and remarks.

  1. I think you want to make sure PRs do not degrade the overall mutation score. Or maybe you even want them to improve the current score. Correct?

  2. Bear in mind that the mutation score is related to the test base and not to the production code.

  3. When you talk about 'changed lines', are you referring to 'production code' lines, 'test' lines, or both?

  4. If we compute the score only for the new/modified mutants, you will not get the expected guarantee: people could add or remove tests that do not cover the modified lines. This would impact the overall score but not the marginal one.

  5. Conversely, focusing only on 'test lines' would mean that changes in the production code do not impact the marginal score.

  6. So Stryker needs to factor in both. That's why it needs to retest every mutant that is covered by a changed test, as well as any new or modified mutation.

As said earlier, filtering this result only for new or modified mutations would yield an incorrect result.

That being said, the current logic still relies on a coarse granularity (file level). As such, we may look into how to refine this granularity. We can get some line-level diff info, but this will remain more difficult to leverage for mutation coverage than it is for line coverage.

@shivambagadia23
Author

Thanks for your prompt response @dupdob

  1. That's correct. I want to analyse the mutation score on every PR on the basis of the changed lines in that PR, and fail the PR if the mutation score is less than 90%. I don't think this would degrade the overall mutation score; instead, it would help us analyse and write better unit test cases.
  2. Yes, I did consider that assumption.
  3. I would want to check the mutation score on the changed lines of both my tests and my production code.
  4. If people remove test cases, that would be caught during PR review, and that isn't my major concern. My major concern is to integrate mutation testing in the pipeline and make it work the way SonarQube does, looking only at the changed lines of code and checking the coverage.
  5. No, we need to focus on both test lines and production code, not just one.
  6. You can factor in both, but why only at file level? Can't this be done at line level instead? Maybe I am missing some complexity or have misunderstood a few things, but I would like to better understand why the mutation score can't be calculated at line level instead of file level.

I do not see why the mutation score would be incorrect considering the above factors. I am happy to discuss this and understand more with you.

@dupdob
Member

dupdob commented Oct 29, 2024

To be precise: when I say that marginal scores (line coverage or mutation) are incorrect, I mean that you cannot accurately determine the score for the whole source base using only the baseline and marginal scores.
For example: a baseline score of 100% and a marginal score of 100% do not guarantee a current score of 100%.
This is because marginal scores neglect the impact of the change on the control flow inside the code, which can be massively different after the change. Here is a trivial, albeit extreme, example:
baseline:

void SomeComplexMethod(....)
{
....
}

which is 100% covered.
modified to:

void SomeComplexMethod(....)
{
if (somethingAlwaysTrue)
 return;
....
}

Modified lines are:

if (somethingAlwaysTrue)
 return;

Line coverage for these lines is 100%, and baseline coverage is 100%, but if you measure coverage for the whole codebase, you discover that none of the original code lines are covered any more.
Obviously, it would be the same for mutations: none of the mutations of the baseline version are covered.
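
To make the numbers concrete, here is a toy Python model of the example above. It is only a sketch and not how Stryker or any coverage tool works internally: a method is modelled as a list of line labels, a test run executes lines in order until it reaches an early return, and all names (`executed_lines`, `coverage`, the line labels) are made up for the illustration.

```python
# Toy model: a method is a list of line labels; a run executes lines in
# order and stops right after the line that returns early (if any).
# All names here are hypothetical and exist only for this illustration.

def executed_lines(method_lines, early_return_at=None):
    """Return the set of lines that a single run executes."""
    executed = set()
    for line in method_lines:
        executed.add(line)
        if line == early_return_at:
            break
    return executed


def coverage(executed, lines):
    """Percentage of `lines` that appear in `executed`."""
    lines = list(lines)
    return 100.0 * sum(1 for line in lines if line in executed) / len(lines)


baseline = ["body1", "body2", "body3"]      # original method body
changed = ["if_guard", "return"]            # lines added by the change
modified = changed + baseline               # method after the change

# Before the change, every baseline line is executed.
baseline_cov = coverage(executed_lines(baseline), baseline)

# After the change, the guard is always true, so the run stops at "return".
run = executed_lines(modified, early_return_at="return")
marginal_cov = coverage(run, changed)       # coverage of changed lines only
overall_cov = coverage(run, modified)       # coverage of the whole method
original_cov = coverage(run, baseline)      # coverage of the untouched lines

print(baseline_cov, marginal_cov, overall_cov, original_cov)
# prints: 100.0 100.0 40.0 0.0
```

Both the baseline and the marginal figures are 100%, yet the untouched lines drop to 0% once the guard is in place.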

Yes, my example is extreme, and you could argue that unit tests should break after such an important change.
But I am confident I could build a more lifelike example if I had the motivation and time.

Just to be clear, I am not against using marginal coverage as an attribute. I just wanted to stress its limitations.

Now, focusing on the feature request:

  1. Assuming we can get reliable line diff information, Stryker can filter mutations according to their locations. I think the criterion should be: every mutation that contains changed lines. That can be done.
  2. Regarding tests: we can identify which test the changed lines belong to (except for pre V5 framework, for some unknown reason), but we can't identify the impact if lines are changed elsewhere in the source file. This means any change in test attributes, test data providing methods or helper functions has an unpredictable impact. That's why the current logic is to assume every test in the source file has changed and that the associated mutations need to be retested.
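
The criterion in point 1 (keep every mutation whose span contains a changed line) could look roughly like the Python sketch below. This is not Stryker code: the `Mutant` type, its fields and the way changed line numbers are obtained are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutant:
    """Hypothetical mutant record: an id plus the 1-based line span it covers."""
    id: int
    start_line: int
    end_line: int


def touches_changed_lines(mutant, changed_lines):
    """True if at least one changed line falls inside the mutant's span."""
    return any(mutant.start_line <= line <= mutant.end_line
               for line in changed_lines)


def select_mutants(mutants, changed_lines):
    """Keep only the mutants whose span contains a changed line."""
    return [m for m in mutants if touches_changed_lines(m, changed_lines)]


# Assumed input: line numbers reported as changed by a diff of one file.
changed = {14, 30}
mutants = [Mutant(1, 10, 10), Mutant(2, 12, 15), Mutant(3, 20, 22)]

selected = select_mutants(mutants, changed)
print([m.id for m in selected])  # prints: [2]
```

This only covers production code changes; as point 2 explains, mutants covered by a changed test file would still have to be added to the selection.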

The part regarding tests is important, because it means that there will always be a significant number of 'unchanged' mutants that are reevaluated, so your marginal score will be significantly skewed by your baseline score.
In other words, adding a small non-covered change to a code base with a coverage of 100% may result in a marginal score of 99% or more. Conversely, a large fully covered change in a code base with 0% coverage may result in a marginal score of 1% or less.
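
A small arithmetic sketch of this skew, in Python. The mutant counts are invented for the illustration, and the score is simply taken as killed mutants divided by tested mutants, ignoring any other mutant status.

```python
def mutation_score(killed, total):
    """Score as a percentage: killed mutants over tested mutants."""
    return 100.0 * killed / total


def marginal_score(new_killed, new_total, retested_killed, retested_total):
    """Score over new mutants plus unchanged mutants that get retested
    because a test in a changed test file covers them."""
    return mutation_score(new_killed + retested_killed,
                          new_total + retested_total)


# Hypothetical case 1: one new mutant that survives, in a fully covered
# code base where 99 unchanged mutants are retested and all killed.
case1 = marginal_score(new_killed=0, new_total=1,
                       retested_killed=99, retested_total=99)

# Hypothetical case 2: one new mutant that is killed, in an uncovered
# code base where 99 unchanged mutants are retested and all survive.
case2 = marginal_score(new_killed=1, new_total=1,
                       retested_killed=0, retested_total=99)

print(case1, case2)  # prints: 99.0 1.0
```

In both cases the figure says far more about the retested baseline mutants than about the change itself.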

@dupdob dupdob added Priority: Low An annoyance. Not of importance, choose whenever be fixed and removed Priority: Low An annoyance. Not of importance, choose whenever be fixed labels Oct 29, 2024