Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't run Valgrind in pull request jobs #81

Open
gilles-peskine-arm opened this issue Dec 14, 2022 · 5 comments
Open

Don't run Valgrind in pull request jobs #81

gilles-peskine-arm opened this issue Dec 14, 2022 · 5 comments

Comments

@gilles-peskine-arm
Copy link
Contributor

Valgrind takes a lot of CPU time. This is a problem on the CI in terms of timeouts and cost.

Asan and Msan mostly find the same things that Valgrind finds, but we've occasionally had bugs that Valgrind found but neither Asan nor Msan. So Valgrind has some usefulness, but not a lot.

Proposal up for discussion: run Valgrind only in nightly jobs, not in pull request jobs.

@tom-daubney-arm
Copy link

Do we have enough data to have an estimate of how frequently Valgrind finds something that Asan and Msan have missed?

@gilles-peskine-arm
Copy link
Contributor Author

Unfortunately, no, we don't have data. CI logs take a lot of room so we purge them pretty quickly. This may appear in the Git history or in pull request comments, but finding relevant things would be difficult. Also, if a developer catches something locally, that wouldn't be reflected anywhere.

I do remember that a few years ago, we wondered about removing Valgrind testing completely, but decided against it because it had found something that no other tool found. That was about memory management. For constant-trace, I don't remember a bug that Valgrind found and Msan didn't, but we don't have a lot of constant-trace-tested code so we don't really have much experience.

@tom-daubney-arm
Copy link

Thanks for the information. So for me personally I think the threshold frequency of Valgrind-only bugs vs bugs found by all of the tools would have to be quite high (say more than 10%, arbitrary I know) to justify having Valgrind in every CI run, given the effect that running Valgrind has had on the CI. I think relegating Valgrind to run only on the nightly is an acceptable trade-off and therefore I support removing Valgrind from PR jobs.

@mpg
Copy link
Contributor

mpg commented Dec 14, 2022

The only case I remember is this bug in MSan where it has no problem with one of the argument points to memcpy() being derived from an uninitialized value, which I ran into when working on constant-flow Lucky 13 counter-measure (we also abuse MSan as a constant-flow checker the same way we do with valgrind), and is apparently unlikely to be fixed soon.

For what it's worth, it was not an actual bug that valgrind found and MSan missed, it was when I first wrote the tests before we had a constant-flow implementation, so I was expecting the tests to fail, and to my surprise one of them passed with MSan.

Anyway, I think it's fair to say that valgrind finding something MSan+ASan doesn't has been happening with frequency clearly less than once a year over the last few years.

@daverodgman
Copy link
Contributor

IMO, the marginal gain from running Valgrind against every PR is extremely small (assuming we retain it in the nightlies). On the other hand, we have a pressing need to improve CI performance. So I am in favour of removing from the PR jobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants