Don't run Valgrind in pull request jobs #81

gilles-peskine-arm · 2022-12-14T11:02:19Z

Valgrind takes a lot of CPU time. This is a problem on the CI in terms of timeouts and cost.

Asan and Msan mostly find the same things that Valgrind finds, but we've occasionally had bugs that Valgrind found but neither Asan nor Msan. So Valgrind has some usefulness, but not a lot.

Proposal up for discussion: run Valgrind only in nightly jobs, not in pull request jobs.

tom-daubney-arm · 2022-12-14T11:06:23Z

Do we have enough data to have an estimate of how frequently Valgrind finds something that Asan and Msan have missed?

gilles-peskine-arm · 2022-12-14T11:10:49Z

Unfortunately, no, we don't have data. CI logs take a lot of room so we purge them pretty quickly. This may appear in the Git history or in pull request comments, but finding relevant things would be difficult. Also, if a developer catches something locally, that wouldn't be reflected anywhere.

I do remember that a few years ago, we wondered about removing Valgrind testing completely, but decided against it because it had found something that no other tool found. That was about memory management. For constant-trace, I don't remember a bug that Valgrind found and Msan didn't, but we don't have a lot of constant-trace-tested code so we don't really have much experience.

tom-daubney-arm · 2022-12-14T11:24:23Z

Thanks for the information. So for me personally I think the threshold frequency of Valgrind-only bugs vs bugs found by all of the tools would have to be quite high (say more than 10%, arbitrary I know) to justify having Valgrind in every CI run, given the effect that running Valgrind has had on the CI. I think relegating Valgrind to run only on the nightly is an acceptable trade-off and therefore I support removing Valgrind from PR jobs.

mpg · 2022-12-14T11:24:55Z

The only case I remember is this bug in MSan where it has no problem with one of the argument points to memcpy() being derived from an uninitialized value, which I ran into when working on constant-flow Lucky 13 counter-measure (we also abuse MSan as a constant-flow checker the same way we do with valgrind), and is apparently unlikely to be fixed soon.

For what it's worth, it was not an actual bug that valgrind found and MSan missed, it was when I first wrote the tests before we had a constant-flow implementation, so I was expecting the tests to fail, and to my surprise one of them passed with MSan.

Anyway, I think it's fair to say that valgrind finding something MSan+ASan doesn't has been happening with frequency clearly less than once a year over the last few years.

daverodgman · 2022-12-14T12:11:33Z

IMO, the marginal gain from running Valgrind against every PR is extremely small (assuming we retain it in the nightlies). On the other hand, we have a pressing need to improve CI performance. So I am in favour of removing from the PR jobs.

gilles-peskine-arm added the needs design approval label Dec 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't run Valgrind in pull request jobs #81

Don't run Valgrind in pull request jobs #81

gilles-peskine-arm commented Dec 14, 2022

tom-daubney-arm commented Dec 14, 2022

gilles-peskine-arm commented Dec 14, 2022

tom-daubney-arm commented Dec 14, 2022

mpg commented Dec 14, 2022 •

edited

Loading

daverodgman commented Dec 14, 2022

Don't run Valgrind in pull request jobs #81

Don't run Valgrind in pull request jobs #81

Comments

gilles-peskine-arm commented Dec 14, 2022

tom-daubney-arm commented Dec 14, 2022

gilles-peskine-arm commented Dec 14, 2022

tom-daubney-arm commented Dec 14, 2022

mpg commented Dec 14, 2022 • edited Loading

daverodgman commented Dec 14, 2022

mpg commented Dec 14, 2022 •

edited

Loading