Add ASAN/UBSAN recipe to just, make a CI that runs it + adding GDB #1474

tvami · 2024-09-24T01:02:35Z

I am updating ldmx-sw, here are the details.

What are the issues that this addresses?

Resolves #1043 (the last bit of it)

Check List

I successfully compiled ldmx-sw with my developments
I ran my developments and the following shows that they are successful.

I ran

just configure-asan-ubsan
just fire .github/validation_samples/kaon_enhanced/config.py

This reports a lot of indirect memory leaks, which is fine since none of them are big. This is kind of expected, since we fixed the ASAN/UBSAN problems in previous PRs.

I added it to the CI with the leak sanitizer disabled, so currently there is no output from ASAN/UBSAN (given that we already cleaned up all the issues).

Then

just configure-gdb
denv gdb build/run_test

This runs too, although I'm not sure how to actually make use of it for a real config file with fire.
I also introduced

just debug
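One possible way to point gdb at a real config file, as a sketch only: it assumes `fire` is an executable reachable inside the denv environment, and it has not been checked against what the `just debug` recipe added in this PR actually runs. The `gdb_cmd` variable is a hypothetical name used for illustration.

```shell
# Assumption: `fire` is an executable reachable inside the denv environment.
config=.github/validation_samples/kaon_enhanced/config.py

# `gdb --args PROGRAM ARGS...` starts gdb with the program arguments preset,
# so typing `run` at the (gdb) prompt would execute `fire` on the config.
gdb_cmd="denv gdb --args fire $config"

# Print the command so it can be inspected before running it interactively.
echo "$gdb_cmd"
```

Running the printed command opens an interactive gdb session, so it is meant for local use rather than CI.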

EinarElen · 2024-09-25T07:28:31Z

Thinking a bit more about this, do we need a separate sanitizer build? Why not just run all of the CI with a sanitized build?

tvami · 2024-09-25T14:20:00Z

> Thinking a bit more about this, do we need a separate sanitizer build? Why not just run all of the CI with a sanitized build?

It takes forever... we usually test 10k events. I tried 7.5k for the ecal_pn sample, and the job was killed at 6 hours. I'm now converging on testing only 1500 events, which will take about an hour. (But for some reason I can't keep the ASAN output in the CI.)

tomeichlersmith

One DRY comment that is optional and some guidance on writing the workflow.

.github/workflows/asan_ubsan_build.yml

justfile

EinarElen · 2024-09-25T16:09:41Z

> > Thinking a bit more about this, do we need a separate sanitizer build? Why not just run all of the CI with a sanitized build?
>
> It takes forever... we usually test 10k events. I tried 7.5k for the ecal_pn sample, and the job was killed at 6 hours. I'm now converging on testing only 1500 events, which will take about an hour. (But for some reason I can't keep the ASAN output in the CI.)

This sounds really, really strange. The sanitizers should have an extremely small impact on performance.

tvami · 2024-09-25T17:10:16Z

> > > Thinking a bit more about this, do we need a separate sanitizer build? Why not just run all of the CI with a sanitized build?
> >
> > It takes forever... we usually test 10k events. I tried 7.5k for the ecal_pn sample, and the job was killed at 6 hours. I'm now converging on testing only 1500 events, which will take about an hour. (But for some reason I can't keep the ASAN output in the CI.)
>
> This sounds really, really strange. The sanitizers should have an extremely small impact on performance.

Hmm, I don't know. Here is the action that was killed after 6 hours, if it helps: https://github.com/LDMX-Software/ldmx-sw/actions/runs/11007030465

tvami · 2024-09-25T19:18:41Z

OK, the output is being saved now:
https://github.com/LDMX-Software/ldmx-sw/actions/runs/11038784203/artifacts/1978699120
But it still doesn't have what I want... I'll play with it more later.

tvami · 2024-09-26T01:29:27Z

(sorry I'm going back and forth between draft and ready -- that's my only way to trigger tests until it's merged [after that we'll be able to do this from the Actions with the dispatch options])

tvami · 2024-09-26T02:43:56Z

Hmm, I don't understand: locally, the ASAN output is in stderr. But in the action I only get

https://github.com/LDMX-Software/ldmx-sw/actions/runs/11044190876

denv fire /home/runner/work/ldmx-sw/ldmx-sw/.github/validation_samples/ecal_pn/config.py 
==5==LeakSanitizer has encountered a fatal error.
==5==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==5==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
error: Recipe `fire` failed on line 95 with exit code 1

even though LSAN_OPTIONS=verbosity=1:log_threads=1 is set... (which, by the way, I don't need when I run locally)

EinarElen · 2024-09-26T05:09:04Z

That seems strange... But the simulation finishes and the leak sanitizer issue is only at the end? I'm still a bit skeptical of the value of running LeakSanitizer in CI for us, so I might just turn it off instead.

tvami · 2024-09-26T14:19:30Z

> But the simulation finishes and the leak sanitizer issue is only at the end?

I'm running ASAN + UBSAN, and they somehow invoke LSAN too. And yes, the reports only show up at the end, because of the setting not to stop on issues (which is a default set by you, @EinarElen, I believe, but correct me if I'm wrong [given that we fixed all the issues, it's hard to tell]).

I can be convinced either way about the leak part. But the issue of GitHub dealing with stderr from ASAN/UBSAN is not going to be connected to the leak part. Maybe @tomeichlersmith has an idea about the environment settings on the machines used by the actions.

tomeichlersmith · 2024-09-26T14:39:30Z

The runners do not swallow anything from stdout or stderr, but they do not have a TTY and so programs that are "smart" and disable certain behaviors in a non-interactive environment (mainly determined by detecting if a TTY is present) will disable those behaviors.

You can mimic running without a TTY to see if that is the issue.

https://superuser.com/a/1430883
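A minimal sketch of one way to test the TTY hypothesis locally. The `report_tty` helper is hypothetical (it is not part of ldmx-sw) and only reports whether stderr is attached to a terminal. Piping through `cat` removes the terminal from stdout and stderr, but the process keeps its controlling terminal, so this may not reproduce every aspect of a runner.

```shell
# Hypothetical helper: report whether stderr (file descriptor 2) is a terminal.
report_tty() {
  if [ -t 2 ]; then
    echo "stderr is a TTY"
  else
    echo "stderr is not a TTY"
  fi
}

# Called directly, the result depends on where the shell's stderr points.
report_tty

# Merging stderr into stdout and piping through cat removes the terminal
# from both streams, similar to what a program sees without a TTY.
report_tty 2>&1 | cat
```

The same `2>&1 | cat` suffix could be appended to a `just fire` invocation to see whether the sanitizer output changes.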

EinarElen · 2024-09-26T16:14:36Z

LSAN is a part of ASAN, so it gets enabled by default. You turn it off with ASAN_OPTIONS=detect_leaks=0
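As a sketch of how that option could be applied to the validation run from this thread: the snippet assumes that the variable set in the calling shell reaches the process started through `just` and denv, which depends on how the environment is forwarded and is not confirmed here.

```shell
# Disable LeakSanitizer while keeping the rest of AddressSanitizer active.
export ASAN_OPTIONS=detect_leaks=0
echo "ASAN_OPTIONS=$ASAN_OPTIONS"

# Run the sanitized build only if `just` is available on this machine.
if command -v just >/dev/null 2>&1; then
  just fire .github/validation_samples/ecal_pn/config.py
else
  echo "just not found; skipping the run"
fi
```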

tvami · 2024-09-26T17:48:44Z

> You can mimic running without a TTY to see if that is the issue.

None of this managed to reproduce what I see in the actions, but I think I found a workaround by specifying the ASAN output. I'll test that now.

tvami · 2024-09-26T19:04:35Z

OK, I think I'm ready now. This is the modus operandi I propose: we don't save the log for now (it didn't work even when specifying the output for ASAN), and we don't run the leak part, which means there are currently no issues. Then we run this every time a new PR comes in, and if any of those introduces a problem, this test will fail. It will not show what the issue is (given the output issue), but it will show that there is an issue, so we can check out the branch locally, run this, and see the problem.

tvami · 2024-09-26T20:45:20Z

OK, so without the leak part it doesn't actually take extremely long: 1500 events took 30 min, so 10k would take about 3.3 hours. That's still more than the usual 1.5-2 h. I could increase this to 7500. I think ecal_pn at this point runs all kinds of processors, so we are safe to just run that with ASAN/UBSAN.
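The extrapolation above can be reproduced with shell arithmetic. This is a sketch that assumes the time per event stays constant; the `estimate_minutes` helper and variable names are made up for illustration.

```shell
# Measured in this thread: 1500 events took about 30 minutes without LSAN.
measured_events=1500
measured_minutes=30

# Linear extrapolation to a different event count (integer minutes).
estimate_minutes() {
  echo $(( $1 * measured_minutes / measured_events ))
}

echo "10000 events: $(estimate_minutes 10000) min"
echo "7500 events: $(estimate_minutes 7500) min"
```

This gives 200 min (about 3.3 h) for 10k events and 150 min (2.5 h) for 7500 events.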

tvami · 2024-09-26T21:11:51Z

OK, tests are fine. @tomeichlersmith @EinarElen I won't push anymore, please approve if you agree with the changes.

tvami marked this pull request as draft September 24, 2024 04:31

tvami marked this pull request as ready for review September 24, 2024 04:31

tvami marked this pull request as draft September 24, 2024 14:30

tvami marked this pull request as ready for review September 24, 2024 14:30

tvami marked this pull request as draft September 24, 2024 18:05

tvami marked this pull request as ready for review September 25, 2024 04:29

tvami marked this pull request as draft September 25, 2024 04:31

tvami marked this pull request as ready for review September 25, 2024 04:36

tomeichlersmith reviewed Sep 25, 2024

tvami marked this pull request as draft September 25, 2024 18:16

tvami marked this pull request as ready for review September 25, 2024 18:16

tvami marked this pull request as draft September 25, 2024 19:17

tvami marked this pull request as ready for review September 26, 2024 01:25

tvami marked this pull request as draft September 26, 2024 01:28

tvami marked this pull request as ready for review September 26, 2024 02:00

tvami marked this pull request as draft September 26, 2024 02:00

tvami marked this pull request as ready for review September 26, 2024 17:48

tvami marked this pull request as draft September 26, 2024 18:02

tvami marked this pull request as draft September 26, 2024 18:30

tvami marked this pull request as ready for review September 26, 2024 19:01

tvami requested review from tomeichlersmith and EinarElen September 26, 2024 19:04

tvami and others added 13 commits September 26, 2024 12:23

Add ASAN/UBSAN recipe to just, make a CI that runs it

995310d

Add env variables to the CI test

e3e5611

Move env to next block in CI

2714182

Export env variables directly

73861c0

Decrease evnt number so it finishes in a reasonable time

4b3ecff

Have ASAN/UBSAN stderr in an artifact file

25145ad

Move to v4 for upload-artifact

8cd326d

Apply suggestions from code review by Tom

8181652

Co-authored-by: Tom Eichlersmith <[email protected]>

Test with 15 events

f49af91

Change LSAN options

4f7364b

Specify output filename for ASAN

de25fa5

Dont save the log, dont run LSAN, fail on failure

251f039

Increase event number to 1500

b118343

tvami force-pushed the iss1043-ASAN-UBSAN-GDB branch from 79c2089 to b118343 Compare September 26, 2024 19:23

tvami marked this pull request as draft September 26, 2024 19:23

tvami marked this pull request as ready for review September 26, 2024 19:23

Increase event number to 7500

846e2bb

tomeichlersmith approved these changes Sep 26, 2024

EinarElen approved these changes Sep 26, 2024

tvami commented Sep 27, 2024

Call gdb debug debug instead of gdb-fire

ce526eb

tvami merged commit 8efa112 into trunk Sep 27, 2024
3 checks passed

tvami deleted the iss1043-ASAN-UBSAN-GDB branch September 27, 2024 00:05
